KR20230077280A

KR20230077280A - Method for learning prediction model for regression prediction of time series data and method for predicting using prediction model

Info

Publication number: KR20230077280A
Application number: KR1020210164395A
Authority: KR
Inventors: 성현중; 이원석; 전소현; 구윤정; 김덕형; 추가영; 박상종; 이명호
Original assignee: (주)브릭
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2023-06-01

Abstract

A method for training a prediction model for regression prediction of time series data, and a prediction method using the prediction model, according to a preferred embodiment of the present invention train a prediction model that is a stacking model for regression prediction of time series data, predict quality using the trained prediction model, and automatically update the prediction model. This prevents degradation of the prediction model's performance without interfering with real-time prediction.

Description

Method for learning prediction model for regression prediction of time series data and method for predicting using prediction model

The present invention relates to a method for training a prediction model for regression prediction of time series data and to a prediction method using the prediction model, and more particularly, to a method that trains a model for regression prediction of time series data and predicts quality using the trained model.

Regression prediction using time series data predicts an outcome by learning from past measurement data. In this setting, sufficient training data may be unavailable because of measurement and collection issues, or the model may overfit because of data sampling problems. Moreover, even a high-performing, sophisticated model may degrade over time, because the distribution or trend of real-time data can keep changing. It is therefore necessary to develop a modeling technique that is lightweight yet resistant to overfitting, together with an automatic update method that reflects changes in the data in the model without compromising the real-time operation of the system.

An object of the present invention is to provide a method for training a prediction model for regression prediction of time series data, and a prediction method using the prediction model, which train a prediction model that is a stacking model for regression prediction of time series data, predict quality using the trained prediction model, and automatically update the prediction model.

Other objects of the present invention that are not explicitly stated may additionally be considered within the scope that can be readily inferred from the following detailed description and its effects.

To achieve the above object, a method for training a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention includes: obtaining first training data based on time series data acquired during the production of a product; training a preset number of sub-models, each taking the first training data as input and outputting a quality prediction value of the product, based on the first training data and the quality measurement values of the product corresponding to the first training data; obtaining second training data based on the preset number of quality prediction values output by the preset number of sub-models; training one meta-model, which takes the quality prediction values as input and outputs a final quality prediction value of the product, based on the second training data and the quality measurement values; and obtaining one prediction model for regression prediction of the time series data based on the preset number of sub-models and the one meta-model.
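The training steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: each sub-model is assumed to be a least-squares line on a single input column, the meta-model is assumed to be a linear regression with an intercept, and the names `fit_line`, `solve`, and `train_stacking_model` are hypothetical. For brevity the sub-models predict on the same rows they were trained on; out-of-fold predictions are the usual way to limit leakage into the second training data.

```python
def fit_line(xs, ys):
    """Fit y = a * x + b by ordinary least squares and return a predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is non-singular (hypothetical helper, no error handling).
    """
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def train_stacking_model(rows, targets):
    """Train one sub-model per input column, then a linear meta-model on their outputs."""
    n_features = len(rows[0])
    # Step 1: sub-models map the first training data to quality prediction values.
    sub_models = [fit_line([row[j] for row in rows], targets) for j in range(n_features)]
    # Step 2: second training data = sub-model predictions plus a constant term.
    second = [[model(row[j]) for j, model in enumerate(sub_models)] + [1.0] for row in rows]
    # Step 3: meta-model weights from the normal equations (Z^T Z) w = Z^T y.
    k = n_features + 1
    ztz = [[sum(z[a] * z[b] for z in second) for b in range(k)] for a in range(k)]
    zty = [sum(z[a] * y for z, y in zip(second, targets)) for a in range(k)]
    weights = solve(ztz, zty)

    # Step 4: the combined prediction model chains the sub-models and the meta-model.
    def predict(row):
        preds = [model(row[j]) for j, model in enumerate(sub_models)] + [1.0]
        return sum(w * p for w, p in zip(weights, preds))

    return predict
```

Calling `train_stacking_model(rows, targets)` with one row of features per product and one measured quality value per row returns a single callable that plays the role of the combined prediction model.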

Here, the sub-model training step may train a plurality of sub-models using K-fold cross-validation based on the first training data and the quality measurement values, and obtain the preset number of sub-models based on the performance evaluation results of the plurality of sub-models.
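As a rough illustration of this selection, the sketch below scores several candidate sub-models with K-fold cross-validation and keeps the best few. The candidate models (`fit_mean`, `fit_line_on`), the contiguous fold layout, the use of RMSE as the score, and all function names are assumptions made for the example; the source does not prescribe them.

```python
import math


def fit_mean(rows, targets):
    """Baseline candidate: always predict the mean of the training targets."""
    average = sum(targets) / len(targets)
    return lambda row: average


def fit_line_on(column):
    """Candidate factory: least-squares line on a single input column."""
    def fit(rows, targets):
        xs = [row[column] for row in rows]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(targets) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
        slope = sxy / sxx if sxx else 0.0
        return lambda row: slope * (row[column] - mean_x) + mean_y
    return fit


def kfold_indices(n, k):
    """Split range(n) into k contiguous (train, validation) index pairs."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    return [([j for j in range(n) if j not in fold], fold) for fold in folds]


def cv_rmse(fit, rows, targets, k):
    """Cross-validated RMSE of one candidate over all validation folds."""
    squared = []
    for train, val in kfold_indices(len(rows), k):
        predict = fit([rows[i] for i in train], [targets[i] for i in train])
        squared += [(targets[i] - predict(rows[i])) ** 2 for i in val]
    return math.sqrt(sum(squared) / len(squared))


def select_sub_models(candidates, rows, targets, k, n_keep):
    """Return the names of the n_keep candidates with the lowest cross-validated RMSE."""
    ranked = sorted(candidates, key=lambda name: cv_rmse(candidates[name], rows, targets, k))
    return ranked[:n_keep]
```

Here `candidates` is a dictionary that maps a name to a fit function, and `n_keep` stands in for the preset number of sub-models.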

Here, the sub-model training step may evaluate the performance of each of the plurality of sub-models using a regression prediction model performance evaluation method, based on the quality prediction values output by the sub-models and the quality measurement values, and obtain the preset number of sub-models based on the performance evaluation results of the plurality of sub-models.

Here, the regression prediction model performance evaluation method may be a method of evaluating a regression prediction model using at least one of ME (mean of errors), RMSE (root mean of squared errors), MAE (mean of absolute errors), MPE (mean of percentage errors), MAPE (mean of absolute percentage errors), and MASE (mean of absolute scaled errors).
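For reference, the six measures can be computed as in the following sketch. The sign convention (error = measured value minus predicted value), the expression of MPE and MAPE in percent, and the scaling of MASE by the naive one-step forecast error of the same measured series are assumptions of this example; the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return ME, RMSE, MAE, MPE, MAPE and MASE for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Percentage errors assume that no measured value is zero.
    mpe = 100 * sum(e / a for e, a in zip(errors, actual)) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # MASE scales MAE by the MAE of the naive "previous value" forecast.
    naive_mae = sum(abs(actual[i] - actual[i - 1]) for i in range(1, n)) / (n - 1)
    mase = mae / naive_mae
    return {"ME": me, "RMSE": rmse, "MAE": mae, "MPE": mpe, "MAPE": mape, "MASE": mase}
```

ME and MPE keep the sign of the errors and therefore reveal systematic bias, while the other four measure error magnitude.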

Here, the meta-model training step may train the one meta-model using K-fold cross-validation based on the second training data and the quality measurement values.

Here, the prediction model obtaining step may obtain the one prediction model by combining the preset number of sub-models with the one meta-model.

Here, the first training data obtaining step may obtain, for each of a plurality of time series variables, a statistical value of that variable based on the time series data, which consists of a plurality of time series samples for each of the plurality of time series variables, and obtain the first training data based on the statistical values of the plurality of time series variables.

Here, each sub-model may be a machine learning algorithm or a statistical algorithm that performs regression prediction, and the meta-model may be a machine learning algorithm or a statistical algorithm that performs regression prediction and can be updated through parameters representing the relationship between its input and output.

To achieve the above object, a prediction method using a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention includes: obtaining input data for a current time point based on time series data acquired during the production of a product; and obtaining a final quality prediction value of the product for the current time point based on the input data, using a prediction model that has been trained and built in advance. The prediction model is a model obtained by combining a preset number of sub-models, each taking the input data as input and outputting a quality prediction value of the product, with one meta-model that takes the quality prediction values as input and outputs the final quality prediction value.

Here, the step of obtaining the final quality prediction value may obtain a final quality prediction candidate value for the current time point based on the input data using the prediction model, and then obtain the final quality prediction value for the current time point based on that candidate value, using quality measurement values for past time points according to a preset window size.

Here, the method may further include: evaluating the performance of the prediction model based on the quality measurement value for the current time point and the final quality prediction value for the current time point; updating the prediction model by adjusting, based on the quality measurement value and the final quality prediction value for the current time point, the parameters representing the relationship between the input and output of the meta-model of the prediction model; and storing the updated prediction model as a new prediction model.
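One possible way to picture this update is a single gradient step on a linear meta-model, shown below. The source only states that the meta-model parameters are adjusted; the linear form, the least-mean-squares step, the learning rate, and the names `predict_meta`, `update_meta`, and `model_store` are assumptions of this sketch.

```python
def predict_meta(weights, bias, sub_predictions):
    """Final quality prediction of an assumed linear meta-model."""
    return bias + sum(w * p for w, p in zip(weights, sub_predictions))


def update_meta(weights, bias, sub_predictions, measured, learning_rate=0.001):
    """Return new (weights, bias) after one least-mean-squares step.

    Only the meta-model parameters change; the sub-models are left untouched,
    which keeps the update cheap enough to run between predictions.
    """
    error = measured - predict_meta(weights, bias, sub_predictions)
    new_weights = [w + learning_rate * error * p for w, p in zip(weights, sub_predictions)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias


# Stored versions of the meta-model as (weights, bias) pairs (illustrative values).
model_store = [([0.5, 0.5], 0.0)]
weights, bias = model_store[-1]
new_weights, new_bias = update_meta(weights, bias, [10.0, 12.0], measured=12.0)
model_store.append((new_weights, new_bias))  # keep the update as a new model
```

Appending rather than overwriting keeps earlier versions available for later comparison.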

Here, the method may further include a prediction model selection step of: selecting a plurality of prediction models from among the stored prediction models based on a search criterion; obtaining, for each of the plurality of prediction models, final quality prediction values for evaluation data consisting of the time series data corresponding to a preset past period relative to the current time point; evaluating the performance of each of the plurality of prediction models based on its final quality prediction values and the quality measurement values of the product corresponding to the evaluation data; and selecting one prediction model from among the plurality of prediction models based on the performance evaluation results. In this case, the step of obtaining the final quality prediction value may obtain the final quality prediction value for the current time point based on the input data, using the one prediction model selected from among the plurality of prediction models.

Here, the prediction model selection step may: evaluate the performance of each of the plurality of prediction models using a regression prediction model performance evaluation method, based on the final quality prediction values of each prediction model and the quality measurement values of the product corresponding to the evaluation data; select one prediction model from among the plurality of prediction models based on the performance evaluation results; evaluate the explanatory ability of each of the plurality of prediction models using a regression prediction model explanation evaluation method; and, if the explanatory ability of all of the plurality of prediction models is lower than a preset explanation evaluation criterion, train a new prediction model by training the preset number of sub-models based on the evaluation data, and store the newly trained prediction model.
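The selection and retraining logic might look like the sketch below. It assumes RMSE as the performance measure and the coefficient of determination (R²) as the explanation measure; the source names neither choice for this step, and `select_or_retrain`, `stored_models`, and `min_r2` are hypothetical names.

```python
import math


def rmse(actual, predicted):
    """Root mean of squared errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination; assumes the measured values are not all equal."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def select_or_retrain(stored_models, eval_rows, eval_measured, min_r2):
    """Return (name of the best stored model, whether a new model should be trained).

    stored_models maps a name to a callable that predicts quality from one row.
    """
    predictions = {name: [model(row) for row in eval_rows]
                   for name, model in stored_models.items()}
    # Performance evaluation: pick the model with the lowest RMSE on recent data.
    best = min(predictions, key=lambda name: rmse(eval_measured, predictions[name]))
    # Explanation evaluation: retrain only if every model falls below the criterion.
    needs_retraining = all(r_squared(eval_measured, p) < min_r2
                           for p in predictions.values())
    return best, needs_retraining
```

When `needs_retraining` is true, the caller would train a fresh prediction model on the evaluation data and add it to the store.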

According to the method for training a prediction model for regression prediction of time series data and the prediction method using the prediction model according to a preferred embodiment of the present invention, a prediction model that is a stacking model for regression prediction of time series data is trained, quality is predicted using the trained prediction model, and the prediction model is updated automatically. This prevents degradation of the prediction model's performance without interfering with real-time prediction.

That is, in prediction using real-time data, the present invention handles the degradation of prediction performance caused by changes in the data distribution through automatic updates, which can reduce the cost of management personnel.

In addition, by updating the model, the present invention can prevent performance degradation of the prediction model as far as possible without disrupting the flow of real-time prediction, where speed is important.

In addition, the present invention manages the updated models, evaluates their performance in real time on the latest data, and uses the model best suited to the data distribution at that time for the next prediction, so that better performance can be expected.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram illustrating a time series data regression prediction apparatus according to a preferred embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for training a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention.
FIG. 3 is a diagram illustrating a first training data obtaining operation according to a preferred embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of the first training data obtaining operation according to a preferred embodiment of the present invention.
FIG. 5 is a diagram illustrating a sub-model training operation according to a preferred embodiment of the present invention.
FIG. 6 is a diagram illustrating a second training data obtaining operation according to a preferred embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of the sub-model training operation and the second training data obtaining operation according to a preferred embodiment of the present invention.
FIG. 8 is a diagram illustrating a meta-model training operation according to a preferred embodiment of the present invention.
FIG. 9 is a diagram illustrating a prediction model obtaining operation according to a preferred embodiment of the present invention.
FIG. 10 is a diagram illustrating an example of the meta-model training operation and the prediction model obtaining operation according to a preferred embodiment of the present invention.
FIG. 11 is a flowchart illustrating a prediction method using a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention.
FIG. 12 is a diagram illustrating a quality prediction operation using a prediction model according to a preferred embodiment of the present invention.
FIG. 13 is a diagram illustrating the detailed operation of the quality prediction operation shown in FIG. 12.
FIG. 14 is a diagram illustrating an example of the quality prediction operation according to a preferred embodiment of the present invention.
FIG. 15 is a diagram illustrating a prediction model performance evaluation and update operation according to a preferred embodiment of the present invention.
FIG. 16 is a diagram illustrating a prediction model selection operation according to a preferred embodiment of the present invention.
FIG. 17 is a diagram illustrating a new prediction model training operation according to a preferred embodiment of the present invention.
FIG. 18 is a diagram illustrating an example of the prediction model selection operation and the new prediction model training operation according to a preferred embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention belongs of the scope of the invention, and the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined.

In this specification, terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

In this specification, identification codes for the steps (for example, a, b, c) are used for convenience of description and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

In this specification, expressions such as "has", "may have", "includes", or "may include" indicate the presence of the corresponding feature (for example, a numerical value, a function, an operation, or a component such as a part) and do not exclude the presence of additional features.

Hereinafter, preferred embodiments of the method for training a prediction model for regression prediction of time series data and the prediction method using the prediction model according to the present invention will be described in detail with reference to the accompanying drawings.

First, a time series data regression prediction apparatus according to a preferred embodiment of the present invention will be described with reference to FIG. 1.

FIG. 1 is a block diagram illustrating a time series data regression prediction apparatus according to a preferred embodiment of the present invention.

Referring to FIG. 1, the time series data regression prediction apparatus 100 according to a preferred embodiment of the present invention may train a prediction model that is a stacking model for regression prediction of time series data, predict quality using the trained prediction model, and automatically update the prediction model. Accordingly, the present invention can prevent degradation of the prediction model's performance without interfering with real-time prediction.

To this end, the time series data regression prediction apparatus 100 may include one or more processors 110, a computer-readable storage medium 130, and a communication bus 150.

The processor 110 may control the operation of the time series data regression prediction apparatus 100. For example, the processor 110 may execute one or more programs 131 stored in the computer-readable storage medium 130. The one or more programs 131 may include one or more computer-executable instructions which, when executed by the processor 110, cause the time series data regression prediction apparatus 100 to perform operations for training a prediction model for regression prediction of time series data, predicting quality using the trained prediction model, and automatically updating the prediction model.

The computer-readable storage medium 130 is configured to store computer-executable instructions, program code, program data, and/or other suitable forms of information for training a prediction model for regression prediction of time series data, predicting quality using the trained prediction model, and automatically updating the prediction model. The program 131 stored in the computer-readable storage medium 130 includes a set of instructions executable by the processor 110. In one embodiment, the computer-readable storage medium 130 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the time series data regression prediction apparatus 100 and can store the desired information, or a suitable combination thereof.

The communication bus 150 interconnects the various components of the time series data regression prediction apparatus 100, including the processor 110 and the computer-readable storage medium 130.

The time series data regression prediction apparatus 100 may also include one or more input/output interfaces 170, which provide an interface for one or more input/output devices, and one or more communication interfaces 190. The input/output interface 170 and the communication interface 190 are connected to the communication bus 150. An input/output device (not shown) may be connected to the other components of the time series data regression prediction apparatus 100 through the input/output interface 170.

Next, a method for training a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention will be described with reference to FIGS. 2 to 10.

FIG. 2 is a flowchart illustrating a method for training a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention. FIG. 3 is a diagram illustrating a first training data obtaining operation, and FIG. 4 is a diagram illustrating an example of the first training data obtaining operation. FIG. 5 is a diagram illustrating a sub-model training operation, FIG. 6 is a diagram illustrating a second training data obtaining operation, and FIG. 7 is a diagram illustrating an example of the sub-model training operation and the second training data obtaining operation. FIG. 8 is a diagram illustrating a meta-model training operation, FIG. 9 is a diagram illustrating a prediction model obtaining operation, and FIG. 10 is a diagram illustrating an example of the meta-model training operation and the prediction model obtaining operation, each according to a preferred embodiment of the present invention.

Referring to FIG. 2, the processor 110 of the time series data regression prediction apparatus 100 may obtain first training data based on time series data acquired during the production of a product (S110).

Here, as shown in FIG. 3, the time series data may consist of a plurality of time series samples (time series sample 1, time series sample 2, time series sample 3, ..., time series sample n) for each of a plurality of time series variables (time series variable 1, time series variable 2, ..., time series variable n). For example, the time series data refers to time series samples of the time series variables used in producing the product, such as data collected from the production equipment used in the manufacturing process, setting values of the production equipment that are adjusted to meet the target quality value of the product, data measured by the production equipment during the actual manufacturing process, and measured values of the raw and subsidiary materials used to manufacture the product.

That is, as shown in FIG. 3, the processor 110 may obtain, based on the time series data, statistical values for each of the plurality of time series variables (time series variable 1, time series variable 2, ..., time series variable n), namely a mean value (mean value 1, mean value 2, ..., mean value n) and a standard deviation value (standard deviation value 1, standard deviation value 2, ..., standard deviation value n). Here, various kinds of statistics, such as the mean, minimum, maximum, quartiles, and standard deviation, may be used as the statistical values. For convenience of description, the mean value and the standard deviation value are assumed to be used as the statistical values.

Then, as shown in FIG. 3, the processor 110 may obtain the first training data based on the mean value (mean value 1, mean value 2, ..., mean value n) and the standard deviation value (standard deviation value 1, standard deviation value 2, ..., standard deviation value n) of each of the plurality of time series variables (time series variable 1, time series variable 2, ..., time series variable n).

Meanwhile, the prediction model according to the present invention may predict the quality of a single product based on the time series data corresponding to that product. In this case, as shown in FIG. 3, first training data corresponding to the single product may be obtained based on the time series data corresponding to that product. Of course, the prediction model according to the present invention may also predict the quality of each of a plurality of products based on the time series data corresponding to each product. In this case, each product forms one row, so that first training data containing the training data for each of the plurality of products may be obtained from the time series data corresponding to each product.

For example, as shown in FIG. 4, the first row of the first training data consists of the training data obtained from the time series data of "Product 1", which is made up of a plurality of time series samples for time series variable 1 (Param 1), time series variable 2 (Param 2), ..., time series variable n (Param n). This training data consists of the mean value of time series variable 1 (Param1_mean), the standard deviation value of time series variable 1 (Param1_std), ..., the mean value of time series variable n (Paramn_mean), and the standard deviation value of time series variable n (Paramn_std). The quality measurement value (y1) corresponding to the training data of "Product 1" may also be obtained. The second row of the first training data consists of the same statistics (Param1_mean, Param1_std, ..., Paramn_mean, Paramn_std) obtained from the time series data of "Product 2", together with the corresponding quality measurement value (y2). Likewise, the last row of the first training data consists of the same statistics obtained from the time series data of "Product M", together with the corresponding quality measurement value (yM).
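The row construction described for FIG. 4 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the dictionary layout, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def build_first_training_row(series_by_variable):
    """Summarize one product's time series into a flat row of statistics.

    series_by_variable maps a variable name (e.g. "Param1") to the list of
    time series samples recorded for that variable (hypothetical layout).
    """
    row = {}
    for name, samples in series_by_variable.items():
        row[f"{name}_mean"] = mean(samples)
        # Population standard deviation is an assumption; the source only
        # says "standard deviation value".
        row[f"{name}_std"] = pstdev(samples)
    return row


def build_first_training_data(products):
    """Return one row per product, in the order the products are given."""
    return [build_first_training_row(p) for p in products]


# Two hypothetical products, each with two time series variables.
products = [
    {"Param1": [1.0, 2.0, 3.0], "Param2": [10.0, 10.0, 10.0]},
    {"Param1": [2.0, 4.0, 6.0], "Param2": [9.0, 11.0, 10.0]},
]
first_training_data = build_first_training_data(products)
```

Each row would then be paired with the quality measurement value of the same product (y1, y2, ..., yM) to form the supervised training set.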

Next, the processor 110 may train a preset number of sub-models based on the first training data and the product quality measurement values corresponding to the first training data (S120).

Here, each sub-model may take the first training data as input and output a quality prediction value for the product. Each sub-model may be a machine learning algorithm or a statistical algorithm that performs regression prediction.

That is, as shown in FIG. 5, the processor 110 may train a plurality of sub-models (sub-model 1, sub-model 2, ..., sub-model n) using a K-fold cross-validation method based on the first training data and the quality measurement values corresponding to the first training data. Here, the K-fold cross-validation method refers to a method in which the original data is divided evenly so that training can be repeated K times, the training data is extended in chronological order as the folds progress, and the data from the subsequent time period is used as validation data.
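The time-ordered K-fold scheme described above can be sketched as an expanding-window split. The sketch assumes the rows are already sorted chronologically and that the data is cut into K + 1 equal blocks; both the function name and this block layout are illustrative assumptions, since the source does not fix them.

```python
def expanding_window_folds(n_samples, k):
    """Yield (train_indices, validation_indices) for k time-ordered folds.

    The data is cut into k + 1 chronologically ordered blocks of equal size.
    Fold i trains on blocks 0..i-1 and validates on block i, so the training
    set grows with each fold and validation data is always later in time.
    """
    block = n_samples // (k + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for i in range(1, k + 1):
        train_end = i * block
        # The last fold absorbs any remainder left by the integer division.
        val_end = n_samples if i == k else train_end + block
        yield list(range(train_end)), list(range(train_end, val_end))


folds = list(expanding_window_folds(10, 4))
```

With 10 samples and K = 4, the first fold trains on samples 0 to 1 and validates on samples 2 to 3, and the last fold trains on samples 0 to 7 and validates on samples 8 to 9.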

At this time, the processor 110 may evaluate the performance (performance 1, performance 2, ..., performance n) of each of the plurality of sub-models (sub-model 1, sub-model 2, ..., sub-model n) using a regression prediction model performance evaluation method, based on the quality prediction values output by the sub-models and the quality measurement values.

Here, the regression prediction model performance evaluation method may be a method that evaluates a regression prediction model using at least one of ME (mean of errors), RMSE (root mean of squared errors), MAE (mean of absolute errors), MPE (mean of percentage errors), MAPE (mean of absolute percentage errors), and MASE (mean of absolute scaled errors). In addition, in view of the advantages of a stacking model (such as reduced bias and overfitting through the use of each model's predicted values), a combination of sub-models whose predicted values are weakly correlated may be selected.
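The six metrics listed above can be computed as in the following sketch. The sign convention (error = measured minus predicted) and the MASE scaling by the in-sample one-step naive forecast are common conventions assumed here, not details taken from the source.

```python
from math import sqrt


def regression_metrics(y_true, y_pred, y_train):
    """Return ME, RMSE, MAE, MPE, MAPE and MASE for one set of predictions.

    y_train is the training-period target series used to scale MASE.
    MPE and MAPE are undefined if y_true contains zero.
    """
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    # MASE scale: mean absolute error of the naive "previous value" forecast.
    naive = [abs(b - a) for a, b in zip(y_train, y_train[1:])]
    scale = sum(naive) / len(naive)
    return {
        "ME": sum(errors) / n,
        "RMSE": sqrt(sum(e * e for e in errors) / n),
        "MAE": mae,
        "MPE": 100.0 * sum(e / t for e, t in zip(errors, y_true)) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "MASE": mae / scale,
    }


metrics = regression_metrics([10.0, 20.0], [8.0, 22.0], [1.0, 3.0, 5.0])
```

In this toy case the two errors are +2 and -2, so ME is 0 while RMSE and MAE are both 2, which shows why a signed metric such as ME is usually read together with an unsigned one.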

Then, as shown in FIG. 5, the processor 110 may obtain a preset number of sub-models (sub-model 1, sub-model 2, ..., sub-model k), in descending order of performance, based on the performance evaluation results of the plurality of sub-models (sub-model 1, sub-model 2, ..., sub-model n).

Here, the number k of sub-models selected on the basis of performance from among the n sub-models is smaller than n, and may be set based on the specifications of the prediction system, the minimum speed required by the process to which the model is applied, and the like.
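Selecting the k best of n sub-models reduces to a sort on the evaluation score. The sketch below assumes an error-type score where lower is better (for example RMSE); the model names and score values are invented for the example.

```python
def select_top_k(scores, k):
    """Return the names of the k best sub-models, best first.

    scores maps a sub-model name to an error metric (lower is better).
    k must be smaller than the number of candidates, as stated in the text.
    """
    if not 0 < k < len(scores):
        raise ValueError("k must satisfy 0 < k < number of sub-models")
    return sorted(scores, key=scores.get)[:k]


# Hypothetical validation RMSE for n = 4 candidate sub-models.
validation_rmse = {"ridge": 0.42, "tree": 0.31, "knn": 0.58, "svr": 0.35}
selected = select_top_k(validation_rmse, 2)
```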

For example, as shown in FIG. 7, a preset number of sub-models (Model 1, Model 2, ..., Model k) may be obtained using the K-fold cross-validation method based on the first training data and the quality measurement values corresponding to the first training data.

Next, the processor 110 may obtain second training data based on the preset number of quality prediction values output by the preset number of sub-models (S130).

That is, as shown in FIG. 6, the processor 110 may obtain the second training data based on the preset number of quality prediction values (quality prediction value 1, quality prediction value 2, ..., quality prediction value k) output by the preset number of sub-models (sub-model 1, sub-model 2, ..., sub-model k).

Meanwhile, the prediction model according to the present invention may predict the quality of a single product based on the time series data corresponding to that product. In this case, as shown in FIG. 6, second training data corresponding to the single product may be obtained based on the preset number of quality prediction values (quality prediction value 1, quality prediction value 2, ..., quality prediction value k) output for that product by the preset number of sub-models (sub-model 1, sub-model 2, ..., sub-model k). Of course, the prediction model according to the present invention may also predict the quality of each of a plurality of products based on the time series data corresponding to each product. In this case, based on the preset number of quality prediction values output by the preset number of sub-models for each of the plurality of products, each product forms one row, so that second training data containing the training data for each of the plurality of products may be obtained.

For example, as shown in FIG. 7, second training data, which is new training data, may be obtained based on the quality prediction values of the preset number of sub-models (Model 1, Model 2, ..., Model k) obtained using the K-fold cross-validation method based on the first training data and the quality measurement values corresponding to the first training data.
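The construction of the second training data can be sketched as a table with one row per product and one column per selected sub-model. The sub-models are represented here as plain callables, which is an assumption made to keep the example self-contained; in a stacking setup the predictions would typically come from the validation folds rather than from data the sub-model was fitted on.

```python
def build_second_training_data(sub_models, first_training_data):
    """Return one row per product holding the k sub-model predictions.

    sub_models is a list of k callables mapping a feature row to a quality
    prediction value (hypothetical stand-ins for trained regressors).
    """
    return [[model(row) for model in sub_models] for row in first_training_data]


# Two toy "sub-models" and two product rows of summary features.
toy_sub_models = [
    lambda row: row[0] + row[1],
    lambda row: 2.0 * row[0],
]
toy_rows = [[1.0, 2.0], [3.0, 4.0]]
second_training_data = build_second_training_data(toy_sub_models, toy_rows)
```

The meta-model is then fitted on these rows against the same quality measurement values used for the sub-models.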

Next, the processor 110 may train one meta-model based on the second training data and the quality measurement values (S140).

Here, the meta-model may take the quality prediction values output by the sub-models as input and output a final quality prediction value for the product. The meta-model may be a machine learning algorithm or a statistical algorithm that performs regression prediction and that can be updated through parameters representing the relationship between its input and output.

That is, as shown in FIG. 8, the processor 110 may train one meta-model using the K-fold cross-validation method based on the second training data and the quality measurement values.

For example, as shown in FIG. 10, the processor 110 may obtain the meta-model using the K-fold cross-validation method based on the quality measurement values and the second training data, which is the new training data obtained from the quality prediction values of the preset number of sub-models (Model 1, Model 2, ..., Model k).

Thereafter, the processor 110 may obtain one prediction model for regression prediction of time series data based on the preset number of sub-models and the one meta-model (S150).

That is, as shown in FIG. 9, the processor 110 may obtain one prediction model by combining the preset number of sub-models (sub-model 1, sub-model 2, ..., sub-model k) with the one meta-model.

For example, as shown in FIG. 10, the processor 110 may obtain one prediction model that is a stacking model consisting of the preset number of sub-models (Model 1, Model 2, ..., Model k) and the one meta-model.
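The combination of k sub-models and one meta-model into a single prediction model can be sketched as a small wrapper class. The class name, the callable sub-models, and the linear form of the meta-model are assumptions for illustration; the source only requires that the meta-model map the sub-model outputs to a final quality prediction value.

```python
class StackingModel:
    """k sub-models followed by one linear meta-model (illustrative sketch)."""

    def __init__(self, sub_models, meta_weights, meta_bias=0.0):
        if len(sub_models) != len(meta_weights):
            raise ValueError("one meta weight is needed per sub-model")
        self.sub_models = list(sub_models)
        self.meta_weights = list(meta_weights)
        self.meta_bias = meta_bias

    def sub_predictions(self, x):
        """Quality prediction values produced by each sub-model for input x."""
        return [model(x) for model in self.sub_models]

    def predict(self, x):
        """Final quality prediction value: weighted sum of sub-model outputs."""
        preds = self.sub_predictions(x)
        return self.meta_bias + sum(w * p for w, p in zip(self.meta_weights, preds))


# Toy sub-models that simply read one feature each.
stack = StackingModel([lambda x: x[0], lambda x: x[1]], [0.25, 0.75])
final_prediction = stack.predict([4.0, 8.0])
```

Keeping the meta-model as an explicit parameter vector makes it cheap to adjust later without retraining the sub-models.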

The processor 110 may then store the obtained prediction model.

Next, a prediction method using a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention will be described with reference to FIGS. 11 to 18.

FIG. 11 is a flowchart illustrating a prediction method using a prediction model for regression prediction of time series data according to a preferred embodiment of the present invention. FIG. 12 is a diagram illustrating a quality prediction operation using the prediction model according to a preferred embodiment of the present invention, FIG. 13 is a diagram illustrating the detailed operation of the quality prediction operation shown in FIG. 12, and FIG. 14 is a diagram illustrating an example of the quality prediction operation. FIG. 15 is a diagram illustrating a prediction model performance evaluation and update operation, FIG. 16 is a diagram illustrating a prediction model selection operation, FIG. 17 is a diagram illustrating a new prediction model training operation, and FIG. 18 is a diagram illustrating an example of the prediction model selection operation and the new prediction model training operation, each according to a preferred embodiment of the present invention.

Referring to FIG. 11, the processor 110 of the time series data regression prediction apparatus 100 may obtain input data for the current time point based on time series data acquired during the production of a product (S210).

Next, the processor 110 may obtain a final quality prediction value of the product for the current time point based on the input data, using a prediction model that has been trained and built in advance (S220).

Here, as shown in FIG. 12, the prediction model may be a model obtained by combining a preset number of sub-models (sub-model 1, sub-model 2, ..., sub-model k), which take the input data as input and output quality prediction values for the product, with one meta-model, which takes the quality prediction values as input and outputs the final quality prediction value.

That is, as shown in FIG. 13, the processor 110 may obtain a final quality prediction candidate value for the current time point based on the input data, using the prediction model.

At this time, the processor 110 may obtain the final quality prediction value for the current time point based on the input data, using one prediction model selected from among a plurality of stored prediction models. The operation of selecting one prediction model from among the plurality of stored prediction models is described below.

Then, as shown in FIG. 13, the processor 110 may obtain the final quality prediction value for the current time point from the final quality prediction candidate value for the current time point, using the quality measurement values for past time points that fall within a preset window size (for example, "v"), that is, the quality measurement value for "current time point - 1", ..., and the quality measurement value for "current time point - v".

For example, as shown in FIG. 14, the final quality prediction value (final Y prediction value) for the current time point (t) may be obtained using one preselected prediction model (Best Model), based on the input data, which is the real-time data (X) for the current time point (t). At this time, the final quality prediction value (final Y prediction value) for the current time point (t) may be obtained by applying preset weights (weight w_t for time t, weight w_t-1 for time t-1, ..., weight w_t-v+1 for time t-v+1, and weight w_t-v for time t-v) to the final quality prediction candidate value (Y prediction value yhat_t) for the current time point (t), obtained using the preselected prediction model (Best Model), and to the quality measurement values for past time points (yhat_t-1 for time t-1, ..., yhat_t-v+1 for time t-v+1, and yhat_t-v for time t-v). Here, the weights may be set higher for time points closer to the current time point (t). That is, the trend of the actual quality measurement values in the past can be reflected when deriving the final quality prediction value (final Y prediction value) for the current time point (t). The process of obtaining the final quality prediction value for the current time point (t) in consideration of the past may use a method such as an exponentially weighted average, which exponentially reduces the influence of older data.
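One way to realize the weighting described above is a normalized exponentially weighted average over the current candidate value and the last v measurements. The decay factor `alpha`, the normalization by the sum of weights, and the function name are assumptions for this sketch; the source only states that more recent time points receive higher weights.

```python
def smooth_prediction(candidate, past_measurements, alpha=0.5):
    """Blend the current candidate value with recent quality measurements.

    past_measurements is ordered newest first: [value at t-1, ..., value at t-v].
    The weight of the value i steps in the past is alpha * (1 - alpha) ** i,
    so older data decays exponentially. Weights are normalized to sum to 1.
    """
    values = [candidate] + list(past_measurements)
    weights = [alpha * (1.0 - alpha) ** i for i in range(len(values))]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Candidate value 10.0 at time t, measurements 8.0 (t-1) and 6.0 (t-2).
final_value = smooth_prediction(10.0, [8.0, 6.0], alpha=0.5)
```

With an empty history the function returns the candidate value unchanged, so it degrades gracefully at start-up before any measurements are available.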

Next, the processor 110 may evaluate the performance of the prediction model based on the quality measurement value for the current time point and the final quality prediction value for the current time point, update the prediction model, and store the updated prediction model as a new prediction model (S230).

That is, when the quality measurement value for the current time point is obtained, the processor 110 may evaluate the performance of the prediction model based on the quality measurement value for the current time point and the final quality prediction value for the current time point, as shown in FIG. 15.

For example, the processor 110 may evaluate the performance of the prediction model using the residual, which is the absolute value of "quality measurement value for the current time point - final quality prediction value for the current time point".

Then, as shown in FIG. 15, the processor 110 may update the prediction model by adjusting, based on the quality measurement value for the current time point and the final quality prediction value for the current time point, the parameters that represent the relationship between the input and output of the meta-model of the prediction model.

For example, the meta-model of a prediction model may receive the quality prediction values derived by the sub-models of that prediction model and calculate the final quality prediction value. The meta-model has a parameter (w) that defines the relationship between its input (the quality prediction values) and its output (the final quality prediction value). The meta-model may calculate the final quality prediction value for the current time point (t) through a relational expression such as "final quality prediction value = w * quality prediction value". Thereafter, when the actual quality measurement value for the current time point (t) is obtained, an update may be performed that adjusts the parameter (w) of the meta-model using the quality measurement value and the final quality prediction value for the current time point (t) so that the error (quality measurement value - final quality prediction value) is reduced.
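The parameter adjustment described above can be sketched as a single online gradient step on a linear meta-model. The normalized least-mean-squares (NLMS) rule used here is an assumed choice; the source does not name a specific update rule, only that w is adjusted so that the error shrinks. The function name and the step size `mu` are likewise illustrative.

```python
def update_meta_weights(weights, sub_predictions, measured, mu=0.5, eps=1e-12):
    """Return meta-model weights after one normalized LMS update.

    The error is (quality measurement value - final quality prediction value).
    Dividing by the squared norm of the inputs keeps the step stable for any
    input scale when 0 < mu < 2.
    """
    predicted = sum(w * p for w, p in zip(weights, sub_predictions))
    error = measured - predicted
    norm = eps + sum(p * p for p in sub_predictions)
    return [w + mu * error * p / norm for w, p in zip(weights, sub_predictions)]


# Sub-models predicted 10.0 and 12.0; the measured quality value was 12.0.
old_weights = [0.5, 0.5]
preds_t = [10.0, 12.0]
new_weights = update_meta_weights(old_weights, preds_t, measured=12.0)

old_prediction = sum(w * p for w, p in zip(old_weights, preds_t))
new_prediction = sum(w * p for w, p in zip(new_weights, preds_t))
```

Because only the meta-model weights change, the update touches k numbers and does not retrain any sub-model, which is consistent with the goal of not disturbing real-time prediction.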

Then, as shown in FIG. 15, the processor 110 may store the updated prediction model as a new prediction model.

Thereafter, the processor 110 may select one prediction model from among the stored prediction models (S240).

That is, as shown in FIG. 16, the processor 110 may select a plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) from among the stored prediction models based on a search criterion.

Here, the search criterion refers to a predetermined condition for extracting the prediction models to be selected from among the stored prediction models. For example, according to the search criterion, the 10 most recently generated sub-models may be selected from among the stored prediction models, and the 100 most recently generated prediction models may be selected from among the meta-models generated using those sub-models. Alternatively, the search criterion may be set to the n most recently retrained sub-models together with, among the prediction models that use those sub-models, either the m best-performing prediction models or the m most recently created prediction models, in which case n*m prediction models may be selected.

Then, as shown in FIG. 16, the processor 110 may obtain, for each of the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n), a final quality prediction value for the evaluation data (final quality prediction value 1, final quality prediction value 2, ..., final quality prediction value n), where the evaluation data consists of the time series data corresponding to a preset past period relative to the current time point.

Then, as shown in FIG. 16, the processor 110 may evaluate the performance (performance 1, performance 2, ..., performance n) of each of the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) based on the final quality prediction value of each prediction model (final quality prediction value 1, final quality prediction value 2, ..., final quality prediction value n) and the product quality measurement values corresponding to the evaluation data.

At this time, the processor 110 may evaluate the performance of each of the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) using a regression prediction model performance evaluation method.

Here, the regression prediction model performance evaluation method may be a method that evaluates a regression prediction model using at least one of ME (mean of errors), RMSE (root mean of squared errors), MAE (mean of absolute errors), MPE (mean of percentage errors), MAPE (mean of absolute percentage errors), and MASE (mean of absolute scaled errors).

Then, as shown in FIG. 16, the processor 110 may select the single best-performing prediction model from among the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) based on their performance evaluation results. That is, the prediction model that is best fitted to the current data trend and distribution and shows the best performance may be selected. The processor 110 may then use the selected prediction model to obtain the final quality prediction value for the current time point based on the input data.

For example, as shown in FIG. 18, N prediction models may be selected from among the stored prediction models (the list of managed models) based on a specific condition serving as the search criterion. The performance of the N prediction models may be evaluated using evaluation criterion 1 (the regression prediction model performance evaluation method) based on the evaluation data, which consists of the n most recent time series data. Based on the performance evaluation results of the N prediction models, the best-performing "model a" may be selected as the prediction model (Best Model).
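The Best Model selection can be sketched as scoring every candidate on the recent evaluation data and keeping the one with the lowest error. RMSE is used here as one of the metrics the text allows; the candidate models are hypothetical callables and the names "model_a" and "model_b" are invented for the example.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean of squared errors between measurements and predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_best_model(candidates, eval_inputs, eval_targets):
    """Return (best_name, scores) for candidates evaluated on recent data.

    candidates maps a model name to a callable that returns the final
    quality prediction value for one evaluation input.
    """
    scores = {
        name: rmse(eval_targets, [model(x) for x in eval_inputs])
        for name, model in candidates.items()
    }
    return min(scores, key=scores.get), scores


candidates = {
    "model_a": lambda x: 2.0 * x,
    "model_b": lambda x: 2.0 * x + 1.0,
}
best_name, eval_scores = select_best_model(candidates, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```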

In addition, as shown in FIG. 17, the processor 110 may evaluate the explanatory power (explanatory power 1, explanatory power 2, ..., explanatory power n) of each of the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) using a regression prediction model explanation evaluation method.

Here, the regression prediction model explanation evaluation method may be a method for determining how well a model fits the data, such as the coefficient of determination (R-squared).

Then, as shown in FIG. 17, the processor 110 may determine whether to perform training based on the explanatory power (explanatory power 1, explanatory power 2, ..., explanatory power n) of each of the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) and a preset explanation evaluation criterion.

That is, as shown in FIG. 17, if the explanatory power (explanatory power 1, explanatory power 2, ..., explanatory power n) of all of the plurality of prediction models (prediction model 1, prediction model 2, ..., prediction model n) is lower than the preset explanation evaluation criterion, the processor 110 may train a new prediction model by training a preset number of sub-models based on the evaluation data.

Then, as shown in FIG. 17, the processor 110 may store the newly learned prediction model.

For example, as shown in FIG. 18, N prediction models may be selected from among the stored prediction models (the list of managed models) based on a specific condition that serves as the search criterion. The explanatory ability of the N prediction models may then be evaluated using evaluation criterion 2 (the regression prediction model explanation evaluation method) on the evaluation data, which consists of the most recent n time series data. If the explanatory abilities of all N prediction models are lower than the preset explanation evaluation criterion, a new prediction model (new stacking model) may be learned by learning k sub-models and one meta-model based on the evaluation data.
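The retraining decision described above can be sketched as follows. This is a minimal illustration that assumes R-squared as evaluation criterion 2 and a hypothetical threshold of 0.8. The trainer `fit_line` is a deliberately simple stand-in (a one-variable least-squares fit) for the learning of k sub-models and one meta-model, which is not reproduced here. The names `update_model_store`, `fit_line`, and `store` are likewise hypothetical.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]  # hypothetical: one input row -> one prediction


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def update_model_store(
    store: List[Model],
    X_eval: List[Row],
    y_eval: Sequence[float],
    threshold: float,
    train_fn: Callable[[List[Row], Sequence[float]], Model],
) -> bool:
    """Learn and store a new model only if every stored model explains
    the evaluation data worse than the threshold. Returns True when a
    new model was added. An empty store also triggers learning."""
    scores = [r_squared(y_eval, [m(row) for row in X_eval]) for m in store]
    if all(score < threshold for score in scores):
        store.append(train_fn(X_eval, y_eval))
        return True
    return False


def fit_line(X: List[Row], y: Sequence[float]) -> Model:
    """Stand-in trainer (NOT a stacking model): least-squares line on
    the first input variable."""
    xs = [row[0] for row in X]
    mean_x, mean_y = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda row: slope * row[0] + intercept


store: List[Model] = [lambda row: 0.0]  # one poorly fitting stored model
X_eval = [[1.0], [2.0], [3.0], [4.0]]
y_eval = [3.0, 5.0, 7.0, 9.0]

retrained = update_model_store(store, X_eval, y_eval, 0.8, fit_line)
print(retrained, len(store))
```

Calling the function a second time on the same data adds nothing, because the newly stored model now meets the threshold.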

Operations according to the present embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable storage medium. A computer-readable storage medium refers to any medium that participates in providing instructions to a processor for execution. The computer-readable storage medium may include program instructions, data files, data structures, or a combination thereof; examples include magnetic media, optical recording media, and memory. The computer program may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can be readily inferred by programmers in the technical field to which the embodiments belong.

The present embodiments are intended to explain the technical idea of the embodiments, and the scope of that technical idea is not limited by them. The scope of protection of the present embodiments should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiments.

100: time series data regression prediction apparatus,
110: processor,
130: computer-readable storage medium,
131: program,
150: communication bus,
170: input/output interface,
190: communication interface

Claims

A method for learning a prediction model for regression prediction of time series data, the method comprising:
obtaining first learning data based on time series data obtained in a production process of a product;
learning, based on the first learning data and a quality measurement value of the product corresponding to the first learning data, a preset number of sub-models that take the first learning data as an input and output a quality prediction value of the product;
obtaining second learning data based on the preset number of quality prediction values output by the preset number of sub-models;
learning, based on the second learning data and the quality measurement value, one meta-model that takes the quality prediction values as an input and outputs a final quality prediction value of the product; and
obtaining one prediction model for regression prediction of the time series data based on the preset number of sub-models and the one meta-model.

The method of claim 1, wherein the sub-model learning step comprises learning a plurality of sub-models using a K-fold cross-validation method based on the first learning data and the quality measurement value, and obtaining the preset number of sub-models based on performance evaluation results of the plurality of sub-models.

The method of claim 2, wherein the sub-model learning step comprises evaluating the performance of each of the plurality of sub-models using a regression prediction model performance evaluation method, based on the quality prediction value output by the sub-model and the quality measurement value, and obtaining the preset number of sub-models based on the performance evaluation results of the plurality of sub-models.

The method of claim 3, wherein the regression prediction model performance evaluation method evaluates a regression prediction model using at least one of mean of errors (ME), root mean of squared errors (RMSE), mean of absolute errors (MAE), mean of percentage errors (MPE), mean of absolute percentage errors (MAPE), and mean of absolute scaled errors (MASE).

The method of claim 1, wherein the meta-model learning step comprises learning the one meta-model using a K-fold cross-validation method based on the second learning data and the quality measurement value.

The method of claim 1, wherein the prediction model obtaining step comprises obtaining the one prediction model by combining the preset number of sub-models and the one meta-model.

The method of claim 1, wherein the first learning data obtaining step comprises obtaining, based on the time series data consisting of a plurality of time series samples for each of a plurality of time series variables, a statistical value of the time series variable for each of the plurality of time series variables, and obtaining the first learning data based on the statistical values for the plurality of time series variables.

The method of claim 1, wherein the sub-model is a machine learning algorithm or a statistical algorithm that performs regression prediction, and the meta-model is a machine learning algorithm or a statistical algorithm that performs regression prediction and can be updated through a parameter representing the relationship between its input and output.

A prediction method using a prediction model for regression prediction of time series data, the method comprising:
obtaining input data for a current time point based on time series data obtained in a production process of a product; and
obtaining a final quality prediction value of the product for the current time point based on the input data, using a prediction model that has been learned and built in advance,
wherein the prediction model is a model obtained by combining a preset number of sub-models that take the input data as an input and output a quality prediction value of the product, and one meta-model that takes the quality prediction values as an input and outputs the final quality prediction value.

The method of claim 9, wherein the final quality prediction value obtaining step comprises:
obtaining a final quality prediction candidate value for the current time point based on the input data using the prediction model; and
obtaining the final quality prediction value for the current time point from the final quality prediction candidate value for the current time point, using quality measurement values for past time points according to a preset window size.

The method of claim 9, further comprising evaluating the performance of the prediction model based on a quality measurement value for the current time point and the final quality prediction value for the current time point, updating the prediction model by adjusting a parameter representing the relationship between the input and output of the meta-model of the prediction model based on the quality measurement value for the current time point and the final quality prediction value for the current time point, and storing the updated prediction model as a new prediction model.

The method of claim 9, further comprising a prediction model selection step of:
selecting a plurality of prediction models from among stored prediction models based on a search criterion;
obtaining, based on evaluation data consisting of the time series data corresponding to a preset past period relative to the current time point, a final quality prediction value for the evaluation data for each of the plurality of prediction models;
evaluating the performance of each of the plurality of prediction models based on the final quality prediction value for each of the plurality of prediction models and a quality measurement value of the product corresponding to the evaluation data; and
selecting one of the plurality of prediction models based on the performance evaluation results of the plurality of prediction models,
wherein the final quality prediction value obtaining step comprises obtaining the final quality prediction value for the current time point based on the input data using the one prediction model selected from among the plurality of prediction models.

The method of claim 12, wherein the prediction model selection step comprises:
evaluating, using a regression prediction model performance evaluation method, the performance of each of the plurality of prediction models based on the final quality prediction value for each of the plurality of prediction models and the quality measurement value of the product corresponding to the evaluation data, and selecting one of the plurality of prediction models based on the performance evaluation results of the plurality of prediction models; and
evaluating the explanatory ability of each of the plurality of prediction models using a regression prediction model explanation evaluation method and, if the explanatory abilities of all of the plurality of prediction models are lower than a preset explanation evaluation criterion, learning a new prediction model by learning a preset number of sub-models based on the evaluation data, and storing the learned new prediction model.