KR102238424B1

KR102238424B1 - System Modeling Method by Machine Learning using Big data

Info

Publication number: KR102238424B1
Application number: KR1020190089769A
Authority: KR
Inventors: 양영진; 유호동; 김탁곤
Original assignee: (주)아인스에스엔씨
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2021-04-09
Also published as: KR20190120729A

Abstract

The present invention relates to a system modeling method with embedded big data machine learning. Its purpose is to establish a hypothetical model, compute validated parameter values by applying machine learning to big data acquired from the actual system, and apply those values to the hypothetical model.
The invention is a system modeling method that completes a simulation model of a target system by applying machine learning, on big data acquired through operation and observation of the target system, to a hypothetical model defined through knowledge acquisition about the target system. The method comprises: defining a hypothetical model of the target system by identifying the obtainable information related to the target system; learning the information required by the hypothetical model by applying a machine learning algorithm to big data actually acquired from the target system, for the functional blocks provided by the hypothetical model; and completing the simulation model of the target system by applying the information learned and validated through the machine learning process to the hypothetical model.

Description

System Modeling Method by Machine Learning using Big data

The present invention relates to technology for modeling and simulating a physical complex system (the target system). More specifically, it relates to a system modeling method with embedded big data machine learning, in which big data acquired from the actual system is used for machine learning fitted to the model, validated parameter values are computed, and those values are applied to a hypothetical model.


In general, to analyze or predict the behavior or performance of a real-world system, an abstracted model of the system is built and executed, and measures of interest such as behavior and performance are measured or observed. For the analysis and prediction results obtained from such a model to be reliable, how well (accurately) the system is modeled is the key issue.

One modeling approach builds an abstract model from knowledge of the physical laws or operating rules of the target system. This is the modeling and simulation (M&S) based approach, which can express the causal relationship between a controlled input and the corresponding output as a model. Its limitation is that detailed information about the system to be modeled must be available. In addition, when a model is built with the M&S-based approach, a model validation process that checks how well the model reflects the actual system is essential to guarantee the validity of the model. When data is hard to obtain from the real-world system, the validity of the model cannot be confirmed, so the reliability of analysis and prediction results based on that model cannot be guaranteed.

Another modeling approach for system prediction and analysis is data modeling, which derives the rules, patterns, or functions implicit in a system by analyzing large amounts of data acquired through operation and observation of the target system. A machine learning based model, the representative form of data modeling, expresses the correlation between one set of data and another; with the arrival of the big data era, the large volumes of data available have made machine learning more effective. However, a data model built through machine learning can predict only on the premise that the system will continue to operate unchanged. If the configuration or operating rules of the system change, the previously learned model can no longer predict the future.

The term 'big data' has become widespread as a means of predicting a diversified society. Big data refers to data sets whose size exceeds the ability of common software tools to collect, manage, and process them within an acceptable elapsed time. Because such large volumes of data offer more insight than traditional limited data, they have attracted considerable attention in research across science, engineering, defense, business, medicine, politics, and other fields. For this reason, modeling with big data has become an essential and important issue in the big data era.

Modeling with big data can be defined as data modeling that focuses on expressing correlations in data. This approach is studied under two categories: data mining and machine learning. FIG. 1 shows how data modeling is performed in each of the two ways.

Data mining, shown on the left of FIG. 1, is one useful method of data modeling. Using data mining techniques, a user who wants to predict system outcomes can analyze the patterns or attributes of the data in one dimension and can also identify the distribution function of the data from those patterns.

The distribution function is then validated against real-world data using a method such as a goodness-of-fit (GOF) test, which yields a "random number generation" model. The resulting model can be used to predict future data patterns.
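The data mining route described above (identify a distribution, check it with a goodness-of-fit test, then use it as a random number generator) can be sketched as follows. This is only an illustration under stated assumptions: the exponential distribution, the synthetic observations, the Kolmogorov-Smirnov statistic as the GOF measure, and every name in the code are hypothetical choices, not details taken from the patent.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (k-1)/n to k/n at x.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x)


# Hypothetical "observed" data: evenly spaced quantiles of an exponential
# distribution with rate 2.0, standing in for measurements of the real system.
n = 200
true_rate = 2.0
observed = [-math.log(1.0 - (k - 0.5) / n) / true_rate for k in range(1, n + 1)]

# Step 1: identify the distribution (maximum-likelihood rate = 1 / sample mean).
fitted_rate = len(observed) / sum(observed)

# Step 2: goodness-of-fit check. 1.358 / sqrt(n) is the asymptotic 5% critical
# value of the KS test; it is conservative when the rate is estimated from the
# same data, which is acceptable for a sketch.
critical = 1.358 / math.sqrt(n)
d_fitted = ks_statistic(observed, exponential_cdf(fitted_rate))
d_wrong = ks_statistic(observed, exponential_cdf(0.4))  # deliberately poor candidate
accepted = d_fitted < critical

# Step 3: the validated distribution serves as a random number generation model.
generator = random.Random(42)
synthetic = [generator.expovariate(fitted_rate) for _ in range(5)]
```

The fitted candidate passes the check while the deliberately wrong rate is rejected, which is the role the GOF step plays before the distribution is used for prediction.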

Machine learning, shown on the right of FIG. 1, is another means of data modeling. Using machine learning algorithms such as artificial neural networks (ANN) and genetic algorithms (GA), a user can map the association between one set of data (d1) and another set of data (dn).

As in the data mining process, a data model is obtained once the mapping passes validation against real data using a general performance index such as the root-mean-square error (RMSE). Future values of the data set "dn" can then be predicted from a given data set "d1".
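As a minimal sketch of the RMSE check mentioned above, the following computes the error between predicted and observed values of "dn". The sample values and the function name are hypothetical and serve only to show the calculation.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out observations of dn and the data model's predictions.
observed_dn = [2.0, 4.0, 6.0, 8.0]
predicted_dn = [2.0, 4.0, 6.0, 10.0]

error = rmse(predicted_dn, observed_dn)  # sqrt((0 + 0 + 0 + 4) / 4) = 1.0
```

In practice the error would be compared against an acceptance threshold chosen for the application; the patent does not specify one.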

Data modeling of this kind proceeds through a sequence of steps: acquisition, modeling, validation, and prediction. It has been widely used in science, engineering, economics, industry, and other fields to predict the future behavior of a target system, and some researchers have argued that, given enough information, correlation is sufficient to make strong and useful predictions.

Contrary to these expectations, however, data modeling is not always a powerful modeling approach. It has several limitations. A representative one is that it describes the correlation between data rather than the causal relationship between a control input and the corresponding output.

A data model cannot cope with sudden or changing situations in the system. That is, if the components, structure, or behavior of the system change after the model has been trained, accurate prediction with the data model is no longer possible.

Another limitation is that it cannot cope with unexpected events. In a real system, unexpected events can occur because of the system's complexity and uncertainty. Such events are generally not included in the data set that can be acquired, so when they occur, a data model based on the original data set cannot predict them accurately.

Similarly, the prediction results are affected by the amount of data acquired from the target system.

To overcome these limitations of data modeling, simulation modeling based on systems science is needed. Simulation modeling can be defined as the theory-based modeling approach commonly used in the simulation field; the physical or operational laws present in the target system are used to build the model.

Unlike data modeling, this makes it possible to express clearly the causal relationship between a control input and the corresponding set of outputs. Nevertheless, simulation modeling alone is not a complete solution for modeling complex systems in the big data era.

For example, when it is difficult to acquire sufficient knowledge about the system, a modeling and simulation (M&S) model that satisfies the objective cannot be built completely. This is because the simulation modeling approach is based on prior knowledge of the target system, and its completeness depends on how much of the system can be understood. Accurate simulation modeling requires extensive physical and operational knowledge of the target system.

In addition, once a simulation model has been built, its validity must be confirmed through model validation. If validation data from the actual system is unavailable or hard to obtain, confirming the validity of the model becomes difficult.

Both modeling approaches therefore have clear limitations.

FIG. 2 illustrates these limitations with a simple example. Suppose the system shown at the top of FIG. 2, with inputs x1, x2, x3, x4 and output y, is to be modeled. The simulation modeling approach is shown in the right-hand diagram. The system model can be built from mathematical knowledge of the target system, but it becomes valid only when the output values produced by the model are validated against data acquired from the actual system.

Conversely, as in the left-hand diagram, machine learning on data for x1, x2, x3, x4, and y obtained by operating the system can yield a highly accurate model that outputs y when values of x1, x2, x3, x4 are input. However, if the operating rule of the system at the top of FIG. 2 is changed (for example, if the operating rule "X" (multiplication) at the upper right of FIG. 2 is changed to division), the machine learning model derived from the previously acquired big data cannot properly predict the output y of the changed system.
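The limitation described above can be illustrated with a small sketch. The block structure (two sums feeding a combining rule), the input grid, and the 1-nearest-neighbour regressor standing in for the machine learning model are all assumptions made for illustration; FIG. 2 itself is not reproduced here.

```python
import itertools


def system(x1, x2, x3, x4, combine):
    """Hypothetical target system: two sub-blocks feeding one combining rule."""
    g1 = x1 + x2
    g2 = x3 + x4
    return combine(g1, g2)


def multiply(u, v):
    return u * v


def divide(u, v):
    return u / v


# "Big data": every input combination observed on the original (multiplying) system.
levels = [1.0, 2.0, 3.0]
inputs = list(itertools.product(levels, repeat=4))
training = {x: system(*x, multiply) for x in inputs}


def data_model(x):
    """1-nearest-neighbour regressor over the training set (stand-in for ML)."""
    nearest = min(training, key=lambda t: sum((a - b) ** 2 for a, b in zip(t, x)))
    return training[nearest]


def max_abs_error(combine):
    """Worst-case gap between the data model and the system using `combine`."""
    return max(abs(data_model(x) - system(*x, combine)) for x in inputs)


error_before = max_abs_error(multiply)  # system unchanged since training
error_after = max_abs_error(divide)     # operating rule changed to division
```

The data model reproduces the original system on the observed inputs, but once the combining rule becomes division it keeps returning products, so its error grows large even though the inputs are unchanged.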

US Patent Application Publication No. 2017/0286572, "Digital twin of twinned physical system" (October 5, 2017)

To overcome the problems of the prior art described above, a cooperative method is needed that uses the strengths of the two modeling approaches in a complementary way, overcoming the limitations of each and enabling more robust analysis and prediction. Accordingly, to remedy the problems of the prior art, an object of the present invention is to provide a system modeling method with embedded big data machine learning, in which a hypothetical model is established, validated parameter values are computed by applying machine learning to big data acquired from the actual system, and those values are applied to the hypothetical model (gray box).

To achieve this object, the system modeling process with embedded big data machine learning is a system modeling method that completes a simulation model of a target system by applying machine learning, on big data acquired through operation and observation of the target system, to a hypothetical model defined through knowledge acquisition about the target system. The method comprises: a first step of defining a hypothetical model of the target system by identifying the obtainable information related to the target system; a second step of learning the information required by the hypothetical model by applying a machine learning algorithm to big data actually acquired from the target system, corresponding to the functional blocks provided by the hypothetical model; and a third step of completing the simulation model of the target system by applying the information learned and validated through the machine learning process to the hypothetical model.

Here, in the second step, machine learning is performed separately for each of the multiple functional blocks inside the hypothetical model to compute validated parameter values, which are provided as the information required by the hypothetical model.

The parameter values include variable values, probabilities, functions, or graphs, and are placed inside the functional blocks.

In addition, a cellular automata model is used as one form of the hypothetical model; the state transition function of a cell required by the hypothetical model is learned by machine learning on the big data and is provided to the cellular automata model.

Because the system modeling method with embedded big data machine learning according to the present invention embeds what was learned by machine learning from real big data acquired through operation and observation of the target system, the model itself becomes a validated model, and the limitations encountered when analyzing or predicting the system of interest with either approach alone can be overcome.

That is, embedding machine learning in the system model reduces the amount of prior knowledge required about the target system and achieves the effect of model validation, while also enabling analysis and prediction of system behavior under changes to the system's structure or rules, which machine learning alone could not provide.

The present invention is also readily applicable to any system based on a terrain grid, such as forest fires, traffic, and disease.

Furthermore, learning the state transition function of the cellular automata model through big data and machine learning provides the following advantages. The first is the space-variant property: the state transition function need not be expressed uniformly for all cells and can be expressed differently for individual cells. For example, terrain features that vary with location can be reflected cell by cell.

The second is the time-variant property: the state transition function can reflect features that vary with time, such as morning and afternoon.

The third is the stochastic property: in a traffic simulation, for example, driver behavior is probabilistic rather than deterministic, so it is important to reflect stochastic features through big data and machine learning.

The present invention can also reflect the nonlinear characteristics of a system. Many real-world systems are nonlinear in that the superposition principle does not hold between input and output, so the ability of the state transition function to account for nonlinear characteristics is a major advantage. Finally, additional information such as weather conditions can be expressed through the big data obtained from the actual system.

FIG. 1 is a diagram illustrating the prior-art process of performing data modeling in two ways, data mining and machine learning.
FIG. 2 is a diagram illustrating the limitations of the two modeling approaches of FIG. 1.
FIG. 3 is a configuration diagram of modeling that applies big data machine learning to a hypothetical model to implement the present invention.
FIG. 4 is an overall flowchart of the system modeling process with embedded big data machine learning according to an embodiment of the present invention.
FIG. 5 is a configuration diagram illustrating a specific embodiment of FIG. 3.
FIG. 6 is a diagram illustrating the definition of the cellular automata applied to the hypothetical model in FIG. 5 and the way its state is updated.
FIG. 7 is a diagram illustrating the state transition function of each cell in the cellular automata of FIG. 6.

The features and effects of the present invention described above will become clearer from the following detailed description taken with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea.

The present invention may be modified in various ways and may take various forms; specific embodiments are illustrated and described in detail herein.

However, this is not intended to limit the present invention to the specific forms disclosed, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention.

The present invention relates to a system modeling method with embedded big data machine learning, in which big data acquired from the actual system is used for machine learning fitted to the model, so that validated parameter values are computed and applied to the hypothetical model (gray box).

The configuration and specific operation of a preferred embodiment of the present invention are described in detail below with reference to the accompanying FIGS. 3 to 6.

First, the present invention is a cooperative modeling method that applies machine learning to modeling and simulation based modeling. Machine learning on big data actually acquired from the target system is used to estimate the behavior and function definitions, parameters, and other items needed to make the hypothetical model concrete.

FIGS. 3 and 4 are, respectively, a system configuration diagram and a flowchart of the modeling process with embedded big data machine learning according to an embodiment of the present invention. Referring to these figures, the user's modeling requirements are first analyzed according to the purpose of modeling and simulating the target system 100 (S110).

Modeling does not represent the entire target system 100 perfectly; it is a process of abstracting the system to suit a predetermined purpose, so modeling the system in a way that fits the purpose is important.

For system modeling, the obtainable domain knowledge, experience, and theory related to the target system are first identified, and a hypothetical model (gray box model) 110 of the system is defined (S130).

Because the hypothetical model 110 cannot be simulated by itself, a complete model must be constructed by securing the information needed to complete it, such as behavior functions and parameters.

To obtain the information needed to complete the model, big data 120 is acquired by actually operating and observing the target system 100, and the information required by the hypothetical model 110 is obtained through machine learning on the acquired big data 120 (S150).

That is, the information required by the hypothetical model 110 can be learned from the big data 120 acquired through operation and observation of the actual system, using a machine learning algorithm such as an artificial neural network.

The information learned and validated on real data is applied to the hypothetical model to complete the system model (white box model) 130 of the target system (S170).

Finally, behavior and performance analysis and prediction for the system of interest are performed by simulating the completed system model 130 (S190).

FIG. 5 illustrates FIG. 3 with a specific embodiment. First, the hypothetical model 110 of the target system 100 is established through knowledge acquisition about the target system 100. The structural characteristics of the target system 100 can be captured in the hypothetical model 110, but the coefficients "m, n, a, b, c, d, e, k", that is, their actual values, are unknown.

Therefore, machine learning on the big data is performed separately for each of the multiple functional blocks inside the hypothetical model 110 to compute validated parameter values, and the computed parameter values are provided as the information required by the hypothetical model 110.

More specifically, the coefficients can be learned from the big data 120 (x1, x2, x3, x4, g1, g2, y) acquired through actual operation and observation of the target system 100, with machine learning performed for each functional block using a learning method such as an artificial neural network. That is, g() is learned using "g1, g2, y", g1() using "x1, x2, g1", and g2() using "x3, x4, g2".
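The block-by-block learning described above can be sketched as follows. Several things here are assumptions rather than details from the patent: each block is taken to be a two-coefficient linear map, ordinary least squares is swapped in for the artificial neural network so the sketch needs no external libraries, the data is synthetic and noise-free, and all coefficient names and values are hypothetical (they do not correspond to the patent's m, n, a, b, c, d, e, k).

```python
import random


def fit_two_coeffs(u, v, target):
    """Least-squares fit of target ~ p*u + q*v via the 2x2 normal equations."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    sut = sum(a * t for a, t in zip(u, target))
    svt = sum(b * t for b, t in zip(v, target))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("inputs are collinear; coefficients are not identifiable")
    p = (sut * svv - svt * suv) / det
    q = (svt * suu - sut * suv) / det
    return p, q


# Hypothetical ground truth, used only to synthesize the "big data".
TRUE_G1 = (2.0, -1.0)   # g1 = 2.0*x1 - 1.0*x2
TRUE_G2 = (0.5, 3.0)    # g2 = 0.5*x3 + 3.0*x4
TRUE_G = (1.5, -2.0)    # y  = 1.5*g1 - 2.0*g2

rng = random.Random(0)
rows = []
for _ in range(200):
    x1, x2, x3, x4 = (rng.uniform(0.0, 1.0) for _ in range(4))
    g1 = TRUE_G1[0] * x1 + TRUE_G1[1] * x2
    g2 = TRUE_G2[0] * x3 + TRUE_G2[1] * x4
    y = TRUE_G[0] * g1 + TRUE_G[1] * g2
    rows.append((x1, x2, x3, x4, g1, g2, y))

x1s, x2s, x3s, x4s, g1s, g2s, ys = (list(col) for col in zip(*rows))

# Each functional block is learned only from its own inputs and output.
g1_coeffs = fit_two_coeffs(x1s, x2s, g1s)   # uses (x1, x2, g1)
g2_coeffs = fit_two_coeffs(x3s, x4s, g2s)   # uses (x3, x4, g2)
g_coeffs = fit_two_coeffs(g1s, g2s, ys)     # uses (g1, g2, y)


def system_model(x1, x2, x3, x4):
    """Completed model: the assumed block structure plus the learned coefficients."""
    g1 = g1_coeffs[0] * x1 + g1_coeffs[1] * x2
    g2 = g2_coeffs[0] * x3 + g2_coeffs[1] * x4
    return g_coeffs[0] * g1 + g_coeffs[1] * g2
```

Because every block is fitted separately, a change to one block (for example the combining rule) would only require replacing or relearning that block, while the coefficients of the other blocks remain usable.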

Here, the parameter values include not only functions but also variable values, probabilities, or graphs, and are placed inside each functional block.

The validated values of the parameters (m, n, a, b, c, d, e, k) are computed through this learning. Applying them to the hypothetical model 110 completes the system model 130, and behavior and performance analysis and prediction for the target system 100 are performed by simulating the completed system model 130.

FIG. 6 illustrates the definition of the cellular automata applied to the hypothetical model in FIG. 5 and the way its state is updated. In one embodiment, the hypothetical model 110 is built as a cellular automata model 111; the state transition function of a cell required by the hypothetical model is learned by machine learning on the big data 120 and is provided to the cellular automata model 111.

In more detail, the state transition function of a cell required by the hypothetical model 110 is learned from the big data 120 collected from the target system 100 using machine learning such as artificial neural network modeling, and is provided to the cellular automata model 111, which serves as the hypothetical model.
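One way to picture learning a cell's state transition function from observed data is the sketch below. It is not the patent's method: a frequency table is swapped in for the artificial neural network, the feature used (the cell's own state plus the number of active neighbors) is an assumption, and the grid snapshots and state codes are hypothetical.

```python
from collections import Counter, defaultdict

ACTIVE = 1  # hypothetical code for an "active" cell (e.g. burning)


def neighbours(grid, i, j):
    """States of the up-to-eight cells adjacent to (i, j) in a bounded grid."""
    rows, cols = len(grid), len(grid[0])
    return [
        grid[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0) and 0 <= i + di < rows and 0 <= j + dj < cols
    ]


def learn_transition(snapshot_pairs):
    """Estimate P(next state | state, active-neighbour count) by counting."""
    counts = defaultdict(Counter)
    for before, after in snapshot_pairs:
        for i, row in enumerate(before):
            for j, state in enumerate(row):
                active = sum(1 for s in neighbours(before, i, j) if s == ACTIVE)
                counts[(state, active)][after[i][j]] += 1
    return {
        key: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical observation: two consecutive snapshots of a 3x3 region.
before = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
after = [[1, 1, 0],
         [0, 2, 1],
         [0, 0, 1]]

table = learn_transition([(before, after)])
```

The resulting table is a stochastic transition rule that could be handed to a cellular automata model. Extending the key with the cell position or the time of day would be one way to express space-variant or time-variant behavior, though that extension is likewise only an assumption here.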

That is, combining the learned state transition functions with the cellular automata model 111 completes the system model 130 (white box). In this cooperative approach, machine learning on big data resolves the validation problem of the simulation model, and the hypothetical model obtained through simulation modeling resolves the problem of changes to system structure and rules, so improved modeling becomes possible.

Here, the cellular automata model 111 is a discrete model used in modeling in fields such as mathematics, physics, complex systems, biology, and microstructure, and is defined on cells arranged in a regular lattice.

Each cell can take one of a finite number of states, and the lattice is defined in a finite number of dimensions. For each cell, a set of cells called its neighbors is defined relative to that cell; for example, the neighbors may be defined as the cells one step away from the cell in every direction.

The state of each cell is specified at time t=0, and this is called the initial state. A new generation is created from the previous generation by the state transition function, a mathematical function that assigns each cell its new state according to the states of the cell and its neighbors; that is, it sets the behavior rule of the cells.

In general, the behavior rule is the same for every cell, does not change over time, and is applied simultaneously to all cells in each generation.
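
A conventional cellular automaton of this kind can be sketched as follows. The sketch assumes a two-dimensional lattice, a neighborhood of the cells one step away in every direction, and cells outside the lattice being ignored; the rule `spread_rule` and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Grid = List[List[int]]


def moore_neighbors(grid: Grid, i: int, j: int) -> List[int]:
    """States of the cells one step away from (i, j) in every direction.

    Cells outside the lattice are skipped (an assumed boundary rule).
    """
    rows, cols = len(grid), len(grid[0])
    return [
        grid[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0) and 0 <= i + di < rows and 0 <= j + dj < cols
    ]


def step(grid: Grid, transition: Callable[[int, List[int]], int]) -> Grid:
    """Build the next generation.

    Every cell reads only the old grid, so the same rule is applied
    to all cells simultaneously.
    """
    return [
        [transition(grid[i][j], moore_neighbors(grid, i, j))
         for j in range(len(grid[0]))]
        for i in range(len(grid))
    ]


def spread_rule(state: int, neighbors: List[int]) -> int:
    """Illustrative rule: a cell is 1 if it, or any neighbor, was 1."""
    return 1 if state == 1 or any(n == 1 for n in neighbors) else 0


# Initial state at t = 0, then one generation later.
initial = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
next_gen = step(initial, spread_rule)
```

Because `step` builds a new grid rather than updating in place, no cell ever sees a neighbor's already-updated state within the same generation.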

The cellular automata and the state transition function are described in detail with reference to FIG. 6. Cell(i, j) is the cell at position (i, j) and has state s(i, j); N denotes the neighborhood pattern of cell (i, j); and T denotes the state transition function, that is, the behavior rule of the agent. The state of each cell can be defined according to the type of problem, such as traffic, water pollution, or fire spread.
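
In symbols, the conventional update built from s(i, j), N, and T can be written as below. The generation index t and the representation of N as a set of position offsets (p, q) are notational assumptions introduced here, not taken from FIG. 6.

```latex
s_{t+1}(i, j) = T\bigl( s_t(i, j),\ \{\, s_t(i + p,\ j + q) : (p, q) \in N \,\} \bigr)
```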

FIG. 7 is a diagram explaining the state transition function of each cell in the cellular automata, and shows how this state transition function differs from that of a general model.

In general, an ideal state transition function performs the state transition based on the states of the neighboring cells. In the present invention, the state transition takes as input not only the states of the neighboring cells but also terrain information and external factors (meteorological conditions such as weather and temperature).

That is, by reflecting the characteristics of the terrain, and furthermore real-time weather information, in the ideal model, a more accurate prediction for the next cell is obtained.
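
The difference in inputs between the two kinds of transition function can be sketched as follows. The `Terrain` and `Weather` fields, the threshold, and the weights in the score are invented placeholders for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Terrain:
    slope: float          # hypothetical terrain attribute of the cell


@dataclass(frozen=True)
class Weather:
    wind_speed: float     # hypothetical external factor


def conventional_transition(state: int, neighbors: Sequence[int]) -> int:
    """Idealized rule: depends only on the cell and its neighbors."""
    return 1 if state == 1 or sum(neighbors) >= 2 else 0


def extended_transition(state: int, neighbors: Sequence[int],
                        terrain: Terrain, weather: Weather) -> int:
    """Rule that also takes terrain information and external factors as input.

    The weights below are placeholders, not values from the source.
    """
    if state == 1:
        return 1
    score = sum(neighbors) + 2.0 * terrain.slope + 0.1 * weather.wind_speed
    return 1 if score >= 2.0 else 0


neighbors = [1, 0, 0, 0, 0, 0, 0, 0]   # a single active neighbor

flat_calm = extended_transition(0, neighbors, Terrain(0.0), Weather(0.0))
steep_windy = extended_transition(0, neighbors, Terrain(0.5), Weather(5.0))
```

With zero slope and zero wind the extended rule reduces to the conventional one, so the same neighborhood yields different next states only when terrain or weather differ.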

Here, unlike the conventional approach, the state transition function is not identical for every cell and constant over time; it varies with the position of the cell and with time, and the state transition rule is not deterministic but carries uncertainty. If the state transition function is derived from domain knowledge, as in the conventional approach, it requires validation; if it is obtained through machine learning on big data, it is based on actual data, so the validation problem can be resolved.
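
A position-dependent, time-dependent, non-deterministic transition function learned from observations can be sketched as follows. Simple frequency counting stands in here for the machine learning step (the description mentions artificial neural networks as one option); the feature layout, the record values, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (i, j, t, state, active_neighbors): position, time, and local situation.
Features = Tuple[int, int, int, int, int]
Model = Dict[Features, Dict[int, float]]


def learn_transition(records: Iterable[Tuple[Features, int]]) -> Model:
    """Estimate P(next_state | features) from observed transitions by counting."""
    counts = defaultdict(Counter)
    for features, next_state in records:
        counts[features][next_state] += 1
    return {
        features: {s: n / sum(c.values()) for s, n in c.items()}
        for features, c in counts.items()
    }


def sample_next_state(model: Model, features: Features,
                      rng: random.Random, default: int) -> int:
    """Draw the next state at random according to the learned probabilities."""
    dist = model.get(features)
    if dist is None:
        return default          # unseen situation: fall back to a default state
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


# Hypothetical observations: (features, observed next state).
records = [
    ((0, 0, 0, 0, 1), 1),
    ((0, 0, 0, 0, 1), 1),
    ((0, 0, 0, 0, 1), 0),
    ((0, 0, 0, 0, 1), 1),
    ((2, 3, 5, 0, 1), 0),
]
model = learn_transition(records)
rng = random.Random(0)
draw = sample_next_state(model, (0, 0, 0, 0, 1), rng, default=0)
```

The same local situation (state 0 with one active neighbor) maps to different probability distributions at different positions and times, and the outcome is sampled rather than fixed, which mirrors the three departures from the conventional rule described above.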

As described above, the system modeling method with embedded big data machine learning according to embodiments of the present invention has been described with reference to limited embodiments and drawings, but it is not limited to these embodiments. Those of ordinary skill in the art to which the present invention pertains may of course make various modifications and variations within the technical spirit of the present invention and the scope of equivalents of the claims set forth below.

100: target system 110: hypothetical model
111: cellular automata model 120: big data
130: system model

Claims

A computing system for implementing a model of a target system,
the computing system comprising:
a hypothetical model that is defined, through knowledge acquisition about the target system, on the basis of obtainable structural information related to the target system, and that includes a plurality of functional blocks; and
at least one processor,
wherein the plurality of functional blocks include:
at least one first functional block that, based on the structural information, takes as input first data among big data obtained through actual operation and observation of the target system and outputs second data; and
at least one second functional block that, based on the structural information, takes as input the second data among the big data and outputs third data,
wherein at least one of the at least one first functional block and the at least one second functional block is a machine learning functional block capable of machine learning, and
wherein the at least one processor
controls a first machine learning functional block included in the at least one first functional block so that it performs machine learning with the first data among the big data as input and the second data as output, or controls a second machine learning functional block included in the at least one second functional block so that it performs machine learning with the second data as input and the third data as output.

The computing system of claim 1,
wherein the hypothetical model and the plurality of functional blocks are defined based on obtainable domain knowledge, experience, and theory related to the target system.

The computing system of claim 1,
wherein the at least one processor
obtains a verified first parameter value when machine learning is performed by the first machine learning functional block and uses the first parameter value as information for identifying the first machine learning functional block, or obtains a verified second parameter value when machine learning is performed by the second machine learning functional block and uses the second parameter value as information for identifying the second machine learning functional block.

The computing system of claim 3,
wherein the first parameter value or the second parameter value includes a variable value, a probability, a function, or a graph that is input to the first machine learning functional block or the second machine learning functional block.

The computing system of claim 1,
wherein the first data is data expressed as an input of the hypothetical model based on the structural information, the second data is data expressed as an internal variable of the hypothetical model based on the structural information, and the third data is data expressed as an output of the hypothetical model based on the structural information.

A computing system for implementing a model of a target system,
the computing system comprising:
a hypothetical model that is defined, through knowledge acquisition about the target system, on the basis of obtainable structural information related to the target system, and that includes a plurality of functional blocks; and
at least one processor,
wherein the plurality of functional blocks include:
at least one first functional block that, based on the structural information, takes as input first data among big data obtained through actual operation and observation of the target system and outputs second data; and
at least one second functional block that, based on the structural information, takes as input the second data among the big data and outputs third data,
wherein at least one of the at least one first functional block and the at least one second functional block is a machine learning functional block capable of machine learning, and
wherein the at least one processor
receives new input data for the target system,
inputs the new input data into the hypothetical model and controls the hypothetical model to execute an inference process of the hypothetical model, and
provides the output of the hypothetical model as the inference result of the hypothetical model for the output of the target system with respect to the new input data.