KR20220059193A

KR20220059193A - System and Method for providing early warning to prevent false alarm

Info

Publication number: KR20220059193A
Application number: KR1020200144490A
Authority: KR
Inventors: 김희수; 오준석; 손종덕; 김재동; 박다온
Original assignee: 한국전력공사
Priority date: 2020-11-02
Filing date: 2020-11-02
Publication date: 2022-05-10
Also published as: KR20230104096A; KR20230104552A; KR20230104551A; KR20230098123A; KR102550014B1

Abstract

Disclosed is an early warning system capable of detecting abnormal signs in a plant in advance. The early warning system includes: a collection unit which collects data of power generation facilities; a calculation unit which calculates data for each signal from the data of the power generation facilities, performs dimension reduction and data visualization over a learning interval recommended from the data for each signal to set a final learning interval, and executes model retraining, which updates a target model to create an updated model; and a prediction unit which uses the updated model to generate prediction information for real-time prediction of the power generation facilities in order to prevent false alarms.

Description

BACKGROUND ART System and Method for providing early warning to prevent false alarm

The present invention relates to early warning technology and, more particularly, to an early warning system and method for detecting abnormal signs in a plant in advance.

The invention also relates to an early warning system and method that can prevent the frequent false alarms caused by selecting an inappropriate learning interval when generating the predictive model at the core of early warning performance.

In general, large plants such as power or chemical plants operate with hundreds of mechanical and electrical facilities connected in complex ways. To supply power and products stably, such plants must continuously monitor for the abnormal signs that precede accidents in order to ensure reliability.

The performance of the predictive model at the core of early warning technology lies in extracting groups of signals with high physical correlation from historical data, learning the normal-state pattern of each signal group, and then detecting early the trend in which a signal leaves the learned normal range and drifts toward an abnormal state.

With existing machine learning methods, however, the user must each time select an interval manually, relying on charts or graphs of the historical data. Moreover, even after a predictive model has been generated, no history is kept of whether similar signal patterns occurred in the past or how much data was needed for training. As a result, the same work must be repeated at every retraining, and an appropriate learning interval is found only through trial and error.

Consequently, in the prior art, when the learning interval is chosen poorly, the signal pattern of the current state has not been learned, and many false alarms are generated. This adds to the system operator's confusion and requires extra work to check whether each alarm is real, which increases the user's inconvenience and workload.

In particular, power generation facilities require maintenance for long-term operation, so scheduled and unscheduled maintenance work takes place. The condition of a facility generally changes after maintenance, and retraining is required to reflect the changed condition.

In addition, the condition of a facility can change with season and load conditions, which requires the machine learning model to be updated. An early warning system is therefore needed that strengthens model-revision capability and minimizes false alarms.

1. Korean Registered Patent No. 10-1960754 (registration date: 2019.03.15)

The present invention has been proposed to solve the problems described in the background above, and its object is to provide an early warning system and method capable of detecting abnormal signs in a plant in advance.

Another object of the present invention is to provide an early warning system and method that can prevent the frequent false alarms caused by selecting an inappropriate learning interval when generating the predictive model at the core of early warning performance.

To achieve the objects stated above, the present invention provides an early warning system capable of detecting abnormal signs in a plant in advance.

The early warning system comprises:

a collection unit that collects data of power generation facilities;

a calculation unit that calculates data for each signal from the data of the power generation facilities, performs dimension reduction and data visualization over a recommended learning interval recommended from the data for each signal to set a final learning interval, and executes model retraining, which updates a target model to create an updated model; and

a prediction unit that uses the updated model to generate prediction information for real-time prediction of the power generation facilities in order to prevent false alarms.

The data for each signal consists of the average value of the current operation data for each signal and the average value of the past data for each signal over a selected past inquiry period.

A similarity is calculated between the average value of the current operation data for each signal, taken as the reference, and the average value of the past data for each signal within the past inquiry period.

The similarity is calculated using distance measures including the Euclidean distance, the Mahalanobis distance, and the Manhattan distance.

The recommended learning interval is calculated from a final recommended period, which is determined by voting: the months are ranked by similarity under each distance measure, and a month becomes the final recommended period when it is the most similar month under two or more of the distance measures.

The dimension reduction reduces the data for each signal to a lower dimension, transforming the multivariate data into a three-dimensional space.

The dimension reduction is performed using either a technique that finds the axes minimizing the mean squared distance between the high-dimensional data set and the projected data set, so that the variance of the data for each signal is maximized, or a technique that learns the distribution of the data for each signal and projects the data along decision boundaries that optimize the separation between high-dimensional data sets.

The high-dimensional data sets include the full data set for a period arbitrarily queried by the user, a recommended data set recommended from the mean-based similarity measurement, and a current operation data set for the most recently changed state of the power generation facility.

The size of the high-dimensional data set is determined by finding the center of the recommended data set and computing, from that center, the range of data with high cluster density. Using kurtosis, a measure of the peakedness of a probability distribution, a data range whose spread is close to a normal distribution is extracted.

The size of the high-dimensional data set is selected by the user through drag and drop.

To prevent false alarms that may arise from signal patterns not learned in the learning interval, a variable threshold, whose value changes, is set.

To prevent false alarms that may arise from signal patterns not learned in the learning interval, a static threshold, whose value is fixed, is set.

Another embodiment of the present invention provides an early warning method for preventing false alarms, comprising the steps of: (a) a collection unit collecting data of power generation facilities; (b) a calculation unit calculating data for each signal from the data of the power generation facilities, performing dimension reduction and data visualization over a recommended learning interval recommended from the data for each signal to set a final learning interval, and executing model retraining, which updates a target model to create an updated model; and (c) a prediction unit using the updated model to generate prediction information for real-time prediction of the power generation facilities in order to prevent false alarms.

According to the present invention, statistical values and the degree of data clustering are measured as similarities (Euclidean, Mahalanobis, and Manhattan distances) rather than with a complicated algorithm, and the voting result is used to obtain an optimal data set. Using this data set as the training data of the predictive model improves the model's performance and prevents false alarms in advance.
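
The voting step can be illustrated with a minimal sketch. It assumes each distance measure has already produced one distance per candidate month, and it accepts a month only when at least two measures rank it as the closest. The function name `recommend_month`, the parameter `min_votes`, and the sample numbers are hypothetical and do not come from the patent.

```python
from collections import Counter


def recommend_month(distances_by_metric, min_votes=2):
    """Return the month that is closest under at least `min_votes` metrics.

    distances_by_metric maps a metric name to {month: distance}.
    A smaller distance means a higher similarity. Returns None when
    no month collects enough votes (assumed fallback behaviour).
    """
    # Each metric casts one vote for its closest month.
    votes = Counter(min(d, key=d.get) for d in distances_by_metric.values())
    month, count = votes.most_common(1)[0]
    return month if count >= min_votes else None


# Illustrative distances only; not taken from the patent.
distances = {
    "euclidean": {"2019-08": 0.5, "2019-09": 8.5},
    "mahalanobis": {"2019-08": 1.2, "2019-09": 0.9},
    "manhattan": {"2019-08": 0.6, "2019-09": 9.3},
}
best = recommend_month(distances)
```

Returning `None` when the measures disagree is one possible design choice; a real system might instead fall back to the user's manual selection.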

As another effect of the present invention, reducing the dimension of the variables makes it possible to inspect visually the historical interval of the machine learning data set that affects early warning performance, and makes it easy to judge whether the model needs retraining because of seasonal changes, changes in facility condition due to maintenance, or changes in external environmental conditions (such as ambient temperature).
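
As an illustration of the variance-maximizing projection mentioned earlier, the following sketch reduces multi-signal data to three dimensions with principal component analysis computed by singular value decomposition. Reading the patent's first technique as PCA is an assumption, and the function name `pca_project` and the random sample data are hypothetical.

```python
import numpy as np


def pca_project(data, n_components=3):
    """Project rows of `data` (samples x signals) onto the top principal axes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for 200 samples of 8 signals (tags).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 8))
projected = pca_project(data)
```

The three output columns can then be drawn as a 3-D scatter plot so that the full, recommended, and current data sets can be compared by eye.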

As yet another effect, by capturing training data patterns and building a database over the long term, finely optimized training data can be secured for each facility.

As yet another effect, the data set of an existing model whose past pattern resembles the current operation data can be selectively brought in and reused. This minimizes frequent model-revision work, improves the efficiency of model management, and, by removing unnecessary learning periods and thereby optimizing the data size, improves training performance and training speed.

FIG. 1 is a block diagram of an early warning system for preventing false alarms according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the process of implementing early warning for preventing false alarms according to an embodiment of the present invention.
FIG. 3 is a table showing trigger operation based on event frequency according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram showing dimension reduction for visualizing data changes in two dimensions according to an embodiment of the present invention.
FIG. 5 is a visualization graph using dimension reduction according to an embodiment of the present invention.
FIG. 6 is a visualization graph of a learning interval proposal according to an embodiment of the present invention.
FIG. 7 is a graph showing tracking management of training data according to an embodiment of the present invention.
FIG. 8 is a diagram showing facility history data according to an embodiment of the present invention.
FIG. 9 is a graph showing the result of a recommended data set for a facility according to an embodiment of the present invention.
FIG. 10 is a table showing the recommendation result of similarity-based training data according to an embodiment of the present invention.
FIGS. 11 to 13 are graphs showing visualizations for a power generation facility according to an embodiment of the present invention.
FIG. 14 is a diagram showing size selection using the distribution of a training data set according to an embodiment of the present invention.
FIG. 15 is a diagram showing size selection using the distance from the center of a training data set according to an embodiment of the present invention.
FIG. 16 is a diagram showing the provision of information for model revision according to an embodiment of the present invention.
FIG. 17 is a diagram showing the tracking of training data patterns according to an embodiment of the present invention.
FIG. 18 is a diagram showing the occurrence of false alarms due to a static threshold according to an embodiment of the present invention.
FIG. 19 is a diagram showing the occurrence of false alarms due to a variable threshold according to an embodiment of the present invention.
FIG. 20 is a graph comparing results before and after optimization according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail below. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

In describing the drawings, like reference numerals are used for like elements. Terms such as first and second may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another.

For example, without departing from the scope of the present invention, a first element may be called a second element, and similarly a second element may be called a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

Hereinafter, an early warning system and method for preventing false alarms according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an early warning system 100 for preventing false alarms according to an embodiment of the present invention. Referring to FIG. 1, the early warning system 100 may include a communication unit 110 that receives data on power generation facilities; a collection unit 120 that collects and stores the data of the power generation facilities; a calculation unit 130 that calculates data for each signal from the data, performs dimension reduction and data visualization over a learning interval recommended from the data for each signal to finalize the learning interval, and updates a target model to create an updated model; a prediction unit 140 that uses the updated model to generate prediction information for predicting false alarms in real time; an output unit 150 that outputs the prediction information; and the like.

The communication unit 110 connects to other external communication devices (not shown) through a communication network (not shown). Communication may be wired or wireless. The communication unit 110 may use various communication protocols for wireless communication and the like. To this end, the communication unit 110 may include a modem, a microprocessor, communication circuit elements, and the like.

The communication network refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers, and may be a public switched telephone network (PSTN), a public switched data network (PSDN), an integrated services digital network (ISDN), a broadband ISDN (BISDN), a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), or the like. However, the present invention is not limited thereto, and the network may be a wireless network such as CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), WiBro (Wireless Broadband), WiFi (Wireless Fidelity), an HSDPA (High Speed Downlink Packet Access) network, Bluetooth, an NFC (Near Field Communication) network, a satellite broadcasting network, an analog broadcasting network, or a DMB (Digital Multimedia Broadcasting) network. It may also be a combination of these wired and wireless networks.

Examples of the external communication device include a server, a personal computer (PC), a notebook computer, and a notepad. It is of course also possible to provide the data stored in the external communication device by copying it to a USB device, without going through such a communication network.

The collection unit 120 collects the data of the power generation facility transmitted through the communication unit 110. The data may be received directly from the power generation facility, or through an external communication device in which data first generated at the facility is subsequently stored. The collection unit 120 stores the received data in the database (DB) of the storage unit 160.

The calculation unit 130 calculates data for each signal from the data, performs dimension reduction and data visualization over a learning interval recommended from the data for each signal to finalize the learning interval, and updates a target model to create an updated model. The calculation unit 130 also recommends an optimal learning interval through similarity measurement, and verifies the current operation data against the recommended training data through dimension reduction and visualization. In addition, it builds up a database of optimal training patterns through tracking management of the training data, and optimizes the size of the training data using the distance from the center or statistical values. It also implements false alarm prevention by applying a variable threshold.

The prediction unit 140 uses the updated model to generate prediction information for predicting false alarms in real time. It also monitors the distance between the center of the training data and the center of the current operation data, and generates an alarm when the distance increases.
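
The center-to-center monitoring can be sketched as follows, assuming the current operation data arrives in successive windows and an alarm is raised whenever the distance to the training-data center grows by more than a chosen amount between windows. The names `centroid`, `center_distance_alarms`, and `min_increase`, and the sample points, are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def center_distance_alarms(train_points, current_windows, min_increase=0.0):
    """Return (distances, alarm_indices).

    distances[i] is the Euclidean distance between the training-data center
    and the center of current_windows[i]. A window index is flagged when its
    distance exceeds the previous window's by more than `min_increase`
    (an assumed alarm rule).
    """
    train_center = centroid(train_points)
    distances = [math.dist(train_center, centroid(w)) for w in current_windows]
    alarms = [
        i for i in range(1, len(distances))
        if distances[i] - distances[i - 1] > min_increase
    ]
    return distances, alarms


train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
windows = [
    [(1.0, 1.0), (1.0, 1.0)],
    [(1.0, 1.0), (1.0, 3.0)],
    [(4.0, 5.0), (4.0, 5.0)],
]
distances, alarms = center_distance_alarms(train, windows, min_increase=0.5)
```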

The output unit 150 displays the information processed by the collection unit 120, the calculation unit 130, the prediction unit 140, and the like. Accordingly, the output unit 150 may output information as a combination of text, voice, and graphics. To this end, the output unit 150 may include a display, a sound system, and the like.

The display may be a liquid crystal display (LCD), a light emitting diode (LED) display, a plasma display panel (PDP), an organic LED (OLED) display, a touch screen, a cathode ray tube (CRT), a flexible display, or the like. A touch screen may also function as an input means.

The storage unit 160 stores the software, instruction sets, data, and the like needed to operate the system 100. To this end, the storage unit 160 may be implemented as random access memory (RAM), a magnetic disk, flash memory, static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), or the like, but is not limited thereto.

The collection unit 120, the calculation unit 130, and the prediction unit 140 shown in FIG. 1 each denote a unit that processes at least one function or operation, and may be implemented in software and/or hardware. In a hardware implementation, they may be implemented as an application specific integrated circuit (ASIC), a digital signal processor (DSP), a programmable logic device (PLD), a field programmable gate array (FPGA), a processor, a microprocessor, another electronic unit designed to perform the functions described above, or a combination of these.

In a software implementation, they may include software components (elements), object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, data, databases, data structures, tables, arrays, and variables. Software, data, and the like may be stored in a memory and executed by a processor. The memory and the processor may employ various means well known to those skilled in the art.

FIG. 2 is a flowchart showing the process of implementing early warning for preventing false alarms according to an embodiment of the present invention. Referring to FIG. 2, when the condition of a power generation facility changes because of maintenance or seasonal change and false alarms occur frequently within a certain time in multiple predictive models, a trigger function is used to request model retraining from the user (step S210). An example of trigger operation based on event frequency for requesting such retraining is shown in FIG. 3 and is described later.
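
A frequency-based trigger of this kind could look like the sketch below. It assumes alarm events are logged as (timestamp, model name) pairs and that retraining is requested when enough events from enough distinct models fall inside a sliding time window. The function name, the window length, and both thresholds are hypothetical; the patent's own trigger rule is the one shown in FIG. 3.

```python
from datetime import datetime, timedelta


def should_request_retraining(events, now, window=timedelta(hours=24),
                              min_events=5, min_models=2):
    """Return True when recent alarm events are frequent and span several models.

    events: iterable of (timestamp, model_name) alarm records.
    window, min_events, min_models: assumed tuning values, not from the patent.
    """
    recent = [(ts, model) for ts, model in events if now - window <= ts <= now]
    models = {model for _, model in recent}
    return len(recent) >= min_events and len(models) >= min_models


events = [
    (datetime(2020, 1, 1, 0), "model_A"),
    (datetime(2020, 1, 1, 1), "model_A"),
    (datetime(2020, 1, 1, 2), "model_B"),
    (datetime(2020, 1, 1, 3), "model_B"),
    (datetime(2020, 1, 1, 4), "model_C"),
]
triggered = should_request_retraining(events, now=datetime(2020, 1, 1, 5))
```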

When model retraining is requested, a procedure (S200) that processes the data and executes model retraining is carried out. The model retraining procedure (S200) may consist of steps S220 to S270.

First, the average value of the current operation data for each signal is calculated from the power generation facility data acquired through the collection unit 120 (steps S220 and S230). For this calculation, the user queries the entire period needed for training, and the developed system calculates the average value for each signal (tag) over the current operation data and the monthly average values over the selected past inquiry period. These are shown in the following table.

In the table above, columns such as "P1LAB~" represent signals (tags), and each row gives the monthly average values calculated for each signal over the inquiry period. The current value is the average value of the current operation data.

The above table lists signals (tags) for a BFP (boiler feed pump). The BFP belongs to the feedwater system of the power generation facility and is an auxiliary facility that supplies water to the boiler to generate steam. The BFP consists of a pump section and a motor section, and the tags listed above represent the inlet/outlet temperature of the BFP, the inlet/outlet differential pressure (discharge pressure), and the bearing vibration and temperature signals of the pump and motor sections, respectively; together they constitute the operation data of the facility.

In more detail, when the user sets the inquiry period for the past history data required for learning (about 1 to 3 years), the system calculates the average value of each signal (tag) included in the model from the last maintenance to the present, and calculates the monthly average value of each signal (tag) over the inquiry period.
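The averaging step above can be sketched as follows. This is a minimal illustration only, assuming the operation data arrives as (timestamp, tag, value) records with ISO-formatted date strings; the tag name BFP_INLET_TEMP, the function names, and the sample values are hypothetical and do not come from the specification.

```python
from collections import defaultdict
from statistics import mean


def monthly_tag_means(records):
    """Return {month 'YYYY-MM': {tag: mean value}} for (timestamp, tag, value) records."""
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, tag, value in records:
        buckets[ts[:7]][tag].append(value)  # 'YYYY-MM' prefix of an ISO date string
    return {m: {t: mean(v) for t, v in tags.items()} for m, tags in buckets.items()}


def current_tag_means(records, since):
    """Mean per tag over records dated on or after `since` (e.g. the last maintenance date)."""
    values = defaultdict(list)
    for ts, tag, value in records:
        if ts >= since:  # ISO date strings compare correctly as plain strings
            values[tag].append(value)
    return {t: mean(v) for t, v in values.items()}


# Hypothetical sample data for a single tag.
records = [
    ("2019-08-01", "BFP_INLET_TEMP", 40.0),
    ("2019-08-15", "BFP_INLET_TEMP", 42.0),
    ("2019-09-01", "BFP_INLET_TEMP", 50.0),
    ("2019-10-05", "BFP_INLET_TEMP", 44.0),
    ("2019-10-20", "BFP_INLET_TEMP", 46.0),
]
monthly = monthly_tag_means(records)
current = current_tag_means(records, since="2019-10-01")
```

In this sketch the "current" window simply starts at an assumed maintenance date; how the actual system delimits the current operation data is not specified beyond "from maintenance to the present".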

Next, taking the current operation data average of each signal (tag) as the reference, the similarity to the monthly data average of each signal (tag) within the inquiry period (for example, a period of about three years, 2016-10 to 2019-10) is measured (step S240). The measure may be a distance metric such as the Euclidean distance, the Mahalanobis distance, or the Manhattan distance.

The formulas for the Euclidean distance, the Mahalanobis distance, and the Manhattan distance are as follows.

The Euclidean distance measures the straight-line distance between two data points (vectors) x and y.

The Mahalanobis distance expresses how many standard deviations a point lies from the mean. It is thus the Euclidean distance with a covariance (σ) term added; when all covariances between signals are zero and every variance is one, that is, when the covariance matrix is the identity, the Mahalanobis distance and the Euclidean distance are identical. Here, T denotes the transpose.

The Manhattan distance is the sum of the absolute differences between the corresponding components of two vectors x and y.
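For reference, the three distances can be written in their standard textbook forms for two n-dimensional vectors x and y. The notation is an assumption of this sketch: \Sigma stands for the covariance matrix of the data (the covariance term referred to above as σ), and T is the transpose.

```latex
% Euclidean distance
d_{E}(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}}

% Mahalanobis distance (\Sigma: covariance matrix, T: transpose)
d_{M}(x,y) = \sqrt{(x - y)^{T}\,\Sigma^{-1}\,(x - y)}

% Manhattan distance
d_{1}(x,y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert
```

Setting \Sigma to the identity matrix in the second formula reduces it to the first, which is the sense in which the Mahalanobis distance generalizes the Euclidean distance.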

Accordingly, for the inquiry period, the distance between the current operation data average and the past data average of each signal is calculated as the Euclidean distance, the Mahalanobis distance, and the Manhattan distance, respectively. The results are summarized in the following table.
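A minimal sketch of this distance calculation is given below. It assumes each month is represented by a vector of per-tag averages, and, to stay short, it simplifies the Mahalanobis distance to the case of a diagonal covariance matrix (per-tag variances only); that simplification, the function names, and the sample numbers are assumptions of this sketch rather than details from the specification.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute component-wise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis_diag(x, y, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix (simplification)."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


def distance_table(current, monthly, variances):
    """Distances from the current per-tag means to each month's per-tag means."""
    return {
        month: {
            "euclidean": euclidean(current, past),
            "mahalanobis": mahalanobis_diag(current, past, variances),
            "manhattan": manhattan(current, past),
        }
        for month, past in monthly.items()
    }


# Hypothetical per-tag means for two tags.
current = [45.0, 3.0]
monthly = {"2019-08": [41.0, 3.0], "2019-09": [48.0, 7.0]}
variances = [4.0, 1.0]
table = distance_table(current, monthly, variances)
```

A smaller distance means a higher similarity, so in this example 2019-08 is the closer month under all three measures.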

In the above table, the top row indicates the month of the current operation data, and the leftmost column indicates the month of the past data.

Next, the optimal learning interval is recommended by voting (step S250). Specifically, a similarity ranking is calculated with each of the three distance metrics, and each metric votes for the month with the highest similarity according to its ranking. In other words, the calculated distance values are used to recommend the section of the history data with the highest similarity. When the recommended period (month) agrees for two or more of the three distance metrics, that period is presented to the user as the final recommended learning period.

Likewise, when the user wants a recommendation covering two or more months of data, the months are ranked by similarity and the training data is recommended according to the voting result.
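The voting step can be illustrated with the short sketch below. It assumes a table of distances per month and per metric; each metric votes for its closest month, and a month is recommended only when at least two metrics agree. For the multi-month case the specification does not state how rankings are combined, so the summed-rank rule in rank_months is purely an assumption, as are all names and numbers.

```python
from collections import Counter


def recommend_month(table):
    """Each metric votes for its closest month; recommend it if two or more agree."""
    metrics = next(iter(table.values())).keys()
    votes = Counter(
        min(table, key=lambda m: table[m][metric]) for metric in metrics
    )
    month, count = votes.most_common(1)[0]
    return (month if count >= 2 else None), dict(votes)


def rank_months(table, top_k):
    """Rank months by summed per-metric rank (assumed combination rule)."""
    metrics = list(next(iter(table.values())).keys())
    score = Counter()
    for metric in metrics:
        ordered = sorted(table, key=lambda m: table[m][metric])
        for rank, month in enumerate(ordered):
            score[month] += rank
    return sorted(table, key=lambda m: score[m])[:top_k]


# Hypothetical distance values.
table = {
    "2019-07": {"euclidean": 2.0, "mahalanobis": 1.5, "manhattan": 3.0},
    "2019-08": {"euclidean": 1.0, "mahalanobis": 2.0, "manhattan": 1.2},
    "2019-09": {"euclidean": 5.0, "mahalanobis": 4.0, "manhattan": 7.0},
}
best, votes = recommend_month(table)
top_two = rank_months(table, top_k=2)
```

Here the Euclidean and Manhattan distances both select 2019-08 while the Mahalanobis distance selects 2019-07, so 2019-08 is recommended with two of three votes.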

The user then needs to check the final recommended period visually before confirming it. Because it is difficult to inspect every signal (tag), the data is reduced in dimension and converted into a three-dimensional representation of the multivariate data (step S260). In the three-dimensional space, the entire history data for the inquiry period (that is, the past data set for the period queried by the user), the recommended learning interval (that is, the data set recommended by the voting result), and the state of the current operation data are visualized, and after checking them the user confirms the recommended learning interval setting (step S261).

If the recommended learning interval is not confirmed in step S261, steps S230 to S260 are performed again.

In addition, by setting a variable threshold, false alarms arising from signal patterns that may not have been learned in the learning interval can be systematically compensated for (step S270).

When the recommended learning interval is confirmed in step S261, the target model is updated and an updated model is generated (step S280).

Real-time prediction for the power generation facility is then performed using the updated model (step S290).

Meanwhile, once the updated model is generated, the distance between the center of the training data and the center of the current operation data is monitored (step S280). Specifically, among the past data of the plurality of signals (tags) participating in the model, the data set selected for training is dimension-reduced and the center (mean) of the reconstructed data set is obtained; the current operation data of the same signals (tags) is likewise dimension-reduced and the center (mean) of its reconstructed data set is obtained; and the distance between the two centers is monitored.

The greater the distance between the center of the training data and the center of the current operation data, the more the current state has departed from the learned pattern. When the center-to-center distance exceeds a value set by the user and, in addition, alarms occur frequently in the early warning system, the user can be notified that the prediction model needs to be updated. That is, using two pieces of information, the increase in the center distance and the alarm frequency of the early warning system, the user updates the model and thereby prevents false alarms in advance (step S281).
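The monitoring rule described above can be sketched as follows, assuming both data sets are already available as points in the reduced (for example three-dimensional) space. The Euclidean distance between centers, the alarm-count limit, and all names and numbers are assumptions of this sketch; the specification only states that a user-set distance and frequent alarms are considered together.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def needs_model_update(train_points, current_points, alarm_count,
                       max_center_distance, max_alarm_count):
    """Flag an update when both the center distance and the alarm count reach their limits."""
    distance = math.dist(centroid(train_points), centroid(current_points))
    flag = distance >= max_center_distance and alarm_count >= max_alarm_count
    return flag, distance


# Hypothetical reduced-space points.
train = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
current = [[4.0, 4.0, 0.0], [4.0, 4.0, 0.0]]
flag_many_alarms, d = needs_model_update(train, current, 12, 3.0, 10)
flag_few_alarms, _ = needs_model_update(train, current, 2, 3.0, 10)
```

Requiring both conditions means that a drifting center alone, without a rise in alarms, does not prompt the user in this sketch.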

FIG. 3 is a table showing trigger operation using event frequency according to an embodiment of the present invention. Referring to FIG. 3, when the facility state changes after a seasonal change or maintenance, alarms for the same signal (tag) occur continuously. These are not alarms caused by an actual abnormality of the facility but false alarms that arise because the change in facility state has made the current pattern differ from the pattern learned by the prediction model. The existing model therefore needs to be retrained, and a trigger based on the target model and the alarm frequency is used to request model retraining from the user.
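A trigger of this kind might be sketched as a simple count of alarms per model within a time window. The window length, the minimum count, the model identifiers, and the function name below are hypothetical; the specification does not give concrete values.

```python
from collections import Counter
from datetime import datetime, timedelta


def retraining_requests(alarms, now, window, min_count):
    """Return model ids whose alarm count within [now - window, now] reaches min_count."""
    start = now - window
    counts = Counter(model for ts, model in alarms if start <= ts <= now)
    return sorted(m for m, c in counts.items() if c >= min_count)


# Hypothetical alarm log: (time of alarm, model id).
now = datetime(2020, 11, 2, 12, 0)
alarms = [
    (datetime(2020, 11, 1, 9, 0), "BFP-A"),    # outside the one-hour window
    (datetime(2020, 11, 2, 11, 0), "BFP-A"),
    (datetime(2020, 11, 2, 11, 20), "BFP-A"),
    (datetime(2020, 11, 2, 11, 40), "BFP-A"),
    (datetime(2020, 11, 2, 11, 50), "CEP-A"),
]
requests = retraining_requests(alarms, now, timedelta(hours=1), min_count=3)
```

Only the model with three alarms inside the window is returned, which corresponds to asking the user to retrain that model.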

FIG. 4 is a conceptual diagram showing dimension reduction for visualizing data changes in two dimensions according to an embodiment of the present invention. In general, a large plant such as a power generation facility contains many kinds of equipment, and 20 or more machine learning models must be run for each piece of equipment. With 20 to 30 signals (tags) per model, the number of tags to be managed is 20 × (20 to 30) = 400 to 600. The number of managed tags therefore grows rapidly as equipment is added, making it difficult to manage the models efficiently with conventional methods.

Furthermore, checking the changes of the many signals (tags) in a model requires inspecting every signal one by one on a two-dimensional graph, which is cumbersome. Dimension reduction is performed to remove this burden. High-dimensional data can be represented by a subset of its characteristics, and the dimension is reduced by projecting the data into a lower-dimensional space (410, 420, 430, 440). Referring to FIG. 4, the dimension reduction may use either a technique that finds the axis minimizing the mean squared distance between the high-dimensional data set and the projected data set, so that the variance of the data is maximized, or a technique that learns the distribution of the data and projects the data according to a decision boundary that optimizes the separation between high-dimensional data sets.
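The first technique states two criteria at once: maximum variance and minimum mean squared distance to the projection. A short derivation shows why these coincide for a single projection axis. The notation is assumed for this sketch: x_1, ..., x_N are mean-centered samples, w is a unit vector, and S is the sample covariance matrix. This is the criterion commonly associated with principal component analysis, although the specification does not name the technique.

```latex
% Assumed notation: centered samples x_1,...,x_N in R^d, unit vector w (w^T w = 1),
% sample covariance S = (1/N) \sum_i x_i x_i^T.
\begin{align}
\frac{1}{N}\sum_{i=1}^{N}\bigl\lVert x_i-(w^{T}x_i)\,w\bigr\rVert^{2}
 &= \frac{1}{N}\sum_{i=1}^{N}\lVert x_i\rVert^{2}
   -\frac{1}{N}\sum_{i=1}^{N}(w^{T}x_i)^{2} \\
 &= \operatorname{tr}(S) - w^{T}S\,w .
\end{align}
```

Since tr(S) does not depend on w, minimizing the mean squared distance on the left is the same as maximizing w^T S w, the variance of the data along the axis w.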

The high-dimensional data sets to be visualized fall into three groups: first, the entire data set for the period queried by the user; second, the recommended data set obtained from the average-based similarity measurement; and third, the current operation data set reflecting the changed state of the power generation facility. Each must be plotted. As described above, even a single prediction model has too many tags (dimensions) to plot on one graph. By reducing the data to a three-dimensional space as described above, the degree of clustering across all tags can be checked at a glance.

FIG. 5 is a visualization graph using dimension reduction according to an embodiment of the present invention, and FIG. 6 is a visualization graph of a learning interval proposal according to an embodiment of the present invention. Referring to FIGS. 5 and 6, to minimize the training data while keeping learning performance high, the center of the recommended data set is found and the range of data with high cluster density around that center is determined. Kurtosis, a measure of the peakedness of a probability distribution, is used to extract a data range whose spread is close to a normal distribution, thereby optimizing the size of the data set. The user may adjust the training data distance at their discretion, and the size of the training data changes with the selected distance.
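One possible reading of the kurtosis-based selection is sketched below for a single coordinate. It computes the Pearson kurtosis (3 for a normal distribution) and, among a few candidate distances from the center, keeps the one whose retained values have kurtosis closest to 3. The candidate-radius search, the function names, and the sample values are assumptions of this sketch; the specification does not describe the selection rule at this level of detail.

```python
def kurtosis(values):
    """Pearson kurtosis, population form (a normal distribution gives 3.0)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def select_range(values, radii):
    """Pick the candidate radius around the mean whose retained values look most normal."""
    center = sum(values) / len(values)
    best = None
    for r in radii:
        kept = [v for v in values if abs(v - center) <= r]
        if len(kept) < 4 or len(set(kept)) < 2:
            continue  # too few or constant values: kurtosis is not meaningful
        score = abs(kurtosis(kept) - 3.0)
        if best is None or score < best[0]:
            best = (score, r, kept)
    if best is None:
        return None, []
    return best[1], best[2]


# Hypothetical values: a compact cluster plus one far outlier.
values = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2, 30]
radius, kept = select_range(values, radii=[5, 40])
```

With the outlier included the kurtosis is far above 3, so the smaller radius is chosen and the outlier is dropped from the training range.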

FIG. 7 is a graph showing tracking management of training data according to an embodiment of the present invention. Referring to FIG. 7, the current operation data set and the recommended data set can be checked month by month or selectively, and the user decides whether to adopt the data set recommended by the system as training data for the model update. In addition, by tracing the movement path of the training data center, information on the optimal learning pattern can be provided during long-term operation of the system.

FIG. 8 is a diagram showing facility history data according to an embodiment of the present invention. Specifically, FIG. 8 shows history data for CEP (Condensate Extraction Pump) equipment among the balance-of-plant equipment of a power plant. Referring to FIG. 8, the power generation facility was maintained three times during the inquiry period, and the signal pattern changes slightly with the state of the facility before and after each maintenance.

FIG. 9 is a graph showing the recommended data sets obtained for the equipment according to an embodiment of the present invention. Referring to FIG. 9, to examine the validity of the distance-based similarity technique, ten months were selected arbitrarily from the past data set (July 2016 to April 2018) of the CEP equipment and each was treated as the current operation data set. For the ten selected cases, it was checked by voting whether at least two of the three similarity techniques recommended the same month. In all ten cases, two or more similarity techniques recommended the same month.

FIG. 10 is a table showing the recommendation results of similarity-based training data according to an embodiment of the present invention. Referring to FIG. 10, training data selection for preventing false alarms was carried out over the same inquiry period not only for the CEP (Condensate Extraction Pump) but also for two further pump types, BFP (Boiler Feedwater Pump) and CWP (Circulating Water Pump), and four fan and blower types, FAB (Fluidizing Air Blower), IDF (Induced Draft Fan), PAF (Primary Air Fan), and SAF (Secondary Air Fan). Each of the seven equipment types is installed in duplicate in each of two units, giving 7 × 2 × 2 = 28 cases in total.

For the inquiry period from July 2016 to April 2018, a current operation data set was set for each piece of equipment, and the similarity (Euclidean distance, Mahalanobis distance, Manhattan distance) to each monthly past history data set was measured using the average values. As a result, about 92% of all cases recommended the same period (month) as training data, and in only 8% did the three similarity measures each recommend a different data set (month).

Some of these cases are ones in which the actual state of the equipment changed so that the current state forms a data cluster entirely different from the past history data; if these are excluded, the performance of the similarity-based approach may be even higher.

FIGS. 11 to 13 are graphs showing visualizations for power generation facilities according to an embodiment of the present invention. FIG. 11 is a visualization for the IDF equipment of Unit 1. The entire high-dimensional data set 1110 is visualized as the recommended data set, the past data set, and the current operation data set 1120.

FIG. 12 is a visualization for the SAF equipment of Unit 1. The entire high-dimensional data set 1210 is visualized as the recommended data set, the past data set, and the current operation data set 1220.

FIG. 13 is a visualization for the SAF equipment of Unit 1. The entire high-dimensional data set 1310 is visualized as the recommended data set, the past data set, and the current operation data set 1320.

FIG. 14 is a diagram showing size selection using the distribution of the training data set according to an embodiment of the present invention. Referring to FIG. 14, for the recommended data set as well, the system distinguishes the dense part of the data according to the shape of its distribution and recommends a data size 1410, or recommends a size using the distance from the current operation data set. The system may also be configured so that the user can select an arbitrary data set 1420. That is, rather than relying on a selected training data size, the user can set the region directly by drag and drop to choose the size.

FIG. 15 is a diagram showing size selection using the distance from the center of the training data set according to an embodiment of the present invention. Referring to FIG. 15, within the training data set 1510, a size selection 1520 based on the distance from the center of the training data set is possible.

FIG. 16 is a diagram showing the provision of information for model revision according to an embodiment of the present invention. Referring to FIG. 16, tracing the center of the training model makes it easy to grasp changes in the state of the equipment, and when the distance between the center of the training data and the center of the current state exceeds a certain distance and alarms become more frequent, the user can be informed that the model needs to be revised.

FIG. 17 is a diagram showing the tracking of training data patterns according to an embodiment of the present invention. Referring to FIG. 17, when the patterns of training data are accumulated over the long term, an elaborate and optimized training data set covering the whole cycle can be built, which not only improves prediction model performance but also prevents false alarms in advance.

FIG. 18 is a diagram showing the occurrence of false alarms due to a static threshold according to an embodiment of the present invention. Referring to FIG. 18, whereas false alarms were prevented above by optimizing the training data of the model, in post-processing a trigger function that does not raise an alarm under a static threshold condition on the residual can be combined to further compensate for the limits of the prediction model. That is, no alarm is generated above or below the static threshold.

FIG. 19 is a diagram showing the occurrence of false alarms due to a variable threshold according to an embodiment of the present invention. Referring to FIG. 19, whereas false alarms were prevented above by optimizing the training data of the model, in post-processing a trigger function that does not raise an alarm under a variable threshold condition on the residual can be combined to further compensate for the limits of the prediction model.
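The difference between a static and a variable threshold on the residual (the gap between measured and predicted values) can be illustrated with the sketch below. The specification does not define how the variable threshold is computed; the rolling band of mean ± k standard deviations over the preceding window is an assumption of this sketch, as are the function names and the sample residuals.

```python
from statistics import mean, pstdev


def static_alarms(residuals, limit):
    """Indices where the absolute residual exceeds a fixed limit."""
    return [i for i, r in enumerate(residuals) if abs(r) > limit]


def variable_alarms(residuals, window, k):
    """Indices where the residual leaves mean ± k·std of the preceding window (assumed rule)."""
    alarms = []
    for i in range(window, len(residuals)):
        past = residuals[i - window:i]
        mu, sd = mean(past), pstdev(past)
        if abs(residuals[i] - mu) > k * sd:
            alarms.append(i)
    return alarms


# Hypothetical residuals with a lasting level shift at index 5,
# such as might follow maintenance or a seasonal change.
residuals = [0.0, 0.2, -0.2, 0.1, -0.1, 1.5, 1.6, 1.4, 1.5, 1.6]
fixed = static_alarms(residuals, limit=1.0)
adaptive = variable_alarms(residuals, window=4, k=3.0)
```

In this example the static threshold keeps alarming for as long as the shifted level persists, while the variable threshold alarms once at the shift and then adapts to the new level.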

FIG. 20 is a graph comparing the results before and after optimization according to an embodiment of the present invention. Referring to FIG. 20, sensor values occur frequently before optimization (2010), whereas their frequency decreases after optimization (2020). In addition, the reliability is 83.5002% before optimization (2010) but 97.2979% after optimization (2020).

In addition, the steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in the form of program instructions executable by various computer means such as a microprocessor, a processor, or a CPU (Central Processing Unit), and recorded on a computer-readable medium. The computer-readable medium may include program (instruction) code, data files, data structures, and the like, alone or in combination.

The program (instruction) code recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs, DVDs, and Blu-ray discs; and semiconductor memory devices specially configured to store and execute program (instruction) code, such as ROM, RAM, and flash memory.

Here, examples of the program (instruction) code include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

100: Early warning system for preventing false alarms
110: Communication unit
120: Collection unit
130: Calculation unit
140: Prediction unit
150: Output unit
160: Storage unit

Claims

1. An early warning system for preventing false alarms, comprising:
a collection unit 120 that collects data of a power generation facility;
a calculation unit 130 that calculates data for each signal using the data of the power generation facility, sets a final learning interval by performing dimension reduction and data visualization on a recommended learning interval recommended using the data for each signal, and executes model retraining that updates a target model to generate an updated model; and
a prediction unit 140 that generates prediction information for real-time prediction of the power generation facility using the updated model in order to prevent false alarms.

2. The early warning system of claim 1,
wherein the data for each signal comprises an average value of current operation data for each signal of the data of the power generation facility and an average value of past data for each signal over a selected past inquiry period.

3. The early warning system of claim 2,
wherein a similarity to the average value of the past data for each signal within the past inquiry period is calculated with reference to the average value of the current operation data for each signal.

4. The early warning system of claim 3,
wherein the similarity is calculated using distance metrics including the Euclidean distance, the Mahalanobis distance, and the Manhattan distance.

5. The early warning system of claim 4,
wherein the recommended learning interval is calculated using a final recommended period that is determined, through voting for the month with the highest similarity according to the ranking of the similarity, when the month agrees for two or more of the distance metrics.

6. The early warning system of claim 5,
wherein the dimension reduction reduces the data for each signal to a lower dimension and converts it into a multivariate representation in a three-dimensional space.

7. The early warning system of claim 6,
wherein the dimension reduction is performed using a technique that finds an axis minimizing the mean squared distance between a high-dimensional data set and a projected data set so that the variance of the data for each signal is maximized, or a technique that learns the distribution of the data for each signal and projects the data according to a decision boundary that optimizes the separation between high-dimensional data sets.

8. The early warning system of claim 7,
wherein the high-dimensional data set includes an entire data set for a period arbitrarily queried by a user, a recommended data set recommended from average-based similarity measurement, and a current operation data set for the changed state of the power generation facility.

9. The early warning system of claim 8,
wherein the size of the high-dimensional data set is determined by finding the center of the recommended data set and, using kurtosis, which is a measure of the peakedness of a probability distribution, extracting around that center a range of data with high cluster density whose spread is close to a normal distribution.

10. The early warning system of claim 8,
wherein the size of the high-dimensional data set is selected by a user's drag-and-drop operation.

11. The early warning system of claim 1,
wherein a variable threshold, whose value varies, is set in order to prevent false alarms that may be caused by signal patterns that may not have been learned in the recommended learning interval.

12. The early warning system of claim 1,
wherein a static threshold is set in order to prevent false alarms that may be caused by signal patterns that may not have been learned in the recommended learning interval.

13. An early warning method for preventing false alarms, comprising:
(a) collecting, by a collection unit 120, data of a power generation facility;
(b) calculating, by a calculation unit 130, data for each signal using the data of the power generation facility, setting a final learning interval by performing dimension reduction and data visualization on a recommended learning interval recommended using the data for each signal, and executing model retraining that updates a target model to generate an updated model; and
(c) generating, by a prediction unit 140, prediction information for real-time prediction of the power generation facility using the updated model in order to prevent false alarms.

14. The early warning method of claim 13,
wherein the data for each signal comprises an average value of current operation data for each signal of the data of the power generation facility and an average value of past data for each signal over a selected past inquiry period.

15. The early warning method of claim 14,
wherein a similarity to the average value of the past data for each signal within the past inquiry period is calculated with reference to the average value of the current operation data for each signal.

16. The early warning method of claim 15,
wherein the similarity is calculated using distance metrics including the Euclidean distance, the Mahalanobis distance, and the Manhattan distance.

17. The early warning method of claim 16,
wherein the recommended learning interval is calculated using a final recommended period that is determined, through voting for the month with the highest similarity according to the ranking of the similarity, when the month agrees for two or more of the distance metrics.

18. The early warning method of claim 17,
wherein the dimension reduction reduces the data for each signal to a lower dimension and converts it into a multivariate representation in a three-dimensional space.

19. The early warning method of claim 18,
wherein the dimension reduction is performed using a technique that finds an axis minimizing the mean squared distance between a high-dimensional data set and a projected data set so that the variance of the data for each signal is maximized, or a technique that learns the distribution of the data for each signal and projects the data according to a decision boundary that optimizes the separation between high-dimensional data sets.

20. The early warning method of claim 13,
wherein a variable threshold, whose value varies, is set in order to prevent false alarms that may be caused by signal patterns that may not have been learned in the recommended learning interval.