KR101843879B1

KR101843879B1 - System for detecting abnormality data, method thereof and computer recordable medium storing the method

Info

Publication number: KR101843879B1
Application number: KR1020170080742A
Authority: KR
Inventors: 진병복; 김동중; 정용국; 박수미; 조세이; 김승연; 최유경; 김민정
Original assignee: 한국환경공단
Priority date: 2017-06-26
Filing date: 2017-06-26
Publication date: 2018-03-30

Abstract

The present invention relates to an abnormal data detection system, a method therefor, and a computer-readable recording medium on which the method is recorded. The invention collects data on emitted substances and conditions through measuring instruments installed at a facility that discharges fluid containing pollutants, determines the operating state of the facility by applying hierarchical statistical techniques to the collected data, and determines whether the collected data are abnormal for each operating state. The abnormal data detection system collects and analyzes measurement value signals from a plurality of measuring instruments that measure a plurality of substances contained in a fluid and the state of the fluid and output the measurement value signals. The system includes: a communication unit that receives the measurement value signals from the plurality of measuring instruments; a database unit that stores the measurement value signals received through the communication unit as measurement data; a user interface unit that provides an interface for selecting the abnormal data detection target and the detection criteria and an interface for displaying the detection result; and a detection unit that analyzes the measurement data stored in the database unit using the detection target and detection criteria selected through the user interface unit, determines from the analysis the operating state of the facility in which the measuring instruments are installed, selects outliers for each determined operating state, and displays the determination result and the selection result through the user interface unit as the detection result. The invention can thereby detect abnormal data precisely.

Description

TECHNICAL FIELD

The present invention relates to an abnormal data detection system, a method therefor, and a computer-readable recording medium on which the method is recorded. More specifically, it relates to an abnormal data detection system, a method therefor, and a recording medium that collect data on emitted substances and conditions through measuring instruments installed at a facility discharging fluid containing pollutants, apply hierarchical statistical techniques to the collected data to determine the operating state of the facility, and determine whether the collected data are abnormal for each operating state.

To prevent environmental pollution from facilities that discharge liquids or gases containing pollutants, pollutants are measured continuously by various automatic measurement monitoring systems. For example, to prevent air pollution caused by pollutants in the exhaust gas of workplaces equipped with stacks, an automatic stack measurement system (CleanSYS) has been installed at air-pollutant-emitting workplaces nationwide. It can monitor seven pollutants around the clock: dust (TSP, Total Suspended Particles), nitrogen oxides (NOx), sulfur oxides (SOx), hydrogen chloride (HCl), hydrogen fluoride (HF), ammonia (NH₃), and carbon monoxide (CO). Through this system, measurement data can be collected at 5-minute intervals from more than 1,500 stacks nationwide.

However, although conventional automatic measurement monitoring systems accumulate vast amounts of data by operating in connection with a large number of measuring instruments, the reliability of the data is not guaranteed. As a result, the accumulated data are used merely as administrative data for real-time emission monitoring and total emission-load management.

That is, when the quality of data collected by a conventional automatic measurement monitoring system is managed, the pollutant emission characteristics of each industry type and the influence of the facility environment are not taken into account. Instead, specialists who understand the target facility pick out abnormal data from the accumulated data on the basis of the instrument readings, four kinds of status information (normal, malfunction, power loss, under maintenance), and the facility's operating records. Differences in experience and competence among specialists during data finalization therefore lead to differences in the quality-control level of the finalized data, so the reliability of the data cannot be assured.

Korean Registered Patent Publication No. 10-1199924

To solve the above problems of the prior art, an object of the present invention is to provide an abnormal data detection system, a method therefor, and a computer-readable recording medium on which the method is recorded, which introduce statistical processing techniques for the quality control of data collected from pollutant discharge facilities and thereby eliminate the variation in data quality caused by differences in the competence of specialists.

Another object of the present invention is to provide an abnormal data detection system, a method therefor, and a computer-readable recording medium on which the method is recorded, which identify the operating state of a discharge facility from the distribution pattern of the data collected from it and apply an abnormal data screening filter corresponding to that operating state, so that abnormal data can be detected precisely.

To achieve the above objects, the abnormal data detection system of the present invention collects and analyzes measurement value signals from a plurality of measuring instruments that measure a plurality of substances contained in a fluid and the state of the fluid and output the measurement value signals. The system includes: a communication unit that receives the measurement value signals from the plurality of measuring instruments; a database unit that stores the measurement value signals received through the communication unit as measurement data; a user interface unit that provides an interface for selecting an abnormal data detection target and a detection criterion and an interface for displaying a detection result; and a detection unit that analyzes the measurement data stored in the database unit using the abnormal data detection target and the detection criterion selected through the user interface unit, determines the operating state of the facility in which the plurality of measuring instruments are installed according to the analysis result, selects outliers for each determined operating state, and displays the determination result and the selection result through the user interface unit as the detection result.

Here, the detection unit may include a pattern analysis unit that defines, for the measurement data stored in the database unit, the distribution range of the measurement data for each operating state of the facility in which the plurality of measuring instruments are installed, and that compares a measurement value signal input through the communication unit with the defined distribution ranges to determine the operating state of the facility.

The pattern analysis unit may also run a K-means++ clustering analysis that partitions the measurement data into as many sets as there are operating states, and may define the distribution ranges of the resulting sets as the distribution ranges of the measurement data for each operating state.
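As an illustration of the clustering step described above, the following is a minimal sketch using only the Python standard library. It assumes that each measurement is a tuple of numeric items and that the data contain at least `k` distinct points; the function names (`kmeanspp_init`, `kmeans`, `state_ranges`) and the sample values are hypothetical and do not come from the patent.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(list(points))]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(list(points), weights=weights, k=1)[0])
    return centres

def kmeans(points: Sequence[Point], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd iterations started from k-means++ seeds."""
    rng = random.Random(seed)
    centres = kmeanspp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

def state_ranges(points: Sequence[Point], labels: Sequence[int],
                 k: int) -> Dict[int, List[Tuple[float, float]]]:
    """Per-cluster (min, max) of every measured item: one range set per operating state."""
    ranges = {}
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            ranges[j] = [(min(col), max(col)) for col in zip(*members)]
    return ranges

# Hypothetical two-item measurements from two operating states.
data = [(5.0, 100.0), (6.0, 110.0), (4.0, 95.0),
        (50.0, 300.0), (52.0, 310.0), (48.0, 290.0)]
labels, centres = kmeans(data, k=2)
ranges = state_ranges(data, labels, k=2)
```

Using the per-cluster minimum and maximum as the "distribution range" is one simple reading of the text; a percentile band would be an equally plausible choice.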

The detection unit may also include a principal component analysis unit that selects a plurality of main items by Principal Component Analysis from among the plurality of items relating to the substances and fluid state measured by the measuring instruments, calculates the T² statistic for the measurement data stored in the database unit for the selected main items, and compares the calculated T² statistic with an upper control limit based on the T² statistic to select outliers in the stored measurement data.

The detection unit may also include a principal component analysis unit that selects a plurality of main items by Principal Component Analysis from among the plurality of items relating to the substances and fluid state measured by the measuring instruments, calculates the Q statistic for the measurement data stored in the database unit for the selected main items, and compares the calculated Q statistic with an upper control limit based on the Q statistic to select outliers in the stored measurement data.
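For reference, the T² and Q statistics are conventionally defined as follows in PCA-based multivariate monitoring. This passage of the patent does not give formulas, so the notation is an assumption: x is a mean-centred measurement vector, P is the loading matrix of the a retained principal components, and Λ is the diagonal matrix of their eigenvalues λ₁, ..., λₐ. Whether the patent computes the statistics in the score space or directly on the selected items is not specified here, and the control limits are left symbolic.

```latex
\[
  \mathbf{t} = P^{\top}\mathbf{x}, \qquad
  T^{2} = \mathbf{t}^{\top}\Lambda^{-1}\mathbf{t}
        = \sum_{i=1}^{a}\frac{t_i^{2}}{\lambda_i}
\]
\[
  Q = \bigl\lVert \mathbf{x} - P P^{\top}\mathbf{x} \bigr\rVert^{2}
\]
\[
  \text{flag } \mathbf{x} \text{ as an outlier if } T^{2} > T^{2}_{\mathrm{lim}}
  \ \text{ or } \ Q > Q_{\mathrm{lim}}
\]
```

T² measures how far a sample lies from the centre within the principal component subspace, while Q measures the residual that the retained components fail to explain, so the two checks are complementary.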

The detection unit may also include a correlation analysis unit that selects two items from among the plurality of items relating to the substances and fluid state measured by the measuring instruments, calculates the T² statistic for the measurement data stored in the database unit for the two selected items, and compares the calculated T² statistic with an upper control limit based on the T² statistic to select outliers in the stored measurement data.
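The two-item check can be sketched as a Hotelling-style T² computed from the sample mean and the 2×2 sample covariance of the selected pair. This is a minimal illustration under that assumption; the function names, the sample data, and the limit value of 5.0 are hypothetical, and in practice the upper control limit would be derived from a chosen confidence level rather than hard-coded.

```python
from typing import List, Sequence

def t2_two_items(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """T^2 = d^T S^{-1} d for every sample, where d is the deviation from
    the mean and S is the 2x2 sample covariance of the two items."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    out = []
    for a, b in zip(x, y):
        dx, dy = a - mx, b - my
        # Closed-form inverse of a 2x2 matrix applied to (dx, dy).
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out

def flag_outliers(t2: Sequence[float], upper_limit: float) -> List[int]:
    """Indices whose T^2 exceeds the upper control limit."""
    return [i for i, v in enumerate(t2) if v > upper_limit]

# Hypothetical pair of items that normally move together; the last
# sample breaks the relationship.
item_a = [1, 2, 3, 4, 5, 6, 7, 8]
item_b = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 6.0]
t2 = t2_two_items(item_a, item_b)
suspect = flag_outliers(t2, upper_limit=5.0)  # limit value is illustrative only
```

The last sample is unremarkable on either item alone but violates their joint relationship, which is the situation a pairwise check is meant to catch.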

The detection unit may also include a single-variable analysis unit that selects one item from among the plurality of items relating to the substances and fluid state measured by the measuring instruments, calculates the mean and standard deviation (σ) of the measurement data stored in the database unit for the selected item for each determined operating state, and compares the data with control limits at a preset confidence level derived from the calculated mean and standard deviation to select outliers in the stored measurement data.
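A minimal sketch of the single-variable check follows, assuming the control limits take the form mean ± k·σ computed separately for each operating state. The multiplier `k_sigma`, the function names, and the sample data are hypothetical; the patent only states that the limits follow from a preset confidence level.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

def control_limits(values: Sequence[float], states: Sequence[int],
                   k_sigma: float = 3.0) -> Dict[int, Tuple[float, float]]:
    """Lower and upper control limits (mean -/+ k_sigma * sd) per operating state."""
    by_state: Dict[int, List[float]] = {}
    for v, s in zip(values, states):
        by_state.setdefault(s, []).append(v)
    limits = {}
    for s, vals in by_state.items():
        m, sd = mean(vals), stdev(vals)  # stdev needs at least two values per state
        limits[s] = (m - k_sigma * sd, m + k_sigma * sd)
    return limits

def flag_outliers(values: Sequence[float], states: Sequence[int],
                  limits: Dict[int, Tuple[float, float]]) -> List[int]:
    """Indices of values outside the limits of their own operating state."""
    return [i for i, (v, s) in enumerate(zip(values, states))
            if not (limits[s][0] <= v <= limits[s][1])]

# Hypothetical readings of one item: state 0 is low load, state 1 is full load.
values = [10, 11, 9, 10, 10, 11, 9, 10, 30, 100, 102, 98, 101, 99]
states = [0] * 9 + [1] * 5
limits = control_limits(values, states, k_sigma=2.0)
suspect = flag_outliers(values, states, limits)
```

The reading of 30 sits between the two states and would pass a single pooled limit, but it is flagged once the limits are computed per state. A multiplier of 2 is used in the demo because, with only nine samples, no single point can lie three sample standard deviations from the mean.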

The detection unit may also include a correction conversion analysis unit that calculates pre-correction data from the measurement data on the basis of the correction values applied when the measuring instrument takes the measurement, and compares the difference between the measurement data and the pre-correction data with a preset value to select outliers in the measurement data.
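The patent does not state the correction formula in this passage. As a hedged illustration only, the sketch below assumes the commonly used oxygen reference correction C = C_a × (21 − O_s) / (21 − O_a), where C_a is the uncorrected concentration, O_s the standard oxygen concentration, and O_a the measured oxygen concentration (both in percent). The function names, the threshold, and the sample values are hypothetical.

```python
from typing import List, Sequence

def remove_oxygen_correction(corrected: float, measured_o2: float,
                             standard_o2: float) -> float:
    """Invert the assumed correction C = C_a * (21 - O_s) / (21 - O_a)
    to recover the uncorrected concentration C_a."""
    return corrected * (21.0 - measured_o2) / (21.0 - standard_o2)

def flag_by_correction(corrected: Sequence[float], measured_o2: Sequence[float],
                       standard_o2: float, max_diff: float) -> List[int]:
    """Indices where corrected and back-calculated values differ by more
    than max_diff (a preset, facility-specific threshold)."""
    flagged = []
    for i, (c, o2) in enumerate(zip(corrected, measured_o2)):
        raw = remove_oxygen_correction(c, o2, standard_o2)
        if abs(c - raw) > max_diff:
            flagged.append(i)
    return flagged

# Hypothetical data: the second sample was taken at an oxygen level far
# from the 6 % standard, so the correction inflated it strongly.
corrected = [30.0, 30.0]
oxygen = [6.0, 18.0]
suspect = flag_by_correction(corrected, oxygen, standard_o2=6.0, max_diff=10.0)
```

A large gap between corrected and uncorrected values suggests that the reported concentration is dominated by the correction factor rather than by the measurement itself, which is one plausible reason to treat it as suspect.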

The detection unit may further include a rule set management unit that sets the distribution ranges of the measurement data for each operating state, as defined by the pattern analysis unit, as a first rule set, and compares a measurement value signal input through the communication unit with the first rule set to determine the operating state of the facility in which the plurality of measuring instruments are installed.
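The first rule set can be pictured as a lookup table of per-state ranges against which each incoming sample is checked. The sketch below assumes that a sample belongs to a state when every item falls inside that state's range; the state names, range values, and the function name `determine_state` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Range = Tuple[float, float]

def determine_state(sample: Sequence[float],
                    ruleset: Dict[str, List[Range]]) -> Optional[str]:
    """Return the first operating state whose ranges contain every item of
    the sample, or None when the sample matches no stored range."""
    for state, ranges in ruleset.items():
        if all(lo <= v <= hi for v, (lo, hi) in zip(sample, ranges)):
            return state
    return None

# Hypothetical first rule set: one (min, max) pair per measured item.
first_ruleset = {
    "low_load": [(4.0, 6.0), (95.0, 110.0)],
    "full_load": [(48.0, 52.0), (290.0, 310.0)],
}

state = determine_state((5.2, 101.0), first_ruleset)
```

Because the ranges are computed once from historical data, classifying a new sample is a handful of comparisons, which fits the text's aim of handling incoming signals in real time.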

The detection unit may further include a rule set management unit that sets the upper control limit based on the T² statistic of the principal component analysis unit as a second rule set, and compares the second rule set with the T² statistic calculated for a measurement value signal input through the communication unit to determine whether that signal is an outlier.

The detection unit may further include a rule set management unit that sets the upper control limit based on the Q statistic of the principal component analysis unit as a third rule set, and compares the third rule set with the Q statistic calculated for a measurement value signal input through the communication unit to determine whether that signal is an outlier.

The detection unit may further include a rule set management unit that sets the upper control limit based on the T² statistic of the correlation analysis unit as a fourth rule set, and compares the fourth rule set with the T² statistic calculated for a measurement value signal input through the communication unit to determine whether that signal is an outlier.

The detection unit may further include a rule set management unit that sets the control limits of the single-variable analysis unit as a fifth rule set, and compares the fifth rule set with the value of a measurement value signal input through the communication unit to determine whether that signal is an outlier.

To achieve the above objects, the abnormal data detection method of the present invention collects and analyzes measurement value signals from a plurality of measuring instruments that measure a plurality of substances contained in a fluid and the state of the fluid and output the measurement value signals. The method includes: a first step of receiving the measurement value signals from the plurality of measuring instruments; a second step of storing the measurement value signals received in the first step as measurement data; a third step of receiving an abnormal data detection target and a detection criterion as input; and a fourth step of analyzing the measurement data stored in the second step using the abnormal data detection target and the detection criterion input in the third step, determining the operating state of the facility in which the plurality of measuring instruments are installed according to the analysis result, selecting outliers for each determined operating state, and displaying the determination result and the selection result.

To achieve the above objects, another embodiment of the present invention may provide a computer-readable recording medium on which the abnormal data detection method is recorded.

By applying statistical filtering for the automatic screening of abnormal data to the data collected from pollutant discharge facilities, the present invention prevents variation in the quality of finalized data among the staff in charge and thereby improves the reliability of the finalized data.

In addition, the present invention identifies the operating state of the pollutant discharge facility from the distribution pattern of the collected data, and detects abnormal data more precisely through a multistage filtering operation that comprehensively reflects, for the identified operating state, the correlation among all the data, the correlation between a pair of variables, the distribution of a single variable, and the state of data correction for temperature, humidity, and oxygen.

Furthermore, by introducing a rule set (Ruleset) that reflects the multistage filtering operation for each operating state, the present invention can process in real time the large volume of data collected from more than 1,500 pollutant discharge facilities.

FIG. 1 is a diagram illustrating an abnormal data detection system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a user interface screen provided by the user interface unit of the abnormal data detection system according to an embodiment of the present invention.
FIG. 3 is a diagram showing in detail the detection unit of the abnormal data detection system according to an embodiment of the present invention.
FIGS. 4A to 4D are graphs explaining the operation of the pattern analysis unit of the abnormal data detection system according to an embodiment of the present invention.
FIGS. 5A and 5B are graphs explaining the operation of the correlation analysis unit of the abnormal data detection system according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of a screen, provided by the user interface unit of the abnormal data detection system according to an embodiment of the present invention, for controlling the rule set management unit.
FIG. 7 is a flowchart illustrating an abnormal data detection method according to an embodiment of the present invention.
FIGS. 8A to 8C are diagrams showing graphs provided by the user interface unit of the abnormal data detection system according to an embodiment of the present invention.
FIGS. 9A and 9B are diagrams showing other graphs provided by the user interface unit of the abnormal data detection system according to an embodiment of the present invention.
FIGS. 10A to 10F are graphs showing the operating state classification operation of the abnormal data detection system according to an embodiment of the present invention.

The description of the disclosed technology is merely an embodiment for structural or functional explanation, so the scope of rights of the disclosed technology should not be construed as limited by the embodiments described in the text. That is, because the embodiments can be modified in various ways and can take various forms, the scope of rights of the disclosed technology should be understood to include equivalents capable of realizing its technical idea.

The terms used in this application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" to another component, it may be directly connected to that component, but it should be understood that other components may exist in between. In contrast, when a component is said to be "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "directly between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The steps may occur in an order different from the stated order unless a specific order is clearly indicated by the context. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the field to which the disclosed technology belongs. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art, and should not be interpreted as having an ideal or excessively formal meaning unless explicitly so defined in this application.

FIGS. 1 and 3 are diagrams illustrating an abnormal data detection system according to an embodiment of the present invention, and FIG. 2 is a diagram showing an example of a user interface screen provided by the system. The abnormal data detection system according to an embodiment of the present invention includes a communication unit 100, a database unit 200, a user interface unit 300, and a detection unit 400.

The communication unit 100 receives measurement value signals from a plurality of measuring instruments 10 and outputs the received signals to the database unit 200 or the detection unit 400. The communication unit 100 may use wired or wireless communication. When wireless communication is used, each measuring instrument 10 may be provided with a wireless communication module, for example a CDMA (Code Division Multiple Access) module, so that the communication unit 100 receives the information directly. Alternatively, a terminal (not shown) equipped with a wireless communication module may read the measured values of the individual measuring instruments 10, and the communication unit 100 may receive the information collected in this way.

The plurality of measuring instruments 10 measure a plurality of substances contained in the fluid and the state of the fluid. They may each be installed at a pollutant discharge facility, for example in the stacks of an oil refinery, to measure in real time the various pollutants discharged from the stack together with correction items such as oxygen, flow rate, and temperature, and to transmit signals carrying the measured values to the communication unit 100. The pollutants whose concentrations are measured by the measuring instruments 10 may include seven pollutants: dust, nitrogen oxides, sulfur oxides, hydrogen chloride, hydrogen fluoride, ammonia, and carbon monoxide.

In addition to the measured pollutant concentrations, the measuring instruments 10 may further include in the measurement value signal transmission items used for correction and similar purposes, such as exhaust gas flow rate, exhaust gas temperature, combustion chamber internal temperature, oxygen concentration, and humidity.

The database unit 200 stores the measurement value signals received through the communication unit 100 as measurement data and outputs the stored measurement data to the detection unit 400 under the control of the detection unit 400. The database unit 200 may store several years of data collected from the plurality of measuring instruments 10. According to the type of value measured by the measuring instruments 10, the database unit 200 may store the measurement data separately for seven pollutant measurement items (the concentrations of dust, nitrogen oxides, sulfur oxides, hydrogen chloride, hydrogen fluoride, ammonia, and carbon monoxide) and five transmission items (exhaust gas flow rate, exhaust gas temperature, combustion chamber internal temperature, oxygen concentration, and humidity).
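One way to picture a stored record is a structure with the seven pollutant items and the five transmission items kept as separate groups. The sketch below is an assumption about layout only: the class name, field names, and the `stack_id` and `measured_at` keys are hypothetical, and units are deliberately left out because the text does not state them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple

POLLUTANT_ITEMS = ("dust", "nox", "sox", "hcl", "hf", "nh3", "co")
TRANSMISSION_ITEMS = ("gas_flow", "gas_temp", "chamber_temp", "oxygen", "humidity")

@dataclass(frozen=True)
class StackMeasurement:
    """One stored sample from one stack (field names are illustrative)."""
    stack_id: str
    measured_at: datetime
    # Seven pollutant concentrations.
    dust: float
    nox: float
    sox: float
    hcl: float
    hf: float
    nh3: float
    co: float
    # Five transmission items used for correction.
    gas_flow: float
    gas_temp: float
    chamber_temp: float
    oxygen: float
    humidity: float

def split_items(m: StackMeasurement) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Separate a record into its pollutant items and transmission items."""
    pollutants = {name: getattr(m, name) for name in POLLUTANT_ITEMS}
    transmission = {name: getattr(m, name) for name in TRANSMISSION_ITEMS}
    return pollutants, transmission

# Hypothetical record.
sample = StackMeasurement(
    stack_id="stack-1", measured_at=datetime(2017, 6, 26, 0, 5),
    dust=3.1, nox=41.0, sox=12.5, hcl=0.4, hf=0.1, nh3=1.2, co=18.0,
    gas_flow=1500.0, gas_temp=140.0, chamber_temp=850.0, oxygen=6.2, humidity=9.0,
)
pollutants, transmission = split_items(sample)
```

Keeping the two groups distinct makes it straightforward for later analysis steps to treat the transmission items as correction inputs rather than as emission values.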

As shown in FIG. 2, the user interface unit 300 provides an interface through which the system user can select the abnormal data detection target and the detection criterion, and an interface that displays the detection result produced by the detection unit 400.

The user interface unit 300 not only provides various selection interfaces for choosing the abnormal data detection target and the detection criterion, but can also display the detection result in various ways so that the system user can easily recognize the abnormal data found by the detection unit 400. This is described in detail below with reference to FIG. 2.

First, for selecting the abnormal data detection target, the user interface unit 300 may offer, through a type selection item (311, select classification type), sorting by workplace (Sorting by factory ID) or sorting by emission facility classification scheme (Sorting by classification). It may also offer a search method (312, search factory) in which the workplace name is entered directly. When the user selects sorting by workplace as shown in FIG. 2, the user interface unit 300 provides a list of workplaces (313) so that the user can select a target workplace, for example the GS Caltex Yeosu plant. The user interface unit 300 also provides a list (314) of target outlets, for example stacks, so that the user can select a target outlet, for example stack No. 1 installed on the boiler and heating facility of the GS Caltex Yeosu plant. In addition, the user interface unit 300 may provide per-outlet information (315), such as detailed information on the selected outlet and the standard oxygen concentration (standardOxygen) and standard moisture content (standardMoisture) used for data correction.

Furthermore, for selecting the abnormal data detection criteria, the user interface unit 300 may provide various interfaces for setting (320, Outlier Detection), managing (330, Ruleset Management), and evaluating (340, Ruleset Evaluation) a rule set that reflects multi-stage filtering according to the operating state, but is not limited thereto. For example, the user interface unit 300 may provide an item 321 for selecting the collection period of the measurement data to be loaded from the database unit 200 when configuring a rule set.

The detection unit 400 analyzes the measurement data stored in the database unit 200 using the detection target and the detection criteria selected through the user interface unit 300, determines from the analysis the operating state of the facility in which the plurality of measuring devices 10 are installed, selects outliers for each determined operating state, and displays the determination result and the selection result through the user interface unit 300 as the detection result. Here, abnormal data, or an outlier, may mean data that deviates from the range that is typical, for example for emission concentration, under the same operating state.

As shown in FIG. 3, the detection unit 400 may include a pattern analysis unit 410, a principal component analysis unit 420, a correlation analysis unit 430, a single-variable analysis unit 440, a correction conversion analysis unit 450, and a rule set management unit 460.

The pattern analysis unit 410 performs pattern analysis on the measurement data stored in the database unit 200, constructs the distribution range of the measurement data for each operating state of the facility in which the plurality of measuring devices 10 are installed, and compares measurement value signals received in real time through the communication unit 100 with the constructed distribution ranges to determine the operating state of the facility. The user interface unit 300 provides a pattern analysis item 322 so that the time window (Time Window) and the number of patterns (Max Pattern Number) can be set for the pattern analysis that the pattern analysis unit 410 performs on the stored measurement data. Pattern analysis here may mean defining characteristics such as pollutant emission concentration according to the operating mode of the pollutant emission facility and clustering the time periods that show similar emission concentration ranges.
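As a non-limiting sketch of the comparison step described above, the following Python fragment checks a real-time sample against per-state distribution ranges. The function name `classify_state`, the dictionary layout, and the numeric ranges are hypothetical and are not taken from the disclosure; in practice the ranges would come from the pattern analysis of the stored data.

```python
def classify_state(sample, state_ranges):
    """Return the first operating state whose ranges contain the sample.

    sample       : dict mapping variable name -> measured value
    state_ranges : dict mapping state name -> {variable: (low, high)}
    Returns None when no state matches (a hypothetical convention).
    """
    for state, ranges in state_ranges.items():
        if all(low <= sample[var] <= high for var, (low, high) in ranges.items()):
            return state
    return None


# Illustrative ranges only; real ranges would be derived from stored data.
RANGES = {
    "shutdown": {"flow": (0.0, 1.0)},
    "standard_combustion": {"flow": (50.0, 150.0), "oxygen": (8.0, 14.0)},
}
```

A sample that matches no range returns `None`, which a caller could treat as a candidate transitional or abnormal state.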

For example, the pattern analysis unit 410 may use four variables from the measurement data, namely oxygen concentration (shown in FIG. 4A), flow rate (shown in FIG. 4B), nitrogen oxide concentration (shown in FIG. 4C), and exhaust gas temperature (shown in FIG. 4D), to define patterns representing the four operating states listed in Table 1 below, but is not limited thereto. Various other patterns, such as shutdown, start-up preparation, reduced-output operation, and abnormality of the pollution control facility, may also be defined.

[Table 1]

| Pattern definition | Measurement data characteristics |
| --- | --- |
| Standard combustion | Medium flow rate, medium oxygen concentration, medium temperature (exhaust gas) |
| Excess air supply | Low flow rate, low temperature (exhaust gas), high nitrogen oxide concentration |
| High-temperature combustion | High flow rate, high temperature (exhaust gas), medium-to-low oxygen concentration |
| Shutdown | No flow, etc. |

The measurement data used for Table 1 were collected over one month from a combustion facility that uses solid fuel. Through the user interface unit 300, each operating state can be displayed in a different color, as shown in FIGS. 4A to 4D: standard combustion in red, excess air supply in blue, high-temperature combustion in yellow-green, and shutdown in yellow.

That is, the concentration distribution of pollutants, the temperature distribution, the oxygen concentration distribution, the flow rate distribution, and the correlations among the measured values all differ with the operating state of the facility, so the distributions and interrelations of these measurement data can be used to define the ranges of the characteristics that distinguish operating states. FIGS. 4A to 4D confirm that the operating state of the facility is distinguished not by the distribution of any single measured value but by the characteristics of, and the correlations among, a plurality of measured values.

To distinguish the operating states, cluster analysis may be used, for example K-means++ (KMeanPlusPlus) clustering, fuzzy K-means (Fuzzy-KMean) clustering, or DBSCAN (Density-Based Spatial Clustering of Applications with Noise). In particular, K-means++ clustering is preferred in order to maximize the dissimilarity between the separated patterns. As the distance measure used to assess dissimilarity, the Earth Mover's distance, which can quantify the shape of a variable's distribution, may be used, but the measure is not limited thereto.
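As an illustration of how the Earth Mover's distance quantifies the difference between two distributions, the following minimal Python sketch handles the one-dimensional case with two samples of equal size, where the distance reduces to the mean absolute difference between the sorted values. The function name `emd_1d` and the equal-size restriction are assumptions made for this example only.

```python
def emd_1d(sample_a, sample_b):
    """Earth Mover's distance between two equal-size 1-D samples.

    With equally weighted points, the optimal transport plan pairs the
    i-th smallest value of one sample with the i-th smallest of the other.
    """
    if not sample_a or len(sample_a) != len(sample_b):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)
```

Two time periods with the same mean but different spread or skew yield a non-zero distance, which is why a distribution-aware measure can be useful for separating operating patterns.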

An example of how the pattern analysis unit 410 performs pattern analysis is described in detail below.

First, for the pattern analysis, the pattern analysis unit 410 calculates statistics, namely the mean, standard deviation, skewness, and so on, for each measurement item such as oxygen concentration, flow rate, nitrogen oxide concentration, and exhaust gas temperature. Next, because the measurement items have different units, the pattern analysis unit 410 normalizes (Normalize) the measurement data item by item so that the clustering is not dominated by the items with large values. Next, if there is an item to which greater weight is intentionally to be given, for example oxygen concentration, the pattern analysis unit 410 may set a weighting coefficient so that the measurement data are processed with a distance weighting function before clustering. Next, the pattern analysis unit 410 runs a K-means++ clustering analysis that partitions the measurement data in the database unit 200 into as many sets as the number of operating states (Max Pattern Number; see the pattern analysis item 322 shown in FIG. 2), for example four sets for shutdown, start-up preparation, reduced-output operation, and abnormality of the pollution control facility, and may define the distribution ranges of the sets derived from the clustering as the distribution ranges of the measurement data for each operating state. As is known to those skilled in the art, K-means clustering is an algorithm that groups given data into k clusters and operates by minimizing the variance of the distances between each cluster center and its members. K-means++ clustering is an algorithm that selects suitable initial values (initial cluster centers) for K-means clustering, and applying it prevents the loss of performance that arbitrary selection of initial values can cause. The pattern analysis unit 410 may also use a time window (Time Window; see the pattern analysis item 322 shown in FIG. 2) to control the time range of the measurement data from which patterns are extracted.
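The steps above (item-wise normalization, optional per-item weighting, and K-means++ clustering) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names, the z-score normalization, the weighted squared Euclidean distance, and the fixed random seed are all choices made for this example.

```python
import math
import random


def normalize_columns(rows):
    """Scale every column (measurement item) to zero mean and unit spread."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; use 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def weighted_sq_dist(a, b, weights):
    """Squared Euclidean distance with one weighting coefficient per item."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def kmeans_pp(rows, k, weights, max_iter=100, seed=0):
    """Cluster rows into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # K-means++ seeding: each further center is drawn with probability
    # proportional to its squared distance from the nearest chosen center.
    centers = [list(rng.choice(rows))]
    while len(centers) < k:
        d2 = [min(weighted_sq_dist(r, c, weights) for c in centers) for r in rows]
        if sum(d2) == 0:  # every point coincides with an existing center
            centers.append(list(rng.choice(rows)))
        else:
            centers.append(list(rng.choices(rows, weights=d2, k=1)[0]))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: weighted_sq_dist(r, centers[i], weights))
            for r in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers
```

Raising the weight of one column, for example the oxygen concentration, makes differences in that item count more toward the distance, which corresponds to the intentional weighting described above.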

In addition, when the pattern analysis shows that the distribution of a particular pattern used to define an operating state is not clear, the pattern analysis unit 410 may provide functions for subdividing that pattern into more detailed patterns or for merging it into another similar pattern. To this end, the pattern analysis unit 410 may provide a function for defining transitional patterns that arise as the operating state changes.

The principal component analysis unit 420 selects outliers, preferably outliers for each operating state, in the measurement data stored in the database unit 200 by applying the T² statistic or the Q statistic to a model built through principal component analysis (PCA). The operation of the principal component analysis unit 420 is described in detail below.

To detect abnormal data, that is, outliers, in data with a plurality of dependent variables, a multivariate statistical analysis must be applied. Such an analysis can be divided into a first step of building a model from data obtained under normal operating conditions and a second step of using the model to identify outliers that deviate from the normal process. The first step may be performed by principal component analysis, and the second step may be performed using Hotelling's T² statistic or the Q statistic (Squared Prediction Error, SPE). In the second step either statistic may be used alone, but preferably the T² statistic and the Q statistic are used together.

For data to which measured values are added in real time, the model obtained by principal component analysis must change over time. The principal component analysis unit 420 may therefore use a time window that controls the time range of the measurement data, discarding the oldest data and performing principal component analysis with the new covariance function (Covariance Function) generated from the new time window.
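The moving-window update can be illustrated with a small Python sketch in which the oldest row is dropped automatically when a new one arrives and the covariance matrix is recomputed from the rows currently in the window. The class name `WindowCovariance` and its interface are hypothetical, and the sketch stops at the covariance matrix; the eigen-decomposition that would follow is omitted.

```python
from collections import deque


class WindowCovariance:
    """Keeps the most recent `size` measurement rows and their covariance."""

    def __init__(self, size):
        # A deque with maxlen discards the oldest row when a new one is added.
        self.rows = deque(maxlen=size)

    def add(self, row):
        self.rows.append(list(row))

    def covariance(self):
        """Sample covariance matrix (divisor n - 1) of the current window."""
        n = len(self.rows)
        if n < 2:
            raise ValueError("at least two rows are needed")
        dim = len(self.rows[0])
        means = [sum(r[i] for r in self.rows) / n for i in range(dim)]
        return [
            [sum((r[i] - means[i]) * (r[j] - means[j]) for r in self.rows) / (n - 1)
             for j in range(dim)]
            for i in range(dim)
        ]
```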

First, an arbitrary data matrix X of size n×j can be expressed as a linear combination of a score matrix (Score matrix) T of size n×k and a loading matrix (Loading matrix) P of size j×k. Here, n is the total number of measurements, k is the number of principal components, and j is the number of variables. The first principal component is the component that explains the most of the data, and the second principal component is the component that explains the second most. The third, fourth, and subsequent principal components can be interpreted in the same way. The number of principal components k is less than or equal to both n and j, that is, k ≤ min(n, j). The data matrix X can then be written as in the following equation.

[Equation 1]

$$X = T P^{\top} + E$$

Here, E is the residual (Residual) matrix of size n×j.

If P is the loading matrix calculated for the measurement data stored in the database unit 200, the score value $\hat{t}$ for a new measurement value signal received through the communication unit 100 can be calculated by the following equation.

[Equation 2]

$$\hat{t} = t\,P$$

Here, lowercase letters denote vectors and uppercase letters denote matrices, and t denotes the vectorized data that reflects the new measurement value signal received through the communication unit 100.

Hotelling's T² statistic indicates how similar the new measurement data are to the past measurement data, and is obtained by calculating the T² statistic, that is, the (squared) Mahalanobis distance (Mahalanobis Distance), as in the following equation.

[Equation 3]

$$T^{2} = x^{\top} S^{-1} x$$

Here, S is the covariance matrix of the score values from the principal component analysis of the past measurement data, that is, a diagonal matrix of the eigenvalues (eigenvalue) of the data selected from the measurement data, and x is the mean-centered predicted score vector for the new measurement data.
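Because S is described as a diagonal matrix of eigenvalues, the T² statistic reduces to a sum of squared scores, each divided by its eigenvalue. The short Python sketch below shows this reduction; the function name `hotelling_t2` and the plain-list inputs are assumptions made for illustration.

```python
def hotelling_t2(scores, eigenvalues):
    """T^2 = x^T S^-1 x for a diagonal S whose entries are the eigenvalues.

    scores      : mean-centered score vector of the new measurement
    eigenvalues : variance of each retained principal component (all > 0)
    """
    if len(scores) != len(eigenvalues):
        raise ValueError("one eigenvalue is required per score")
    return sum(s * s / lam for s, lam in zip(scores, eigenvalues))
```

For example, a score vector (2, 1) with eigenvalues (4, 1) gives 4/4 + 1/1 = 2.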

In addition, to select outliers in new measurement data relative to the past measurement data stored in the database unit 200, the principal component analysis unit 420 may calculate an upper control limit (Upper Control Limit) by the following equation.

[Equation 4]

Here, n is the number of measured values in the measurement data, p is the number of principal components used in the principal component analysis, and $F_{\alpha}(p,\,n-1)$ is the critical value (Critical Value) of the F distribution with p and n-1 degrees of freedom at the α quantile. If the T² value calculated by Equation 3 exceeds the threshold calculated by Equation 4, the principal component analysis unit 420 determines the value to be an outlier.

The Q statistic is used to judge abnormality by monitoring the difference between the actual measured value and the estimated value, and can be calculated by the following equation, which considers the part of the principal component analysis that corresponds to the residual matrix.

[Equation 5]

$$Q = x^{\top}\left(I - P P^{\top}\right)x$$

Here, P is the loading matrix calculated (from the covariance matrix) for the measurement data stored in the database unit 200, x is the mean-centered predicted score vector for the new measurement data, and I is the identity matrix.
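The Q statistic can be read as the squared length of the part of a measurement that the retained principal components fail to reconstruct. The Python sketch below computes it that way, assuming the loading vectors are orthonormal and the input vector is already mean-centered; the function name `q_statistic` and the list-of-vectors layout for P are hypothetical.

```python
def q_statistic(x, loadings):
    """Squared prediction error of x under a PCA model.

    x        : mean-centered measurement vector (length j)
    loadings : list of k orthonormal loading vectors, each of length j
               (the columns of the loading matrix P)
    """
    # Project onto each retained component, then reconstruct the vector.
    scores = [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in loadings]
    recon = [sum(t * p[i] for t, p in zip(scores, loadings))
             for i in range(len(x))]
    # Q is the squared norm of the residual x - P P^T x.
    return sum((x_i - r_i) ** 2 for x_i, r_i in zip(x, recon))
```

A measurement lying exactly along a retained component has Q close to zero, while a measurement with a large part outside the model subspace has a large Q.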

Under normal conditions the Q statistic follows a multivariate normal distribution, so its threshold can be estimated through a weighted chi-square distribution, and the corresponding upper control limit $Q_{\alpha}$ is calculated using the following equation.

[Equation 6]

$$Q_{\alpha} = \theta_{1}\left[\frac{z_{\alpha}\sqrt{2\theta_{2}h_{0}^{2}}}{\theta_{1}} + 1 + \frac{\theta_{2}h_{0}\left(h_{0}-1\right)}{\theta_{1}^{2}}\right]^{1/h_{0}}$$

Here, $z_{\alpha}$ is the value corresponding to the (1-α) quantile of the normal distribution, and $h_{0}$ can be calculated by the following equation.

$$h_{0} = 1 - \frac{2\theta_{1}\theta_{3}}{3\theta_{2}^{2}}$$

In addition, $\theta_{i}$ can be calculated by the following equation.

$$\theta_{i} = \sum_{j=d+1}^{a} \lambda_{j}^{\,i}, \qquad i = 1, 2, 3$$

Here, $\lambda_{j}$ is an eigenvalue (eigenvalue) obtained from the principal component analysis, a is the number of variables, and d is the number of principal components used in the principal component analysis.

If the Q value calculated by Equation 5 exceeds the threshold calculated by Equation 6, the principal component analysis unit 420 determines the value to be abnormal data, that is, an outlier.
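One widely used approximation for a Q-statistic control limit that matches the quantities named above (a normal quantile and sums of powers of the eigenvalues not retained in the model) is the Jackson-Mudholkar formula. The Python sketch below implements that formula as an assumption; it is not confirmed to be the exact expression of the disclosure, and the function name `q_limit` is hypothetical.

```python
import math
from statistics import NormalDist


def q_limit(residual_eigenvalues, alpha=0.01):
    """Jackson-Mudholkar upper control limit for the Q statistic.

    residual_eigenvalues : eigenvalues of the components NOT retained
                           in the PCA model (indices d+1 .. a)
    alpha                : significance level of the limit
    """
    th1 = sum(residual_eigenvalues)
    th2 = sum(lam ** 2 for lam in residual_eigenvalues)
    th3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha)  # (1 - alpha) normal quantile
    term = (z * math.sqrt(2.0 * th2 * h0 ** 2) / th1
            + 1.0
            + th2 * h0 * (h0 - 1.0) / th1 ** 2)
    return th1 * term ** (1.0 / h0)
```

A measurement would then be flagged when its Q value exceeds `q_limit(...)` for the chosen significance level; a smaller `alpha` gives a higher limit and fewer flags.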

The correlation analysis unit 430 selects two of the plurality of items relating to the plurality of substances and the state of the fluid measured by the measuring devices 10, performs correlation analysis by the Mahalanobis distance (Mahalanobis Distance) method on the measurement data stored in the database unit 200 for the selected items, and selects outliers in the measurement data for the two items from the analysis result. That is, the correlation analysis unit 430 extracts outliers using the correlation between two items of the stored measurement data taken as two variables, for example the nitrogen oxide concentration and the oxygen concentration. This is described in detail below.

To handle situations in which a value must be detected as an outlier in the relationship between two items of the measurement data, which store the measurement value signals received from the measuring devices 10 through the communication unit 100, the correlation analysis unit 430 constructs a covariance matrix from the measurement data for the two items and calculates the Mahalanobis distance with respect to that covariance matrix as in Equation 3. It extracts the data as abnormal when the calculated value exceeds the threshold obtained by substituting 2, the number of items used for correlation analysis, for p, the number of principal components in Equation 4. That is, the correlation analysis unit 430 uses Equation 4 in a form modified, as in the following equation, to set the threshold for extracting outliers between two variables.

[Equation]

Here, m is the number of items in the observed data; when two items are used, 2 is substituted as described above. n is the number of measured values in the measurement data; in the case of a stack tele-monitoring system (TMS, Tele-Monitoring System), it may mean the number of 5-minute data points. The F term denotes the inverse function of the F distribution for the 100(1-α)% confidence interval.

FIGS. 5A and 5B are graphs illustrating the operation of the correlation analysis unit 430. The correlation analysis unit 430 maps the pair of values of the two selected items, for example the nitrogen oxide concentration and the oxygen concentration, as a coordinate onto the covariance plane as shown in FIG. 5A, and measures the distance obtained by converting the distance of each coordinate from the center of the covariance distribution into a circular distribution, that is, the Mahalanobis distance.

First, suppose that the measurement data of two mutually related items are distributed over time as shown on the left of FIG. 5A. Ignoring the time axis, the measurement data of the two items for the same time point can be converted into two-dimensional coordinates, and the result is as shown on the right of FIG. 5A.

If, as shown in FIG. 5B, one converted coordinate (Multivariate outlier: out of control) lies markedly far from the other coordinates, that value can be judged to be abnormal data, that is, an outlier.

Through the Mahalanobis distance method, the correlation analysis unit 430 converts an elliptical distribution such as that shown in FIGS. 5A and 5B into a circular distribution, so that outliers can be extracted quantitatively simply by measuring the distance between the center and each measured value. In other words, depending on the shape of the data mapped onto the covariance plane, it can be difficult to judge quantitatively from the elliptical correlation distribution alone which coordinates are outliers. For example, when one of two measured points lies at a vertex of the ellipse and the other lies near the center point of the ellipse in a different direction, one must be judged a normal value and the other an outlier even if their straight-line distances from the center appear the same. For this reason, in converting the elliptical distribution into a circular distribution, the correlation analysis unit 430 uses the covariance matrix as shown in Equations 2 and 3 above.
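The effect of using the covariance matrix can be shown with a two-variable Python sketch: two query points at the same straight-line distance from the center receive very different Mahalanobis distances depending on whether they lie along or across the trend of the data. The function name `mahalanobis_sq_2d` and the sample values are assumptions made for this illustration.

```python
def mahalanobis_sq_2d(points, query):
    """Squared Mahalanobis distance of `query` from the center of `points`.

    points : list of (x, y) pairs, e.g. (NOx concentration, O2 concentration)
    query  : the (x, y) pair to be tested
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = query[0] - mx, query[1] - my
    # d^T S^-1 d written out with the closed-form inverse of a 2x2 matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Illustrative, positively correlated data centered at (3.5, 3.5).
DATA = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ALONG = mahalanobis_sq_2d(DATA, (5.5, 5.5))   # follows the trend
ACROSS = mahalanobis_sq_2d(DATA, (5.5, 1.5))  # same Euclidean distance
```

Here `ACROSS` is roughly ten times `ALONG`, so only the point that departs from the correlation between the two items would exceed a threshold placed between them.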

The single-variable analysis unit 440 determines the normal distribution range of the data for each item of the measurement data, which store the measurement value signals received from the measuring devices 10 through the communication unit 100, and extracts a value as an outlier when it falls outside that range. In particular, the single-variable analysis unit 440 determines the normal distribution range of each item differently depending on the operating state of the facility in which the plurality of measuring devices 10 are installed.

That is, the single-variable analysis unit 440 selects one of the plurality of items relating to the plurality of substances and the state of the fluid measured by the measuring devices 10, calculates the mean and the standard deviation of the stored measurement data for the selected item separately for each operating state determined by the pattern analysis unit 410, and selects outliers in the measurement data stored in the database unit 200 by comparing the data with control limits at a preset confidence level, for example 90, 95, 99, or 99.99%, based on the calculated mean and the calculated standard deviation (σ).

The range for outlier detection determined by the single-variable analysis unit 440 can be expressed as a control limit range with an upper limit and a lower limit. For example, measured values that fall outside three standard deviations, that is, outside the 3σ control limits, may be selected as outliers. Applying the 3σ control limits means testing the null hypothesis (corresponding to an outlier) using the significance level given by the following equation, that is, falling outside the normal range for the 99.5% confidence level. The confidence level applied by the single-variable analysis unit 440 can be set freely by the user according to the management level, and may be set to various values such as 90, 95, 99, or 99.99% as well as 99.5%.

[Equation]

Here, $x_{i}$ is each measured value of the measurement data, and μ is the mean calculated, for each operating state determined by the pattern analysis unit 410, from the measurement data stored in the database unit 200.

That is, the upper control limit (UCL) and the lower control limit (LCL) of the range for outlier detection determined by the single-variable analysis unit 440 can be determined by the following equation.

[Equation]

Here, p is the probability value under the assumption of a normal distribution.
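A minimal Python sketch of per-state control limits is given below. It assumes symmetric limits of the form mean ± z·σ, with z taken from the two-sided normal quantile for the chosen confidence level; this form, the function names, and the sample layout are assumptions for illustration rather than the expression of the disclosure. For reference, under a normal distribution the ±3σ band covers about 99.73% of values.

```python
from statistics import NormalDist, mean, stdev


def control_limits(values, confidence=0.995):
    """Return (LCL, UCL) = mean -/+ z * sigma for a two-sided level."""
    mu, sigma = mean(values), stdev(values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mu - z * sigma, mu + z * sigma


def flag_outliers(records, confidence=0.995):
    """records: list of (operating_state, value) pairs.

    Limits are computed separately for every operating state, so the same
    value can be normal in one state and an outlier in another. Note that
    the limits here are estimated from data that include the outliers.
    """
    by_state = {}
    for state, value in records:
        by_state.setdefault(state, []).append(value)
    limits = {s: control_limits(v, confidence) for s, v in by_state.items()}
    return [(s, v) for s, v in records
            if not limits[s][0] <= v <= limits[s][1]]
```

With these definitions, a reading of 50 is flagged among "standard" readings near 10 but accepted among "high_temp" readings near 50.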

The correction conversion analysis unit 450 calculates pre-correction data for the measurement data stored in the database unit 200, based on the correction values applied when the measuring devices 10 take measurements, and selects outliers in the measurement data by comparing the difference between the measurement data and the pre-correction data with a preset value. That is, the correction conversion analysis unit 450 analyzes the characteristics of the data distribution before and after correction for oxygen concentration, temperature, and flow rate to extract outliers, that is, abnormal data.

The plurality of measuring devices 10 installed at the pollutant emission facility transmit to the communication unit 100, as measured values, values to which corrections for oxygen concentration, temperature, and flow rate have been applied. If the emission facility manipulates or distorts the pre-correction data, the data should in fact be treated as abnormal even though no outlier is found in the measured values received from the measuring devices 10. It is therefore preferable to select outliers in the pre-correction data using the correction conversion analysis unit 450. In addition, when the flow rate, temperature, and the like of the fluid discharged from the pollutant emission facility change abruptly, the corrected values may suddenly increase or decrease. The correction conversion analysis unit 450 provides a function for detecting and controlling errors caused by this phenomenon, such as selecting as an outlier a value that is not actually abnormal data.

Besides calculating the pre-correction data, the correction conversion analysis unit 450 may collect the pre-correction data directly from the measuring devices 10 and compare them with the measurement data stored in the database unit 200, but is not limited thereto.

The process by which the correction conversion analysis unit 450 converts the pollutant concentration data in the measurement data into pre-correction data using temperature, oxygen concentration, and flow rate is described below. In the correction equations below, temperature and pressure are converted to the standard state (Standard Temperature and Pressure, STP), and the pressure is assumed to be 1 atm.

The equation by which the measuring device 10 converts the actually measured pollutant concentration ($C_{a}$) into the pollutant concentration for transmission ($C$) by temperature correction is as follows.

$$C = C_{a} \times \frac{273 + t_{a}}{273}$$

Here, $C$ is the pollutant concentration converted to the standard state (0 °C, 1 atm) (mg/Sm³, ppm), $C_{a}$ is the pollutant concentration measured at the actual or an arbitrarily set temperature (mg/m³, ppm), and $t_{a}$ is the actual or arbitrarily set temperature (°C).

In addition, the equation by which the measuring device 10 converts the actually measured pollutant concentration ($C_{a}$) into the pollutant concentration for transmission ($C$) by oxygen concentration correction is as follows.

$$C = C_{a} \times \frac{21 - O_{s}}{21 - O_{a}}$$

Here, $O_{s}$ is the standard oxygen concentration (%), which is set so that the combustion equipment of the emission facility is operated properly and dilution of the exhaust gas is prevented, and $O_{a}$ is the actually measured oxygen concentration (%).

In addition, the equation by which the measuring instrument (10) converts the actually measured discharge flow rate into the discharge flow rate value for transmission by temperature correction is as follows.

[Equation not reproduced in the source text]

Here, the terms of the equation are, in order: the flow rate converted to the standard state (0 ℃, 1 atm) (m³/30 min); and the flow rate measured at the actual or an arbitrarily set temperature (m³/30 min).

Meanwhile, the equation by which the measuring instrument (10) converts the actually measured discharge flow rate into the discharge flow rate value for transmission by oxygen concentration correction is as follows.

[Equation not reproduced in the source text]

Here, the terms of the equation are, in order: the flow rate converted to the standard state (0 ℃, 1 atm) (m³/30 min); the flow rate measured at the actual or an arbitrarily set temperature (m³/30 min); and the actual or arbitrarily set temperature (℃).
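The four conversion equations are not legible in this text. As a hedged reconstruction only, the forms below are the ones commonly used for stack-gas corrections at a pressure of 1 atm, and they agree with the quantities listed in the definitions. All symbols are assumptions introduced here for illustration: C and Q are the corrected concentration and flow rate, C_a and Q_a the measured values, t_a the measured temperature (℃), O_s the standard oxygen concentration (%), and O_a the measured oxygen concentration (%). The patent's own equations may differ in notation or detail.

```latex
\begin{align}
C &= C_a \times \frac{273 + t_a}{273}
  && \text{(concentration, temperature correction)} \\
C &= C_a \times \frac{21 - O_s}{21 - O_a}
  && \text{(concentration, oxygen correction)} \\
Q &= Q_a \times \frac{273}{273 + t_a}
  && \text{(flow rate, temperature correction)} \\
Q &= Q_a \div \frac{21 - O_s}{21 - O_a}
  && \text{(flow rate, oxygen correction)}
\end{align}
```

Under these assumed forms, pre-correction data can be recovered from a transmitted value by inverting the corresponding factor, for example C_a = C × 273 / (273 + t_a).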

In screening outliers, the correction conversion analysis unit (450) may apply a general confidence-interval methodology. That is, the correction conversion analysis unit (450) accumulates the calculated differences between the measurement data and the pre-correction data and, using the mean and standard deviation of the accumulated data, checks whether the difference between the measurement data under test and its pre-correction data falls within the upper and lower limits at a confidence level of 95% or 99.5%.

Here, the correction conversion analysis unit (450) may convert the difference between the measurement data (post-correction data) and the pre-correction data into a percentage by the following equation and use it as the value for review, but is not limited thereto.

[Equation not reproduced in the source text]
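The confidence-interval check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the percentage formula (difference divided by the corrected value) is assumed because the original equation is not legible, the limits are taken as mean ± z × standard deviation under a normal approximation, and all function names and sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def percent_difference(corrected, uncorrected):
    """Difference between corrected and pre-correction values, as a
    percentage of the corrected value (assumed definition)."""
    return (corrected - uncorrected) / corrected * 100.0


def limits(history, confidence=0.95):
    """Lower and upper limits from the accumulated differences,
    using a two-sided normal quantile for the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    m, s = mean(history), stdev(history)
    return m - z * s, m + z * s


def is_outlier(value, history, confidence=0.95):
    """True when the value falls outside the limits."""
    low, high = limits(history, confidence)
    return not (low <= value <= high)


# Hypothetical (corrected, pre-correction) pairs accumulated over time.
pairs = [(100.0, 80.0), (102.0, 81.0), (98.0, 79.0), (101.0, 80.5), (99.0, 79.5)]
history = [percent_difference(c, u) for c, u in pairs]

print(is_outlier(20.1, history))        # difference close to the usual level
print(is_outlier(35.0, history, 0.995)) # difference far from the usual level
```

A 99.5% level widens the interval, so fewer differences are flagged than at 95%.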

The rule set management unit (460) receives, from the pattern analysis unit (410), the principal component analysis unit (420), the correlation analysis unit (430), and the single-variable analysis unit (440), respectively, the operating state classification method, the principal component analysis result, the correlation analysis result, and the single-variable analysis result derived through prior analysis of the measurement data in the database unit (200), and stores the received method and results in the form of rule sets. When the measured value signals input through the communication unit (100) are analyzed in real time, it extracts outliers, that is, abnormal data, merely by applying the relevant rule set, without any separate computation on the measurement data in the database unit (200). In other words, the outlier screening methods of the pattern analysis unit (410), the principal component analysis unit (420), the correlation analysis unit (430), and the single-variable analysis unit (440) described above require large data queries, numerous statistical analyses, and data computations for pattern analysis, principal component analysis, and the like, and are therefore unsuitable for real-time extraction of abnormal data. The rule set management unit (460) compensates for this drawback.

FIG. 6 shows an example of a screen, provided by the user interface unit (300), for controlling the rule set management unit (460). The operation of the rule set management unit (460) is described below with reference to FIGS. 1 to 6.

The rule set management unit (460) may set the distribution range of the measurement data for each operating state, as defined by the pattern analysis unit (410), as a first rule set, and may determine the operating state of the facility in which the plurality of measuring instruments (10) are installed by comparing the preset first rule set with the measured value signals input in real time through the communication unit (100). For example, the rule set management unit (460) divides the operating state into normal, excess air, high temperature, and shutdown states, as shown in FIG. 6, and the user interface unit (300) may provide a selection interface tool (331) that displays information on the distribution range of the measurement data set for each operating state by the rule set management unit (460).
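As a rough sketch of how a first rule set of this kind could be applied, the snippet below stores one (low, high) range per item for each operating state and returns the first state whose ranges contain the incoming sample. The state names follow FIG. 6 as described above; the item keys, the numeric ranges, and the first-match policy are assumptions made for illustration.

```python
# Hypothetical first rule set: per-state distribution ranges for two items,
# oxygen concentration (%) and exhaust gas temperature (deg C).
FIRST_RULE_SET = {
    "shutdown": {"O2": (20.5, 21.0), "TMP": (0.0, 60.0)},
    "excess_air": {"O2": (12.0, 20.5), "TMP": (60.0, 250.0)},
    "high_temperature": {"O2": (0.0, 12.0), "TMP": (250.0, 600.0)},
    "normal": {"O2": (0.0, 12.0), "TMP": (60.0, 250.0)},
}


def classify_state(sample, rule_set):
    """Return the first state whose ranges contain every item of the sample,
    or None when the sample matches no stored range."""
    for state, ranges in rule_set.items():
        if all(low <= sample[item] <= high for item, (low, high) in ranges.items()):
            return state
    return None


print(classify_state({"O2": 20.9, "TMP": 25.0}, FIRST_RULE_SET))
print(classify_state({"O2": 8.0, "TMP": 180.0}, FIRST_RULE_SET))
```

A sample that matches no state returns None, which a caller could treat as a candidate for review.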

The rule set management unit (460) may set, as a second rule set, the upper control limit based on the T² statistic calculated by the principal component analysis unit (420) using Equation 4, and may determine whether a measured value signal input in real time through the communication unit (100) is an outlier by comparing the preset second rule set with the T² statistic calculated using Equation 3 for that measured value signal. For example, the rule set management unit (460) sets the upper control limit based on the T² statistic calculated using Equation 4 as a threshold, and the user interface unit (300) displays the threshold level applied to real-time detection as determined by the rule set management unit (460), for example 0.000 to 24.428 as shown in FIG. 6, and allows the confidence level and the enabled state (Enabled) of the function to be selected (332).

In addition, the rule set management unit (460) may set, as a third rule set, the upper control limit based on the Q statistic calculated by the principal component analysis unit (420) using Equation 6, and may determine whether a measured value signal input in real time through the communication unit (100) is an outlier by comparing the preset third rule set with the Q statistic calculated using Equation 5 for that measured value signal.

Meanwhile, the rule set management unit (460) may set, as a fourth rule set, the upper control limit based on the T² statistic calculated by the correlation analysis unit (430) using Equation 9, and may determine whether a measured value signal input in real time through the communication unit (100) is an outlier by comparing the preset fourth rule set with the T² statistic calculated using Equation 3 for that measured value signal.
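For the two-item case, a T² statistic can be illustrated with the textbook Hotelling form, T² = (x − μ)ᵀ S⁻¹ (x − μ), where μ and S are the sample mean and covariance of the historical data. The sketch below assumes this form stands in for Equation 3, which is not shown in this section; the upper control limit is passed in as a given number rather than derived, and the data values are hypothetical.

```python
from statistics import mean


def fit_two_items(xs, ys):
    """Sample means and 2x2 sample covariance (sxx, sxy, syy) of two items."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def t_squared(x, y, center, cov):
    """Hotelling T^2 of one observation, using the closed-form 2x2 inverse."""
    mx, my = center
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero (items not perfectly correlated)
    dx, dy = x - mx, y - my
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical history: flow rate and oxygen concentration.
flow = [100.0, 102.0, 98.0, 101.0, 99.0]
oxygen = [6.0, 5.9, 6.2, 6.0, 5.9]
center, cov = fit_two_items(flow, oxygen)

UPPER_CONTROL_LIMIT = 10.0  # hypothetical stored fourth-rule-set value
print(t_squared(102.0, 5.9, center, cov) > UPPER_CONTROL_LIMIT)
print(t_squared(130.0, 9.0, center, cov) > UPPER_CONTROL_LIMIT)
```

Only the comparison against the stored limit has to run in real time; the mean and covariance come from the offline analysis.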

In addition, the rule set management unit (460) may set, as a fifth rule set, the control limit calculated by the single-variable analysis unit (440) using Equation 10, or using an arbitrarily determined confidence level and Equation 11, and may determine whether a measured value signal input in real time through the communication unit (100) is an outlier by comparing the preset fifth rule set with the value of that measured value signal.
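The point of storing thresholds as rule sets is that the real-time path reduces to plain comparisons. A minimal sketch, assuming the rule sets are persisted as JSON: the value 24.428 is taken from the FIG. 6 example quoted above, while the key names, the Q limit, and the `enabled` flag layout are hypothetical.

```python
import json

# Hypothetical persisted rule sets (only the 24.428 value comes from the text).
RULESET_JSON = """
{
  "t2_pca": {"upper": 24.428, "enabled": true},
  "q_pca": {"upper": 5.0, "enabled": false}
}
"""


def violated_rules(ruleset, statistics):
    """Names of enabled rules whose upper limit is exceeded by the
    statistic computed for the incoming measured value signal."""
    return [
        name
        for name, rule in ruleset.items()
        if rule["enabled"] and name in statistics and statistics[name] > rule["upper"]
    ]


ruleset = json.loads(RULESET_JSON)
print(violated_rules(ruleset, {"t2_pca": 30.0, "q_pca": 9.0}))
```

Disabled rules are skipped, which mirrors the Enabled selection described for the user interface.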

Here, for the outliers screened by the rule set management unit (460), the detection unit (400) may apply a weighting coefficient to each analysis result, such as those of the principal component analysis unit (420), the correlation analysis unit (430), and the single-variable analysis unit (440), combine the weighted screening results to derive an abnormal data extraction result that reflects all of the first to fifth rule sets, and assign a grade to the derived extraction result, but is not limited thereto. That is, the detection unit (400) can offer convenience to the administrator by distinguishing not only whether data is abnormal but also the abnormality grade of the abnormal data, that is, of the outlier, for example in five grades (NORMAL/WARNING/MINOR/MAJOR/CRITICAL).
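One way the weighted combination and grading could look is sketched below. The text does not specify the weights or how a combined result maps to a grade, so the weights, the normalised score, and the cut-off values are all assumptions; only the five grade names come from the description.

```python
GRADES = ["NORMAL", "WARNING", "MINOR", "MAJOR", "CRITICAL"]


def grade(flags, weights, cutoffs=(0.2, 0.4, 0.6, 0.8)):
    """Combine per-analysis outlier flags into a weighted score in [0, 1]
    and map it to one of the five grades (assumed cut-offs)."""
    total = sum(weights.values())
    score = sum(weights[name] for name, hit in flags.items() if hit) / total
    level = sum(score >= c for c in cutoffs)  # number of cut-offs reached
    return score, GRADES[level]


# Hypothetical weighting coefficients per analysis.
WEIGHTS = {"pca": 4, "correlation": 3, "single_variable": 3}

print(grade({"pca": False, "correlation": False, "single_variable": False}, WEIGHTS))
print(grade({"pca": True, "correlation": True, "single_variable": False}, WEIGHTS))
print(grade({"pca": True, "correlation": True, "single_variable": True}, WEIGHTS))
```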

FIG. 7 is an operational flowchart showing an abnormal data detection method according to an embodiment of the present invention. The abnormal data detection method of the present invention is described below with reference to FIGS. 1 to 7.

First, measured value signals are received from the plurality of measuring instruments (10) through the communication unit (100) at a predetermined interval, for example every 5 minutes (S100), and the received measured value signals are stored in the database unit (200) as measurement data (S200). The measurement data stored in the database unit (200) may be data accumulated over several months to several years for each measuring instrument (10), that is, for each outlet.

Next, the abnormal data detection target and the detection criteria are input through the user interface unit (300) as shown in FIG. 2 (S300). The selection of the detection target through the user interface unit (300) may be performed before the measured value signals are received from the plurality of measuring instruments (10) through the communication unit (100), so that the system can be set to receive measured value signals from the selected target outlet, for example stack No. 1 installed on the boiler and heating facility of the GS Caltex Corporation Yeosu plant. In addition, for selecting the abnormal data detection criteria, the user interface unit (300) may provide selection interfaces such as analysis period setting, together with various interfaces for rule set configuration (320), management (330), and evaluation (340) that reflect the multi-stage filtering operation according to operating state, which is used to set rule sets after the data analysis according to the present invention has been performed.

Next, the detection unit (400) analyzes the measurement data stored in the database unit (200) using the abnormal data detection target and detection criteria input through the user interface unit (300), determines the operating state of the facility in which the plurality of measuring instruments (10) are installed according to the analysis result, screens outliers for each determined operating state, and displays the determination result and the screening result through the user interface unit (300) (S400). The operation of the detection unit (400) is described in detail below.

The detection unit (400) classifies the patterns of the measurement data received and stored at 5-minute intervals (hereinafter, "5-minute data") on the basis of the correlation among the plurality of items relating to the plurality of substances and the state of the fluid measured by the plurality of measuring instruments (10). For example, for pattern analysis the detection unit (400) may: calculate statistics such as the mean, standard deviation, and skewness for each measurement item, such as oxygen concentration, flow rate, nitrogen oxide concentration, and exhaust gas temperature; normalize the measurement data item by item so that, because the items have different units, clustering is not dominated by the items with large values; where there is an item to which greater weight is intentionally given, such as oxygen concentration, set a weighting coefficient for processing the measurement data with a distance-weighting function before clustering; run a k-means++ clustering analysis that partitions the measurement data in the database unit (200) into as many sets as there are operating states (see the pattern analysis item (322) shown in FIG. 2), for example four sets corresponding to shutdown, startup preparation, reduced-output operation, and abatement facility abnormality; and define the distribution ranges of the sets derived from the k-means++ clustering analysis as the distribution ranges of the measurement data for each operating state. The detection unit (400) preferably provides performance sufficient to analyze measurement data containing 100,000 or more measured values per year within 10 minutes.
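The pattern analysis steps above (per-item normalization, optional weighting, k-means++ clustering, and per-cluster distribution ranges) can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation: scaling each normalized item by a weight is one assumed way to realise the distance-weighting function, and the sample rows, weights, and seed are hypothetical.

```python
import random
from statistics import mean, pstdev


def normalize(rows, weights):
    """Z-score each item (column), then scale it by its weighting coefficient."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant items
    return [[w * (v - m) / s for v, m, s, w in zip(r, mus, sds, weights)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd iterations starting from k-means++ seeds; returns one label per point."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([mean(c) for c in zip(*members)] if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels


def state_ranges(rows, labels, k):
    """Per-cluster (min, max) of each item, in the original units."""
    ranges = {}
    for j in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == j]
        if members:
            ranges[j] = [(min(c), max(c)) for c in zip(*members)]
    return ranges


# Hypothetical rows: (oxygen concentration %, exhaust gas temperature deg C).
rows = [(20.9, 30.0), (20.8, 32.0), (20.9, 31.0), (6.0, 180.0), (6.2, 185.0), (5.9, 178.0)]
labels = kmeans(normalize(rows, weights=(2.0, 1.0)), k=2)
print(state_ranges(rows, labels, 2))
```

The resulting per-cluster ranges are the kind of values that could then be stored as the first rule set.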

Here, the detection unit (400) may display, through the user interface unit (300), a screen that visualizes the time-series graph of each item and the data distribution between items, with identical states distinguished by color. The user can use this visualization to classify the operating states of each outlet, to track long-term pattern changes and the periods in which they occur, and to select the target period for rule set configuration.

For example, FIGS. 8A to 8C show graphs that the detection unit (400) provides through the user interface unit (300), and are described below.

FIG. 8A is a time-series graph of the 2015 flow rate (FL1) received from the selected measuring instrument (10). It shows intermittent Pitot tube clogging before the accuracy inspection of the flow velocity meter, followed by persistent abnormally high flow readings caused by Pitot tube clogging after the inspection. FIG. 8B is a time-series graph of the 2016 flow rate (FL1) received from the selected measuring instrument (10). It shows that, after a period in which normal measurement was impossible because of sample line leakage and clogging, persistent abnormally high flow readings caused by Pitot tube clogging occurred.

FIG. 8C is a graph of the data distribution between items. It shows the correlation between pairs of items: fine dust concentration (TSP) versus oxygen concentration (O2), sulfur oxide concentration (SOX) versus oxygen concentration, nitrogen oxide concentration (NOX) versus oxygen concentration, hydrogen chloride concentration (HCL) versus oxygen concentration, carbon monoxide concentration (CO) versus oxygen concentration, exhaust gas flow rate (FL1) versus oxygen concentration, exhaust gas temperature (TMP) versus oxygen concentration, combustion chamber internal temperature (TM1) versus oxygen concentration, fine dust concentration versus exhaust gas temperature, and so on. Here, the correlation between exhaust gas flow rate and oxygen concentration can be seen to include a portion (323) corresponding to the flow rate shown in FIG. 8B, from which the occurrence of Pitot tube clogging can be identified.

FIG. 9A is a time-series graph of exhaust gas flow rate data that marks the outliers (324) determined, through analysis of six months of data, to belong to an abnormally high-load interval, that is, determined to be at or above the upper control limit. As described above, the detection unit (400) may detect outliers for each operating state using principal component analysis, the T² statistic, the Q statistic, and control limits at a preset confidence level, and may display the detection result as shown in FIG. 9A.

FIG. 9B is a correlation plot that marks the outliers (in red) with respect to the correlation between two items, namely flow rate and oxygen concentration. As shown in FIG. 9B, the detection unit (400) may display the abnormal data identified from the correlation, that is, the outliers, in a time-series graph.

The detection unit (400) may apply the analysis results for the various patterns to set, as thresholds, the upper control limit based on the T² statistic, the upper control limit based on the Q statistic, the control limits at a preset confidence level, and the like, and may apply the set thresholds as rule sets, which constitute the real-time detection level. For example, as shown in FIG. 6, when a setting command (333) to apply a rule set is input from the user interface unit (300), the detection unit (400) sets the upper control limit based on the T² statistic calculated using Equation 4 as the threshold, displays the threshold level applied to real-time detection as determined with reference to that threshold, for example 0.000 to 24.428, displays the confidence level of the function, and may receive a command selecting whether the function is used.

Thereafter, the detection unit (400) may determine the operating state and screen outliers for the real-time measured values input through the communication unit (100) on the basis of the preset rule sets. For example, when data analysis has set an oxygen concentration of 21% as the lower limit indicating the shutdown state, the detection unit (400) can inform the user of the shutdown state by displaying the portions of the time-series graph where the oxygen concentration exceeds 21% in a separate color. Likewise, when data analysis has set a nitrogen oxide concentration of 40% as the lower limit for abnormal data, it can inform the user that the chemical dosage to the abatement facility has decreased because the urea solution froze in winter by displaying the portions of the time-series graph where the nitrogen oxide concentration exceeds 40% in a separate color.
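Highlighting the portions of a time series that exceed a stored limit amounts to finding contiguous runs above the threshold. A minimal sketch follows; the function name and the sample series are hypothetical, and the resulting index ranges would be handed to whatever plotting layer draws the separate color.

```python
def segments_above(values, threshold):
    """Return (start, end) index pairs, end exclusive, for each contiguous
    run of values strictly greater than the threshold."""
    segments, start = [], None
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i  # a run begins
        elif v <= threshold and start is not None:
            segments.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        segments.append((start, len(values)))  # run extends to the end
    return segments


# Hypothetical 5-minute series checked against a stored limit of 21.
series = [5.0, 6.0, 21.5, 21.6, 7.0, 22.0, 22.0]
print(segments_above(series, 21.0))
```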

FIGS. 10A to 10F are graphs showing the operating state classification results of the detection unit (400). The detection unit (400) may classify real-time data (May 17, 2017) into four operating states, namely normal, excess air, high temperature, and shutdown, by applying the thresholds set as rule sets through pattern analysis. That is, in the data transmitted in real time, the detection unit (400) may classify data with high nitrogen oxide and oxygen concentrations as the excess air state, data with high flow rate and carbon monoxide concentration as the high temperature state, and other data in which all items lie within their average ranges as the normal state. The results shown in FIGS. 10A to 10F are consistent with general combustion engineering analysis: when excess air is supplied, carbon monoxide generation decreases owing to complete combustion while the nitrogen oxide concentration increases as oxygen increases, and in the high temperature state carbon monoxide rises sharply owing to incomplete combustion.

The abnormal data detection method according to the present invention may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

The disclosed method and apparatus have been described with reference to the embodiments shown in the drawings to aid understanding, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the disclosed technology should be determined by the appended claims.

10: measuring instrument
100: communication unit
200: database unit
300: user interface unit
400: detection unit
410: pattern analysis unit
420: principal component analysis unit
430: correlation analysis unit
440: single-variable analysis unit
450: correction conversion analysis unit
460: rule set management unit

Claims

1. An abnormal data detection system that collects and analyzes measured value signals from a plurality of measuring instruments, each of which measures a plurality of substances contained in a fluid and the state of the fluid and outputs a measured value signal, the system comprising:
a communication unit that receives the measured value signals from the plurality of measuring instruments;
a database unit that stores the measured value signals received through the communication unit as measurement data;
a user interface unit that provides an interface for selecting an abnormal data detection target and detection criteria and an interface for displaying a detection result; and
a detection unit that analyzes the measurement data stored in the database unit using the abnormal data detection target and the detection criteria selected through the user interface unit, determines the operating state of the facility in which the plurality of measuring instruments are installed according to the analysis result, screens outliers according to the determined operating state, and displays the determination result and the screening result through the user interface unit as the detection result,
wherein the detection unit comprises:
a pattern analysis unit that defines, for the measurement data stored in the database unit, a distribution range of the measurement data for each operating state of the facility in which the plurality of measuring instruments are installed, and determines the operating state of the facility by comparing the measured value signals input through the communication unit with the defined distribution ranges, and
wherein the pattern analysis unit
performs a k-means++ clustering analysis that partitions the measurement data into as many sets as there are operating states, and defines the distribution ranges of the sets derived from the k-means++ clustering analysis as the distribution ranges of the measurement data for each operating state.

(Deleted)

The abnormal data detection system according to claim 1,
wherein the detection unit comprises:
a principal component analysis unit that selects a plurality of principal items, by principal component analysis, from among the plurality of items relating to the plurality of substances and the state of the fluid measured by the measuring instruments, calculates a T² statistic for the measurement data stored in the database unit with respect to the selected principal items, and screens outliers in the measurement data stored in the database unit by comparing the calculated T² statistic with an upper control limit based on the T² statistic.

The abnormal data detection system according to claim 1,
wherein the detection unit comprises:
a principal component analysis unit that selects a plurality of principal items, by principal component analysis, from among the plurality of items relating to the plurality of substances and the state of the fluid measured by the measuring instruments, calculates a Q statistic for the measurement data stored in the database unit with respect to the selected principal items, and screens outliers in the measurement data stored in the database unit by comparing the calculated Q statistic with an upper control limit based on the Q statistic.

The abnormal data detection system according to claim 1,
wherein the detection unit comprises:
a correlation analysis unit that selects two items from among the plurality of items relating to the plurality of substances and the state of the fluid measured by the measuring instruments, calculates a T² statistic for the measurement data stored in the database unit with respect to the selected two items, and screens outliers in the measurement data stored in the database unit by comparing the calculated T² statistic with an upper control limit based on the T² statistic.

The abnormal data detection system according to claim 1,
wherein the detection unit comprises:
a single-variable analysis unit that selects one item from among the plurality of items relating to the plurality of substances and the state of the fluid measured by the measuring instruments, calculates the mean and standard deviation of the measurement data stored in the database unit with respect to the selected item, and screens outliers in the measurement data stored in the database unit by comparison with control limits at a preset confidence level that follow from the calculated mean and standard deviation.

The abnormal data detection system according to claim 1,
wherein the detection unit comprises:
a correction conversion analysis unit that calculates pre-correction data on the basis of the correction values applied to the measurement data when measured by the measuring instruments, and screens outliers in the measurement data by comparing the difference between the measurement data and the pre-correction data with a preset value.

The abnormal data detection system according to claim 1,
wherein the detection unit comprises:
a rule set management unit that sets the distribution range of the measurement data for each operating state defined by the pattern analysis unit as a first rule set, and determines the operating state of the facility in which the plurality of measuring instruments are installed by comparing the first rule set with the measured value signals input through the communication unit.

The abnormal data detection system according to claim 4,
wherein the detection unit comprises:
a rule set management unit that sets the upper control limit based on the T² statistic of the principal component analysis unit as a second rule set, and determines whether a measured value signal input through the communication unit is an outlier by comparing the T² statistic calculated for that measured value signal with the second rule set.

The abnormal data detection system according to claim 5,
wherein the detection unit comprises:
a rule set management unit that sets the upper control limit based on the Q statistic of the principal component analysis unit as a third rule set, and determines whether a measured value signal input through the communication unit is an outlier by comparing the Q statistic calculated for that measured value signal with the third rule set.

The abnormal data detection system according to claim 6,
wherein the detection unit comprises:
a rule set management unit that sets the upper control limit based on the T² statistic of the correlation analysis unit as a fourth rule set, and determines whether a measured value signal input through the communication unit is an outlier by comparing the T² statistic calculated for that measured value signal with the fourth rule set.

The abnormal data detection system according to claim 7,
wherein the detection unit comprises:
a rule set management unit that sets the control limit of the single-variable analysis unit as a fifth rule set, and determines whether a measured value signal input through the communication unit is an outlier by comparing the value of that measured value signal with the fifth rule set.

An abnormal data detection method for collecting and analyzing measured value signals from a plurality of measuring instruments, each of which measures a plurality of substances contained in a fluid and the state of the fluid and outputs a measured value signal, the method comprising:
a first step of receiving the measured value signals from the plurality of measuring instruments;
a second step of storing the measured value signals received in the first step as measurement data;
a third step of receiving input of an abnormal data detection target and detection criteria; and
a fourth step of analyzing the measurement data stored in the second step using the abnormal data detection target and the detection criteria input in the third step, determining the operating state of the facility in which the plurality of measuring instruments are installed according to the analysis result, screening outliers according to the determined operating state, and displaying the determination result and the screening result,
wherein, in the fourth step,
a distribution range of the measurement data is defined for each operating state of the facility in which the plurality of measuring instruments are installed by performing a k-means++ clustering analysis that partitions the measurement data into as many sets as there are operating states and defining the distribution ranges of the sets derived from the k-means++ clustering analysis as the distribution ranges of the measurement data for each operating state, and the operating state of the facility in which the plurality of measuring instruments are installed is determined by comparing the received measured value signals with the defined distribution ranges.

A computer-readable recording medium on which a program for executing the abnormal data detection method according to claim 14 is recorded.