KR102276600B1

KR102276600B1 - Method and System for Data Distribution Processing Management for Power-related Big Data Analysis in Heat Treatment Process

Info

Publication number: KR102276600B1
Application number: KR1020190035690A
Authority: KR
Inventors: 김태형; 함경선; 박용국
Original assignee: 한국전자기술연구원
Priority date: 2019-03-28
Filing date: 2019-03-28
Publication date: 2021-07-13
Also published as: KR20200119404A

Abstract

A data distributed processing management method and system for power big data analysis in a heat treatment process are provided. A big data management method according to an embodiment of the present invention collects data from instruments of factory facilities, sets a shard key for the collected data based on the ID assigned to the instrument that generated the data, and distributes and stores the data based on the set shard key. As a result, the data is evenly distributed across the servers while the data to be analyzed retains locality, which reduces repeated calls and increases data indexing speed, enabling efficient processing of big data.

Description

Method and System for Data Distribution Processing Management for Power-related Big Data Analysis in Heat Treatment Process

The present invention relates to big data management technology, and more particularly, to a data management method and system for power big data analysis in a heat treatment process.

Most existing smart factory technologies focus on process automation, and in the case of FEMS (Factory Energy Management System), expensive high-end models dominate.

Because building an EMS (Energy Management System) and the associated equipment is costly, EMS adoption remains limited in industries dominated by small and medium-sized consumers, such as the root industry.

There is currently a growing need for a low-cost, lightweight energy-saving service system tailored to the on-site conditions of root industry consumers.

In particular, energy-saving services for the root industry tend to operate on the various data collected on site rather than on costly approaches such as facility construction, so a way to manage the collected big data effectively is required.

The present invention has been devised to solve the above problems. An object of the present invention is to provide, as a means of processing big data efficiently, a method in which the data is evenly distributed across the servers while the data to be analyzed retains locality, thereby reducing repeated calls and increasing data indexing speed.

According to an embodiment of the present invention for achieving the above object, a big data management method includes: collecting data from instruments of factory facilities; setting a shard key for the collected data based on the ID assigned to the instrument that generated the data; and distributing and storing the data based on the set shard key.

In the storing step, data generated by the same instrument may be stored on the same server.

In addition, in the storing step, data generated by instruments in a specific relationship may be stored on the same server.

The IDs of instruments for facilities performing a continuous process may include a common factor.

In addition, in the storing step, the data may be distributed and stored using a hashed sharding method.

The ID of an instrument may include a factor indicating the type of facility being measured.

In addition, the facility may be a facility for a heat treatment process.

Meanwhile, according to another embodiment of the present invention, a big data management system includes: a collection unit that collects data from instruments of factory facilities; and a DB that sets a shard key for the collected data based on the ID assigned to the instrument that generated the data, and distributes and stores the data based on the set shard key.

As described above, according to the embodiments of the present invention, the data is evenly distributed across the servers while the data to be analyzed retains locality, which reduces repeated calls and increases data indexing speed, enabling efficient processing of big data.

FIG. 1 is a diagram showing a big data processing system according to an embodiment of the present invention;
FIG. 2 is a diagram showing a distributed data storage scheme using sharding;
FIG. 3 is a diagram illustrating heat treatment process equipment;
FIG. 4 is a diagram illustrating measurement points of the facilities;
FIG. 5 is a diagram illustrating data collected by the instruments; and
FIG. 6 is a diagram showing a distributed data storage scheme using the hashed sharding method.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

Factory energy management systems (FEMS), which aim to improve energy efficiency, are mostly process automation solutions based on expensive equipment.

Because these systems have a high initial adoption cost and impose a cost burden, their application in small and medium-sized workplaces in the root industry remains limited.

Therefore, a big data system is being proposed as a new solution. Instead of the existing costly approaches to energy saving, it uses the various sub-metering data that can be collected on site to provide information such as per-facility load patterns, energy demand forecasts, and energy-aware production scheduling based on energy indicators.

FIG. 1 is a diagram illustrating a big data platform, which is a big data processing system according to an embodiment of the present invention.

As shown in FIG. 1, the data collection unit 110 of the big data processing system 100 according to an embodiment of the present invention collects data gathered by the IoT platform 20 from the factory facilities 10, data analyzed by the service platform 30, and weather data provided by the meteorological agency server 40.

The big data analysis engine 120 periodically analyzes the collected data, the MongoDB 130 stores the collected data and the analysis data, and the HDFS (Hadoop Distributed File System) 140 backs up the data.

A particularly important part of the big data processing system shown in FIG. 1 is the method of managing distributed processing of big data at the data loading stage. How efficiently the data is distributed affects the efficiency of data management, the speed of data analysis, and so on.

To perform distributed processing of big data with the MongoDB 130, a NoSQL DB based on multiple distributed servers, the data must be partitioned according to specific criteria through sharding, as shown in FIG. 2.

Sharding can be broadly divided into two types.

1) Hashed Sharding

- A method of partitioning and storing data by shard key based on a hash function

2) Ranged Sharding

- A method of dividing data into ranges according to the shard key value and storing it
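
The difference between the two schemes can be illustrated with a minimal Python sketch. It is not MongoDB's internal implementation: the function names `hashed_shard` and `ranged_shard`, the use of MD5 as the hash function, and the split points are assumptions chosen only for illustration.

```python
import bisect
import hashlib
from typing import Sequence


def hashed_shard(key: int, n_shards: int) -> int:
    """Map a shard key to a shard index via a hash of the key (illustrative only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer,
    # then reduce it modulo the number of shards.
    return int.from_bytes(digest[:8], "big") % n_shards


def ranged_shard(key: int, split_points: Sequence[int]) -> int:
    """Map a shard key to a shard index by the range it falls in.

    split_points must be sorted. Keys below split_points[0] go to shard 0,
    keys in [split_points[0], split_points[1]) go to shard 1, and so on.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    keys = [111, 121, 231, 241, 331, 341]  # hypothetical shard key values
    for k in keys:
        print(k, hashed_shard(k, 3), ranged_shard(k, [200, 300]))
```

With a generic hash, neighbouring keys are typically scattered across shards, whereas ranged sharding keeps neighbouring keys together but can concentrate load on one shard if the key values are unevenly distributed.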

With sharding, a poorly chosen shard key can cause the data to be concentrated on one server instead of being evenly distributed across the servers.

In addition, to process big data efficiently, the data must be evenly distributed across the servers while the data to be analyzed retains locality, so that repeated calls are reduced and data indexing speed is increased.

Specifically, the shard key is set based on the ID of the instrument that generated the data, and the data is distributed and stored using the hashed sharding method. This ensures that data generated by the same instrument is stored on the same server, and that data measured at facilities in a specific relationship, such as facilities performing a continuous process, is also stored on the same server.

The site targeted by the big data processing system according to an embodiment of the present invention is a metal heat treatment plant. As shown in FIG. 3, it is equipped with various process facilities for heat treatment, and as shown in FIG. 4, each facility has the measurement points listed below. Furthermore, each instrument collects the data shown in FIG. 5.

To save energy in a heat treatment plant, power consumption can be reduced by analyzing the power usage pattern of each facility and using it for process scheduling so that the plant operates at optimal energy efficiency.

This requires a power demand forecasting model that can predict the power consumption of each facility, which can be built from time-series data for each measurement point.

In addition, owing to the nature of a continuous furnace, some facilities in the process are correlated with one another in their operating times. For example, metal fed through Q/T is subsequently passed to TEMP, so this information is needed to analyze per-facility power usage patterns in the time domain.

Therefore, for data modeling and power usage prediction, the data needs to be loaded and managed in a distributed manner according to the following criteria.

- Measurement points for each facility (determined from the MeterID)

- Whether the facility is part of a continuous process (determined from the MeterID)

The MeterID of each facility is numbered in the following form, taking into account whether the facility is part of a continuous process.

- Numbering rules

1) The meterIDs of each facility start from 1

2) The tens digit of the meterID indicates the type of facility

- NOR&CA(1), SRA(2), Q/T(3), TEM(4), NOR(5)

3) The hundreds digit of the meterID is the same for facilities in the same continuous process

- NOR&CA Unit 1 -> SRA Unit 1 (1)

- Q/T Unit 1 -> TEM Unit 1 (2)

- Q/T Unit 2 -> TEM Unit 2 (3)

- Q/T Unit 3 -> TEM Unit 3 (4)

- No continuous process (0)

- NOR&CA Unit 1: 111~5

- SRA Unit 1 (continuous with NOR&CA): 121~3

- QT Unit 1: 231~4

- TEMP Unit 1 (continuous with QT): 241~4

- QT Unit 2: 331~4

- TEMP Unit 2 (continuous with QT): 341~4

- QT Unit 3: 431~4

- TEMP Unit 3 (continuous with QT): 441~4

- NOR Unit 2 (independent): 951~7
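
The numbering rules above can be read as a three-digit code: the hundreds digit identifies the continuous-process group, the tens digit the facility type, and the ones digit the measurement point. The following Python sketch decodes a meterID under that reading. The names `MeterInfo`, `decode_meter_id`, and `EQUIPMENT_TYPES` are hypothetical, and the sketch assumes meterIDs are integers below 1000.

```python
from typing import NamedTuple

# Facility type codes taken from the tens-digit rule in the text.
EQUIPMENT_TYPES = {1: "NOR&CA", 2: "SRA", 3: "Q/T", 4: "TEM", 5: "NOR"}


class MeterInfo(NamedTuple):
    process_group: int   # hundreds digit: shared by facilities in one continuous process
    equipment_type: str  # tens digit, mapped to a facility type name
    point: int           # ones digit: measurement point within the facility


def decode_meter_id(meter_id: int) -> MeterInfo:
    """Split a meterID into its three digit fields (assumes 0 < meter_id < 1000)."""
    if not 0 < meter_id < 1000:
        raise ValueError("this sketch assumes a meterID of at most three digits")
    hundreds = meter_id // 100
    tens = (meter_id // 10) % 10
    ones = meter_id % 10
    return MeterInfo(hundreds, EQUIPMENT_TYPES.get(tens, "unknown"), ones)


if __name__ == "__main__":
    for mid in (111, 121, 231, 241):
        print(mid, decode_meter_id(mid))
```

Under this reading, meterIDs 231 and 241 decode to the same process group (2) with facility types Q/T and TEM, which matches the pairing of QT Unit 1 and TEMP Unit 1 listed above.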

Accordingly, the shard key is set to the MeterID, and the data is distributed using the hashed sharding method shown in FIG. 6.

Because the data must be evenly distributed across the servers while retaining locality, the hash function is constructed as follows.

- All metering data of a given facility and the metering data of its continuous process are placed on the same shard

- At the same time, the data as a whole is distributed across n servers

f(x) = [ { KEY(meterID) - (KEY(meterID) % 100) } / 100 % n ] + 1 (n is the number of distributed servers)
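
As a minimal sketch, the hash function above can be written in Python as follows. The function name `shard_for` is hypothetical, and the sketch assumes that KEY(meterID) is simply the integer value of the meterID, which the text does not state explicitly.

```python
def shard_for(meter_id: int, n_servers: int) -> int:
    """Return the 1-based shard number for a meterID.

    Implements f(x) = [ {KEY - (KEY % 100)} / 100 % n ] + 1,
    assuming KEY(meterID) is the integer meterID itself.
    """
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    key = meter_id
    # Drop the tens and ones digits, leaving the continuous-process group.
    group = (key - key % 100) // 100
    return group % n_servers + 1


if __name__ == "__main__":
    # meterIDs taken from the numbering examples in the text
    for mid in (111, 121, 231, 241, 331, 341, 431, 441, 951):
        print(mid, shard_for(mid, 3))
```

With n = 3, meterIDs 231 and 241 (QT Unit 1 and TEMP Unit 1) both map to shard 3, and 111 and 121 (NOR&CA Unit 1 and SRA Unit 1) both map to shard 2, so facilities in the same continuous process land on the same shard while different groups are spread over the servers.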

So far, a preferred embodiment of a data distributed processing management method and system has been described in detail. For efficient processing of big data, the method and system distribute the data evenly across the servers while keeping locality in the data to be analyzed, thereby reducing repeated calls and increasing data indexing speed.

In the above embodiment, the factory facilities were assumed to be facilities for a heat treatment process, but this is only an example for convenience of description. The technical idea of the present invention can of course also be applied to facilities for other processes.

Meanwhile, the technical idea of the present invention can also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. In addition, the technical ideas according to various embodiments of the present invention may be implemented in the form of computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like. In addition, the computer-readable code or program stored on the computer-readable recording medium may be transmitted over a network connecting computers.

In addition, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or outlook of the present invention.

10: Factory facilities
20: IoT platform
30: Service platform
40: Meteorological agency server
100: Big data processing system
110: Data collection unit
120: Big data analysis engine
130: MongoDB
140: HDFS (Hadoop Distributed File System)

Claims

1. A big data management method comprising:
collecting data from instruments of factory facilities;
setting a shard key for the collected data based on the ID assigned to the instrument that generated the data; and
distributing and storing the data based on the set shard key.

2. The method according to claim 1,
wherein, in the storing step,
data generated by the same instrument is stored on the same server.

3. The method according to claim 1,
wherein, in the storing step,
data generated by instruments in a specific relationship is stored on the same server.

4. The method according to claim 3,
wherein the IDs of the instruments for facilities performing a continuous process
include a common factor.

5. The method according to claim 4,
wherein, in the storing step,
the data is distributed and stored using a hashed sharding method.

6. The method according to claim 5,
wherein the ID of the instrument
includes a factor indicating the type of facility being measured.

7. The method according to claim 1,
wherein the facility
is a facility for a heat treatment process.

8. A big data management system comprising:
a collection unit that collects data from instruments of factory facilities; and
a DB that sets a shard key for the collected data based on the ID assigned to the instrument that generated the data, and distributes and stores the data based on the set shard key.