KR20220055737A

KR20220055737A - Method of clustering and analyzing data of customer using electricity

Info

Publication number: KR20220055737A
Application number: KR1020200140150A
Authority: KR
Inventors: 주성관; 정현철; 장민석; 김태곤
Original assignee: 고려대학교 산학협력단
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2022-05-04
Also published as: KR102472573B1

Abstract

The present invention relates to a method for clustering and analyzing data of a customer using electricity. The method for clustering and analyzing data of a customer using electricity comprises: a step of extracting continuous data by extracting a power consumption characteristic of the customer using electricity through a time-of-use (TOU) index; a step of clustering continuous data of the customer; a step of clustering categorical data of the customer; a step of determining final clustering by matching a result of clustering the continuous data and a result of clustering the categorical data; and a step of analyzing a characteristic of the customer through a result of the final clustering.

Description

METHODS OF CLUSTERING AND ANALYZING DATA OF CUSTOMER USING ELECTRICITY

The present invention relates to a method for clustering and analyzing electricity customer data, and more particularly, to a method for clustering and analyzing electricity customer data that includes continuous data and categorical data.

Advanced metering infrastructure (AMI) is a two-way remote metering infrastructure that measures, collects, and analyzes data on power usage and power quality in real time through smart meters and provides the resulting information to customers. As AMI collects a growing variety of data, various energy management services that make use of it are in operation. To save energy and improve efficiency, these services should be operated according to the varied characteristics of electricity customers. Providing an effective energy management service therefore requires a technique that uses the various data held about customers to group them rationally by their characteristics.

Korean Registered Patent No. 10-1070368 (publication date: 2011.03.07)

The problem to be solved by the present invention is to provide a method for clustering and analyzing electricity customer data that rationally groups and analyzes customers by their characteristics, using the various data available about them.

To solve this problem, a method for clustering and analyzing electricity customer data according to embodiments of the present invention includes: extracting continuous data by extracting the power consumption characteristics of electricity customers through a time-of-use (TOU) index; clustering the continuous data of the customers; clustering the categorical data of the customers; determining a final clustering by matching the clustering result of the continuous data with the clustering result of the categorical data; and analyzing the characteristics of the customers through the final clustering result.

According to embodiments of the present invention, the TOU index makes it possible to analyze a customer's power consumption characteristics intuitively. Because the number of clusters can be determined rationally through the matched clustering algorithm (MCA), electricity customer data, which is mixed-type data, can be clustered and analyzed effectively. In addition, by analyzing the data according to the clustering result, an efficient operating plan for energy management services can be proposed according to the characteristics of the electricity customers in each cluster.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention can be more fully understood.
FIG. 1 is a flowchart illustrating a method for clustering and analyzing electricity customer data according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a method for clustering and analyzing electricity customer data according to an embodiment of the present invention.
FIG. 3 is a table showing the TOU index and social factor data of each cluster according to the clustering result, in a method for clustering and analyzing electricity customer data according to an embodiment.
FIG. 4 is a graph showing a three-dimensional scatter plot of the clustering result in terms of the TOU index, in a method for clustering and analyzing electricity customer data according to an embodiment.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only to illustrate those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described in this specification.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. However, this is not intended to limit the embodiments according to the concept of the present invention to the specific forms disclosed; they include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. By contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between. Other expressions describing relationships between components, such as "between" and "directly between," or "adjacent to" and "directly adjacent to," should be interpreted in the same way.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" indicate that the features, numbers, steps, operations, components, parts, or combinations thereof described in this specification exist, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless expressly defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, a method for clustering and analyzing electricity customer data according to an embodiment of the present invention will be described.

FIG. 1 is a flowchart illustrating a method for clustering and analyzing electricity customer data according to an embodiment of the present invention. The method may refer to a method performed by a computing device including at least a processor and/or a memory.

Referring to FIG. 1, the power consumption characteristics of an electricity customer may be extracted through a time-of-use (TOU) index (S110). Here, the electricity customer's data may refer to data stored in the computing device after being received through wired or wireless communication or through a predetermined input device.

The data of an electricity customer may include at least one of continuous, categorical, and mixed-type data. An example of continuous data is power data, while categorical data is social factor data about the customer, such as appliance ownership, number of household members, and region of residence. Mixed-type data includes both continuous data and categorical data.

Power data is time-of-use (TOU) data measured at a predetermined interval (for example, every 15 minutes), so its volume is large. Because of this, power data is difficult to analyze intuitively and yields poor clustering performance in cluster analysis. Effective cluster analysis therefore requires extracting power consumption characteristics from the power data.

Because an electricity customer's power consumption differs by time of day according to lifestyle patterns, power consumption characteristics should be extracted for each time period. Power consumption is also closely tied to electricity charges, so extracting a customer's power consumption characteristics using the structure of the electricity tariff expresses those characteristics intuitively. According to an embodiment, among electricity tariffs there is a time-of-use (TOU) tariff that charges different rates depending on the season and the time of day. The TOU tariff divides the 24 hours of a day into three periods (light load, medium load, and maximum load), and the hours assigned to each period differ by season. A TOU index built on the structure of the TOU tariff can therefore extract a customer's power consumption characteristics by season and time period.

The TOU index expresses a customer's power consumption characteristics by season and time period using the structure of the TOU tariff, and is given by Equation 1 below.

[Equation 1]

$$TOU_{i}^{period} = w^{period} \times \bar{P}_{i}^{period}$$

Here, $TOU_{i}^{period}$ is the power consumption characteristic of customer $i$ for each time period (period = light load, medium load, and maximum load), $w^{period}$ is the time weight of the period, that is, the fraction of the 24 hours that the period occupies, and $\bar{P}_{i}^{period}$ is the average power of customer $i$ in that period.
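The following is a minimal Python sketch of how the TOU index might be evaluated, assuming the index is the product of the time weight and the average power, which is how the definitions above read. The hour-to-period mapping `PERIOD_HOURS` and the function name `tou_index` are hypothetical; real TOU tariff periods differ by season and tariff class.

```python
from statistics import mean

# Hypothetical hour-to-period mapping (an assumption for illustration only;
# real TOU tariff periods differ by season and tariff class).
PERIOD_HOURS = {
    "light": list(range(0, 9)) + [23],        # 10 hours
    "medium": [9, 12] + list(range(17, 23)),  # 8 hours
    "peak": [10, 11, 13, 14, 15, 16],         # 6 hours (maximum load)
}


def tou_index(hourly_kw):
    """Return {period: time weight * average power} for one customer.

    hourly_kw: 24 average-power values (kW), one per hour of the day.
    """
    if len(hourly_kw) != 24:
        raise ValueError("expected 24 hourly values")
    index = {}
    for period, hours in PERIOD_HOURS.items():
        weight = len(hours) / 24  # share of the day covered by this period
        avg_power = mean(hourly_kw[h] for h in hours)
        index[period] = weight * avg_power
    return index
```

With this weighting, a customer with a flat load gets index values proportional to the length of each period, so the three values can be used directly as a three-dimensional feature vector for clustering.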

When electricity customers are clustered by their characteristics using electricity customer data, the similarity between data points may be calculated (S120). Similarity is analyzed by measuring the distance between data points and may be calculated differently depending on the data type. According to an embodiment, similarity is calculated with the Euclidean distance for continuous data, the Jaccard distance for categorical data, and the Gower distance for mixed-type data.

Because the TOU index is continuous data, its similarity can be calculated with the Euclidean distance. The Euclidean distance is the straight-line distance between two points, and the Euclidean distance between TOU indices may be calculated according to Equation 2 below.

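[Equation 2]

$$d_{E}(i,j) = \sqrt{\sum_{period}\left(TOU_{i}^{period} - TOU_{j}^{period}\right)^{2}}$$

Here, the sum runs over the light-load, medium-load, and maximum-load periods, and $TOU_{i}^{period}$ is the TOU index of customer $i$ for that period.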
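As an illustration, the Euclidean distance between two customers' TOU index vectors can be sketched in Python as follows. The function name `euclidean_distance` and the ordering of the vector (light, medium, maximum load) are assumptions made for this example.

```python
import math


def euclidean_distance(tou_a, tou_b):
    """Straight-line distance between two customers' TOU index vectors."""
    if len(tou_a) != len(tou_b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(tou_a, tou_b)))
```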

For electricity customers there is also categorical data describing social factors, such as ownership of home appliances (for example, air conditioners and electric heaters), total number of household members, and region of residence. The similarity of categorical data is calculated with the Jaccard distance, which is computed from the union and intersection of two sets $S_i$ and $S_j$ as in Equation 3.

[Equation 3]

$$d_{J}(S_i, S_j) = 1 - \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$$
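A small Python sketch of the Jaccard distance, using the standard definition of one minus the ratio of intersection size to union size. Representing each customer's categorical attributes as a set of strings is an assumption made for this example, as is the convention for two empty sets.

```python
def jaccard_distance(s_i, s_j):
    """Jaccard distance between two sets of categorical attributes."""
    s_i, s_j = set(s_i), set(s_j)
    union = s_i | s_j
    if not union:
        return 0.0  # convention chosen here: two empty sets count as identical
    return 1.0 - len(s_i & s_j) / len(union)
```

For example, a customer described by `{"aircon", "heater"}` and another described by `{"aircon"}` share one of two distinct attributes, so their distance is 0.5.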

For mixed-type data, the similarity between customers may be calculated with the Gower distance. The Gower distance between customers may be calculated as in Equation 4 below.

[Equation 4]

$$d_{G}(i,j) = \sum_{k} w_{k}\, s_{ij}^{(k)}$$

Here, $w_{k}$ is the reciprocal of the number of data items of feature $k$, and $s_{ij}^{(k)}$ is the similarity on feature $k$, which may be calculated differently depending on the data type. A small Gower distance between two customers means that their overall characteristics are similar.
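The sketch below shows one common way to compute a Gower distance in Python. It is not taken from the patent: it assumes equal weights of 1/p over the p features, a range-normalized absolute difference for continuous features, and a simple 0/1 mismatch for categorical features. The function name `gower_distance` and the `ranges` argument are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Gower distance between two mixed-type customer records.

    a, b:   dicts mapping feature name -> value (numeric or categorical).
    ranges: dict mapping each numeric feature name -> (max - min) of that
            feature over the whole data set; other features are treated
            as categorical.
    """
    features = list(a)
    total = 0.0
    for k in features:
        if k in ranges:
            # Continuous feature: range-normalized absolute difference.
            r = ranges[k]
            total += abs(a[k] - b[k]) / r if r else 0.0
        else:
            # Categorical feature: 0 if the values agree, 1 otherwise.
            total += 0.0 if a[k] == b[k] else 1.0
    return total / len(features)  # equal weight 1/p for each of p features
```

Because every per-feature term lies between 0 and 1, the result also lies between 0 and 1, which keeps continuous and categorical features on a comparable scale.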

Because analyzing electricity customers requires mixed-type data that contains both power consumption characteristics and social factor data, the optimal number of clusters may be determined with both data types in mind. The matched clustering algorithm (MCA) is an algorithm for determining the optimal number of clusters that takes into account the data types of the power consumption characteristics and the social factor data.

If the number of clusters in a clustering algorithm is set too large, overfitting becomes a problem, so choosing the number of clusters appropriately is important. A common way to determine the optimal number of clusters is to plot the within-cluster sum of squares (WCSS), the sum of squared distances between the objects in each cluster, while increasing the number of clusters. The point at which the slope of the WCSS curve flattens is taken as the optimal number of clusters; this is called the elbow method. However, the elbow can look different depending on the data type or clustering algorithm, and because there is no clear criterion for where the slope flattens, the result varies from user to user. Therefore, according to an embodiment of the present invention, the matched clustering algorithm (MCA) is used to determine a rational number of clusters by reflecting multiple data types and clustering algorithms at the same time.
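For reference, the WCSS used by the elbow method can be sketched in Python as below. This sketch follows the usual convention of summing squared distances from each point to its cluster centroid, which is an assumption about the exact definition intended here. The function name `wcss` is hypothetical, and the cluster labels are taken as given rather than computed.

```python
from collections import defaultdict


def wcss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total
```

Evaluating this for labelings produced with an increasing number of clusters gives the curve whose flattening point the elbow method looks for; the lack of an objective rule for that point is the weakness described above.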

FIG. 2 is a block diagram illustrating a method for clustering and analyzing electricity customer data according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the MCA may cluster the power consumption characteristics and the social factor data through K-means and hierarchical agglomerative clustering (HAC), respectively (S130). The number of clusters is varied from a (a natural number of 2 or more; for example, 2) to b (a natural number greater than a; for example, 10). The K-means and HAC clustering results for each number of clusters are then matched, and the number of clusters with the highest matching rate may be taken as the optimal number of clusters (S140).

The matching rate may be calculated from the rate of agreement between the results of the two clustering algorithms. For example, the rate at which the K-means result (that is, the cluster number to which each customer belongs) agrees with the HAC result (likewise, the cluster number to which each customer belongs) may be taken as the matching rate. In FIG. 2, when the number of clusters is 2, customer 1 is assigned to the second cluster (cluster 1) by K-means but to the first cluster (cluster 0) by HAC, so the two results do not match for customer 1. Conversely, customer 2 is assigned to the second cluster (cluster 1) by both K-means and HAC, so the two results match. The matching rate is thus the ratio of the number of matched customers to the total number of customers.
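The matching rate described above can be sketched in Python as follows. The function name `matching_rate` is hypothetical, and the sketch compares cluster numbers directly, as the FIG. 2 example does.

```python
def matching_rate(kmeans_labels, hac_labels):
    """Fraction of customers assigned the same cluster number by both algorithms."""
    if len(kmeans_labels) != len(hac_labels):
        raise ValueError("both label lists must cover the same customers")
    if not kmeans_labels:
        raise ValueError("at least one customer is required")
    matched = sum(1 for a, b in zip(kmeans_labels, hac_labels) if a == b)
    return matched / len(kmeans_labels)
```

Mirroring FIG. 2, a first customer labeled 1 by K-means and 0 by HAC does not match, while a second customer labeled 1 by both does. One caveat: clustering algorithms generally assign cluster numbers arbitrarily, so in practice the two label sets may need to be aligned before a direct comparison is meaningful.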

In addition, using the determined number of clusters, the mixed-type data may be clustered through K-medoids to analyze the characteristics of electricity customers (S150). If, when the optimal number of clusters is determined, more than one number of clusters has the same matching rate, the total Gower distance within the clusters may be calculated and the number of clusters with the smaller total distance may be selected as the optimal number of clusters.
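The selection rule, including the tie-break, might look like the following Python sketch. The function name `select_cluster_count` and the two input dictionaries are hypothetical; the matching rates and total within-cluster Gower distances are assumed to have been computed beforehand for each candidate number of clusters.

```python
def select_cluster_count(matching_rates, total_gower):
    """Pick the cluster count with the highest matching rate.

    matching_rates: dict mapping cluster count k -> matching rate.
    total_gower:    dict mapping cluster count k -> total within-cluster
                    Gower distance, used only to break ties.
    """
    best_rate = max(matching_rates.values())
    # Exact equality is used for ties here; a tolerance may be preferable
    # when the rates come from floating-point arithmetic.
    tied = [k for k, rate in matching_rates.items() if rate == best_rate]
    return min(tied, key=lambda k: total_gower[k])
```

For instance, if cluster counts 3 and 4 both reach the highest matching rate, the one whose clusters have the smaller total Gower distance is returned.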

According to embodiments of the present invention, the TOU index expresses power data separately by season and time period, so a customer's power consumption characteristics can be analyzed intuitively. Because the number of clusters can be determined rationally through the MCA, electricity customer data, which is mixed-type data, can be clustered and analyzed effectively. In addition, by analyzing the data according to the clustering result, an efficient operating plan for energy management services can be proposed according to the characteristics of the electricity customers in each cluster.

FIG. 3 is a table showing the TOU index and social factor data of each cluster according to the clustering result, in a method for clustering and analyzing electricity customer data according to an embodiment, and FIG. 4 is a graph showing a three-dimensional scatter plot of the clustering result in terms of the TOU index, in the same method.

In FIG. 3, analyzing the social factor data on the basis of the clustering result shows that cluster 0 is characterized by households of one to two members, while clusters 1 and 2 are characterized by households of three to four members. Cluster 2 also tends to have a larger home floor area and more air conditioners than cluster 1.

Referring to FIGS. 3 and 4, cluster 0 has lower overall power usage than the other clusters, and cluster 2 has higher power usage during the maximum-load period than cluster 1.

As a result, the social factor characteristics and the power usage by time period of each cluster can be readily identified. A power demand management company can therefore, in line with each cluster's characteristics, propose ways for customers to save electricity and thereby provide a service that reduces their electricity charges.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiment, and vice versa.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely illustrative, and a person of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

1. A method of clustering and analyzing electricity customer data, performed on a computing device including at least a processor, the method comprising:
extracting continuous data by extracting power consumption characteristics of an electricity customer through a time-of-use (TOU) index;
clustering the continuous data of the customer;
clustering categorical data of the customer; and
determining a final clustering by matching a clustering result of the continuous data with a clustering result of the categorical data.

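2. The method of claim 1,
wherein the TOU index is an index that expresses the customer's power consumption characteristics by season and time period, based on the structure of a TOU tariff that charges different electricity rates depending on the season and the time period, and is expressed by the following equation:

$$TOU_{i}^{period} = w^{period} \times \bar{P}_{i}^{period}$$

(Here, $TOU_{i}^{period}$ is the power consumption characteristic of customer $i$ for each time period, $w^{period}$ is the time weight of the period, and $\bar{P}_{i}^{period}$ is the average power of customer $i$ in the corresponding time period.)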

3. The method of claim 1,
wherein the continuous data is clustered through a K-means algorithm with similarity calculated by the Euclidean distance, and
the categorical data is clustered through a hierarchical agglomerative clustering (HAC) algorithm with similarity calculated by the Jaccard distance.

4. The method of claim 3,
wherein the clustering of the continuous data of the customer and the clustering of the categorical data of the customer are performed while varying the number of clusters from 2 to 10, and
the determining of the final clustering comprises matching the clustering results of the K-means algorithm and the HAC algorithm for each number of clusters and taking the number of clusters with the highest matching rate as the optimal number of clusters.

5. The method of claim 4,
wherein, if more than one number of clusters has the same matching rate when the optimal number of clusters is determined, the total Gower distance within the clusters is calculated and the number of clusters with the smaller total distance is selected as the optimal number of clusters.

6. The method of claim 1,
further comprising analyzing the characteristics of the customer through the final clustering result.

7. The method of claim 6,
wherein mixed-type data including the continuous data and the categorical data is clustered through K-medoids with similarity calculated by the Gower distance.