KR102472573B1

KR102472573B1 - Method of clustering and analyzing data of customer using electricity

Info

Publication number: KR102472573B1
Application number: KR1020200140150A
Authority: KR
Inventors: 주성관; 정현철; 장민석; 김태곤
Original assignee: 고려대학교 산학협력단
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2022-11-29
Also published as: KR20220055737A

Abstract

A method for cluster analysis of electricity customer data is disclosed. The method includes: extracting continuous data by deriving the power consumption characteristics of electricity customers through time-of-use (TOU) indicators; clustering the customers' continuous data; clustering the customers' categorical data; determining a final clustering by matching the clustering result of the continuous data against the clustering result of the categorical data; and analyzing the characteristics of the customers based on the final clustering result.

Description

Method for cluster analysis of electricity user data {METHOD OF CLUSTERING AND ANALYZING DATA OF CUSTOMER USING ELECTRICITY}

The present invention relates to a method for cluster analysis of electricity customer data and, more particularly, to a method for clustering and analyzing electricity customer data that includes continuous data and categorical data.

Advanced metering infrastructure (AMI) is a two-way remote metering infrastructure that uses smart meters to measure, collect, and analyze data on power consumption and power quality in real time and to provide that information to customers. As AMI collects a wide range of data, various energy management services built on that data are in operation. To save energy and improve efficiency, these services must be tailored to the differing characteristics of electricity customers. Providing an effective energy management service therefore requires a technique that uses the various data available for each customer to group customers rationally according to their characteristics.

Korean Registered Patent No. 10-1070368 (publication date: 2011.03.07)

The problem to be solved by the present invention is to provide a method for cluster analysis of electricity customer data that uses the various data available for each customer to group and analyze customers rationally according to their characteristics.

To solve this problem, a method for cluster analysis of electricity customer data according to embodiments of the present invention includes: extracting continuous data by deriving the power consumption characteristics of electricity customers through time-of-use (TOU) indicators; clustering the customers' continuous data; clustering the customers' categorical data; determining a final clustering by matching the clustering result of the continuous data against the clustering result of the categorical data; and analyzing the characteristics of the customers based on the final clustering result.

According to embodiments of the present invention, a customer's power consumption characteristics can be analyzed intuitively using the TOU indicator. Because the matched clustering algorithm (MCA) yields a reasonable number of clusters, electricity customer data, which is mixed-type data, can be clustered and analyzed effectively. In addition, by analyzing the data for each resulting cluster, an efficient operating plan for energy management services can be proposed according to the characteristics of the electricity customers in each cluster.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention can be more fully understood.
FIG. 1 is a flowchart illustrating a method for cluster analysis of electricity customer data according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a method for cluster analysis of electricity customer data according to an embodiment of the present invention.
FIG. 3 is a table showing the TOU indicators and social factor data for each cluster obtained by the method for cluster analysis of electricity customer data according to an embodiment.
FIG. 4 is a graph showing a three-dimensional scatter plot of the clustering result in terms of the TOU indicators, obtained by the method for cluster analysis of electricity customer data according to an embodiment.

The specific structural and functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only to explain those embodiments. Embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described in this specification.

Because embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments to the specific forms disclosed; the embodiments include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be termed a second component and, similarly, a second component may be termed a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present. Other expressions describing relationships between components, such as "between" versus "directly between" or "adjacent to" versus "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

Hereinafter, a method for cluster analysis of electricity customer data according to an embodiment of the present invention is described.

FIG. 1 is a flowchart illustrating a method for cluster analysis of electricity customer data according to an embodiment of the present invention. The method may be a method performed by a computing device that includes at least a processor and/or a memory.

Referring to FIG. 1, the power consumption characteristics of electricity customers may be extracted through time-of-use (TOU) indicators (S110). Here, the electricity customer data may be data that is received over wired or wireless communication, or through a given input device, and then stored in the computing device.

The electricity customer data may include at least one of continuous, categorical, and mixed-type data. An example of continuous data is power data. Categorical data is social factor data about the customer, such as appliance ownership, number of household members, and region of residence. Mixed-type data is data that includes both continuous data and categorical data.

Power data is time-of-use data measured at predetermined intervals (for example, every 15 minutes), so its volume is large. Because of this, the data is hard to analyze intuitively, and clustering it directly gives poor clustering performance. Effective cluster analysis therefore requires extracting power consumption characteristics from the power data.

Because an electricity customer's power consumption varies by time of day according to lifestyle, power consumption characteristics must be extracted for each time period. Power consumption is also closely tied to electricity charges, so deriving a customer's power consumption characteristics from the structure of the electricity tariff expresses those characteristics intuitively. According to an embodiment, the electricity tariffs include a time-of-use (TOU) tariff that charges different rates depending on the season and the time of day. The TOU tariff divides the 24 hours of a day into three periods (light load, mid load, and peak load), and the hours assigned to each period differ by season. A TOU indicator based on the structure of the TOU tariff can therefore capture the customer's power consumption characteristics by season and time period.

The TOU indicator represents a customer's power consumption characteristics by season and time period using the structure of the TOU tariff, and is expressed by Equation 1 below.

[Equation 1]

$$\mathrm{TOU}_{i}^{\mathrm{period}} = w^{\mathrm{period}} \times \bar{P}_{i}^{\mathrm{period}}$$

Here,

$\mathrm{TOU}_{i}^{\mathrm{period}}$ is the power consumption characteristic of customer $i$ for each time period (period = light load, mid load, and peak load),

$w^{\mathrm{period}}$ is the time weight of the period, that is, the fraction of the 24 hours occupied by that period, and

$\bar{P}_{i}^{\mathrm{period}}$ is the average power of customer $i$ during that period.
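The TOU indicator calculation can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the indicator for a period is the time weight multiplied by the average power in that period, which is how the definitions above read. The hour-to-period map `PERIOD_HOURS` and the function name `tou_index` are hypothetical; real TOU period boundaries depend on the tariff and the season.

```python
from statistics import mean

# Hypothetical hour-to-period map for a single season (not taken from the
# patent). Real TOU tariffs define these boundaries per season.
PERIOD_HOURS = {
    "light": [23, 0, 1, 2, 3, 4, 5, 6, 7, 8],
    "mid": [9, 12, 17, 18, 19, 20, 21, 22],
    "peak": [10, 11, 13, 14, 15, 16],
}


def tou_index(hourly_kw, period_hours=PERIOD_HOURS):
    """Return {period: time weight * average power} for one customer.

    hourly_kw: list of 24 average-power values (kW), indexed by hour of day.
    """
    index = {}
    for period, hours in period_hours.items():
        weight = len(hours) / 24  # fraction of the day covered by this period
        avg_power = mean(hourly_kw[h] for h in hours)
        index[period] = weight * avg_power
    return index


# Illustrative input only: a flat 1 kW load profile.
profile = [1.0] * 24
result = tou_index(profile)
```

With a flat profile, each indicator reduces to the period's time weight, which makes the effect of the weighting easy to check. Fifteen-minute readings would first be averaged into hourly values, or the map would be expressed in 15-minute slots.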

When electricity customer data is clustered according to customer characteristics, the similarity between data points may be calculated (S120). Similarity is analyzed by measuring the distance between data points, and the distance is calculated differently depending on the type of data. According to an embodiment, similarity is calculated with the Euclidean distance for continuous data, the Jaccard distance for categorical data, and the Gower distance for mixed-type data.

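Because the TOU indicator is continuous data, its similarity can be calculated with the Euclidean distance. The Euclidean distance is the straight-line distance between two points, and the Euclidean distance between the TOU indicators of customers $i$ and $j$ can be calculated according to Equation 2 below, where $\mathrm{TOU}_{i}^{\mathrm{period}}$ is the TOU indicator of customer $i$ for a given period and the sum runs over the light-load, mid-load, and peak-load periods.

[Equation 2]

$$d_{E}(i, j) = \sqrt{\sum_{\mathrm{period}} \left( \mathrm{TOU}_{i}^{\mathrm{period}} - \mathrm{TOU}_{j}^{\mathrm{period}} \right)^{2}}$$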
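As a small worked example, the Euclidean distance between two customers' TOU indicator vectors can be computed as below. The vectors are made-up values in the order (light, mid, peak), and the function name is chosen for illustration.

```python
import math


def euclidean_distance(tou_i, tou_j):
    """Straight-line distance between two equal-length TOU indicator vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(tou_i, tou_j)))


# Illustrative (light, mid, peak) indicators for two customers.
customer_i = (0.2, 0.3, 0.5)
customer_j = (0.2, 0.6, 0.9)
d = euclidean_distance(customer_i, customer_j)  # sqrt(0.3**2 + 0.4**2)
```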

Electricity customers also have categorical data describing social factors, such as ownership of appliances (for example, air conditioners and electric heaters), total number of household members, and region of residence. The similarity of categorical data is calculated with the Jaccard distance. The Jaccard distance is calculated from the union and intersection of two sets $S_i$ and $S_j$, as in Equation 3.

[Equation 3]

$$d_{J}(S_i, S_j) = 1 - \frac{\left| S_i \cap S_j \right|}{\left| S_i \cup S_j \right|}$$
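The Jaccard distance can be illustrated with Python sets. This sketch assumes each customer's categorical attributes are encoded as a set of labels; the labels shown are hypothetical, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard_distance(s_i, s_j):
    """Return 1 - |intersection| / |union| for two sets of category labels."""
    union = s_i | s_j
    if not union:
        return 0.0  # convention chosen here: two empty sets count as identical
    return 1 - len(s_i & s_j) / len(union)


# Hypothetical social-factor labels for two customers.
customer_a = {"air_conditioner", "household_3_to_4", "region_A"}
customer_b = {"air_conditioner", "household_1_to_2", "region_A"}
d = jaccard_distance(customer_a, customer_b)  # 2 shared labels out of 4 distinct
```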

For mixed-type data, the similarity between customers can be calculated using the Gower distance. The Gower distance between customers $i$ and $j$ can be calculated as in Equation 4 below.

[Equation 4]

$$d_{G}(i, j) = \sum_{k} w_{k} \, s_{ij}^{(k)}$$

Here,

$w_{k}$ is the reciprocal of the number of data items for feature $k$, and

$s_{ij}^{(k)}$ is the similarity term for feature $k$, which may be calculated differently depending on the type of data. A small Gower distance between two customers means that the overall characteristics of the two customers are similar.
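A minimal sketch of a Gower distance for one pair of mixed-type records follows. It uses the commonly used textbook form, which is an assumption rather than the patent's exact formula: numeric features contribute a range-normalized absolute difference, categorical features contribute 0 on a match and 1 otherwise, and every feature gets equal weight. The feature names and ranges are hypothetical.

```python
def gower_distance(x, y, numeric_ranges):
    """Equal-weight Gower distance between two records.

    x, y: dicts mapping feature name -> value (same keys in both).
    numeric_ranges: feature name -> (max - min) over the data set, for numeric
    features only; every other feature is treated as categorical.
    """
    total = 0.0
    for feature in x:
        if feature in numeric_ranges:
            value_range = numeric_ranges[feature]
            # Range-normalized absolute difference, in [0, 1].
            d_k = abs(x[feature] - y[feature]) / value_range if value_range else 0.0
        else:
            # Simple matching for categorical features.
            d_k = 0.0 if x[feature] == y[feature] else 1.0
        total += d_k
    return total / len(x)


# Hypothetical records: two TOU indicators plus two social factors.
cust_1 = {"tou_peak": 0.2, "tou_light": 0.5, "has_ac": True, "region": "A"}
cust_2 = {"tou_peak": 0.6, "tou_light": 0.5, "has_ac": True, "region": "B"}
ranges = {"tou_peak": 0.8, "tou_light": 1.0}
d = gower_distance(cust_1, cust_2, ranges)  # (0.5 + 0 + 0 + 1) / 4
```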

Because electricity customer data is mixed-type data that combines power consumption characteristics with social factor data, the optimal number of clusters can be determined by taking both data types into account. The matched clustering algorithm (MCA) is an algorithm for determining the optimal number of clusters that accounts for the data types of both the power consumption characteristics and the social factor data.

Setting the number of clusters too high in a cluster analysis algorithm leads to overfitting, so it is important to choose the number of clusters appropriately. A common way to choose it is to plot the within-cluster sum of squares (WCSS), which is the sum of squared distances among the members of each cluster, against an increasing number of clusters. The point at which the slope of the WCSS curve flattens is taken as the optimal number of clusters; this is known as the elbow method. However, the elbow method can give different results depending on the data type and the clustering algorithm, and because there is no clear criterion for where the slope flattens, the result varies from one analyst to another. Therefore, according to an embodiment of the present invention, the matched clustering algorithm (MCA) is used to determine a reasonable number of clusters while reflecting multiple data types and clustering algorithms at the same time.
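To make the elbow method concrete, the following sketch computes WCSS for a given cluster assignment. It assumes the usual formulation, the sum of squared Euclidean distances from each point to its cluster centroid; the points and labels are toy values, and the cluster assignments are supplied directly rather than produced by a clustering algorithm.

```python
def wcss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total


# Toy data: two well-separated pairs of points.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
wcss_k1 = wcss(points, [0, 0, 0, 0])  # everything in one cluster
wcss_k2 = wcss(points, [0, 0, 1, 1])  # one cluster per pair
```

Here WCSS falls from 104 with one cluster to 4 with two. On real data the drop is usually less sharp, which is why the location of the elbow is open to interpretation.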

FIG. 2 is a block diagram illustrating a method for cluster analysis of electricity customer data according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the MCA may cluster the power consumption characteristics with K-means and the social factor data with hierarchical agglomerative clustering (HAC) (S130). The number of clusters is varied from a (a natural number of 2 or greater; for example, 2) to b (a natural number greater than a; for example, 10). For each number of clusters, the K-means and HAC clustering results are matched against each other, and the number of clusters with the highest matching rate is taken as the optimal number of clusters (S140).

The matching rate may be calculated from the proportion of customers for which the results of the two clustering algorithms agree. For example, the rate of agreement between the K-means result (the cluster number assigned to each customer) and the HAC result (the cluster number assigned to each customer) may be used as the matching rate. In FIG. 2, with two clusters, customer 1 is assigned to the second cluster (cluster 1) by K-means but to the first cluster (cluster 0) by HAC, so the two results do not match for customer 1. In contrast, customer 2 is assigned to the second cluster (cluster 1) by both K-means and HAC, so the two results match. The matching rate is then the ratio of the number of matched customers to the total number of customers.
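The matching rate described above can be sketched as follows. The first two customers mirror the FIG. 2 example; the remaining labels are made up for illustration, and the function name is hypothetical.

```python
def matching_rate(kmeans_labels, hac_labels):
    """Fraction of customers given the same cluster number by both algorithms."""
    if len(kmeans_labels) != len(hac_labels):
        raise ValueError("both label lists must cover the same customers")
    matched = sum(a == b for a, b in zip(kmeans_labels, hac_labels))
    return matched / len(kmeans_labels)


# Customer 1: K-means -> cluster 1, HAC -> cluster 0 (no match).
# Customer 2: K-means -> cluster 1, HAC -> cluster 1 (match).
# Customers 3 and 4 are illustrative additions.
kmeans_labels = [1, 1, 0, 0]
hac_labels = [0, 1, 0, 0]
rate = matching_rate(kmeans_labels, hac_labels)  # 3 of 4 customers match
```

This sketch compares cluster numbers directly, as the description does. Because clustering algorithms assign cluster numbers arbitrarily, a practical implementation may first need to align the numbering of the two results; that step is an assumption and is not shown here.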

In addition, using the number of clusters so determined, the mixed-type data may be clustered with K-medoids to analyze the characteristics of the electricity customers (S150). If two or more candidate numbers of clusters have the same matching rate when the optimal number is determined, the total Gower distance within the clusters is calculated for each candidate, and the candidate with the smaller total distance may be selected as the optimal number of clusters.
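The selection rule, including the tie-break, can be sketched as below. The matching rates and total within-cluster Gower distances are assumed to have been computed already for each candidate number of clusters; all numbers shown are invented for illustration, and the function name is hypothetical.

```python
def select_cluster_count(matching_rates, total_gower):
    """Pick the cluster count with the highest matching rate.

    matching_rates: {k: matching rate between K-means and HAC results}
    total_gower: {k: total within-cluster Gower distance}, needed only for
    the candidates that tie on matching rate.
    """
    best_rate = max(matching_rates.values())
    tied = [k for k, rate in matching_rates.items() if rate == best_rate]
    if len(tied) == 1:
        return tied[0]
    # Tie-break: prefer the candidate with the smaller total Gower distance.
    return min(tied, key=lambda k: total_gower[k])


# Invented values: k = 3 and k = 4 tie on matching rate.
rates = {2: 0.61, 3: 0.74, 4: 0.74, 5: 0.58}
gower_totals = {3: 41.2, 4: 37.9}
best_k = select_cluster_count(rates, gower_totals)
```

Exact equality is used to detect ties here because a matching rate is a ratio of customer counts; rates produced by a different calculation might call for a tolerance.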

According to embodiments of the present invention, the TOU indicator expresses power data separately by season and time period, so a customer's power consumption characteristics can be analyzed intuitively. Because the MCA yields a reasonable number of clusters, electricity customer data, which is mixed-type data, can be clustered and analyzed effectively. In addition, by analyzing the data for each resulting cluster, an efficient operating plan for energy management services can be proposed according to the characteristics of the electricity customers in each cluster.

FIG. 3 is a table showing the TOU indicators and social factor data for each cluster obtained by the method for cluster analysis of electricity customer data according to an embodiment, and FIG. 4 is a graph showing a three-dimensional scatter plot of the clustering result in terms of the TOU indicators.

In FIG. 3, analyzing the social factor data based on the clustering result shows that cluster 0 is characterized by households of 1 to 2 members, while clusters 1 and 2 are characterized by households of 3 to 4 members. In addition, cluster 2 may be characterized by larger homes and more air conditioners than cluster 1.

Referring to FIGS. 3 and 4, cluster 0 has lower overall power consumption than the other clusters, and cluster 2 has higher power consumption during peak-load hours than cluster 1.

As a result, the social factor characteristics and the power consumption by time period of each cluster can be identified easily. A power demand management company can therefore offer a service that reduces electricity bills by proposing ways for customers to save electricity that fit the characteristics of each cluster.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely illustrative, and a person of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

A method for cluster analysis of electricity customer data, performed by a computing device including at least one processor, the method comprising:
extracting continuous data by deriving power consumption characteristics of electricity customers through time-of-use (TOU) indicators;
clustering the continuous data of the customers;
clustering categorical data of the customers; and
determining a final clustering by matching a clustering result of the continuous data and a clustering result of the categorical data,
wherein the continuous data is clustered by a K-means algorithm with similarity calculated using the Euclidean distance,
wherein the categorical data is clustered by a hierarchical agglomerative clustering (HAC) algorithm with similarity calculated using the Jaccard distance,
wherein the clustering of the continuous data and the clustering of the categorical data are performed while varying the number of clusters,
wherein, in the determining of the final clustering, the clustering results of the K-means algorithm and the HAC algorithm are matched for each number of clusters, and the number of clusters with the highest matching rate is determined as the optimal number of clusters, and
wherein, when two or more numbers of clusters have the same matching rate in determining the optimal number of clusters, the total Gower distance within the clusters is calculated and the number of clusters with the smaller total distance is selected as the optimal number of clusters.

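The method according to claim 1,
wherein the TOU indicator is an indicator that represents the customer's power consumption characteristics by season and time period, based on the structure of a time-of-use tariff that charges different electricity rates depending on the season and time period, and is expressed by the following equation:

$$\mathrm{TOU}_{i}^{\mathrm{period}} = w^{\mathrm{period}} \times \bar{P}_{i}^{\mathrm{period}}$$

(Here, $\mathrm{TOU}_{i}^{\mathrm{period}}$ is the power consumption characteristic of customer $i$ for each time period, $w^{\mathrm{period}}$ is the time weight of the period, that is, the fraction of the 24 hours occupied by that period, and $\bar{P}_{i}^{\mathrm{period}}$ is the average power of customer $i$ during that period.)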

(Deleted)

A method for cluster analysis of electricity customer data, performed by a computing device including at least one processor, the method comprising:
extracting continuous data by deriving power consumption characteristics of electricity customers through time-of-use (TOU) indicators;
clustering the continuous data of the customers;
clustering categorical data of the customers;
determining a final clustering by matching a clustering result of the continuous data and a clustering result of the categorical data; and
analyzing characteristics of the customers based on the final clustering result,
wherein mixed-type data including the continuous data and the categorical data is clustered by K-medoids with similarity calculated using the Gower distance.

(Deleted)