KR20210130395A

KR20210130395A - METHOD FOR ANALYZING RESIDENTIAL LOAD CHARACTERISTICS USING k-MEANS CLUSTERING AND PRINCIPAL COMPONENT ANALYSIS

Info

Publication number: KR20210130395A
Application number: KR1020200048532A
Authority: KR
Inventors: 윤성국; 손예지
Original assignee: 한국전력공사; 숭실대학교산학협력단
Priority date: 2020-04-22
Filing date: 2020-04-22
Publication date: 2021-11-01

Abstract

The present invention relates to a residential load characteristic analysis method using k-means clustering and principal component analysis. The method comprises the steps of: classifying, by a control unit of a residential load characteristic analysis apparatus, residential users through k-means clustering; extracting, by the control unit, the principal consumers representing each cluster by applying principal component analysis to each classified cluster; and analyzing, by the control unit, the characteristics of each cluster by using the questionnaire responses that the extracted principal consumers gave before data collection.

Description

Residential load characteristic analysis method using k-means clustering and principal component analysis

The present invention relates to a residential load characteristic analysis method using k-means clustering and principal component analysis. More specifically, it relates to a method that classifies residential users through k-means clustering, then applies principal component analysis to the classified clusters to extract the main consumers that can represent each cluster, and thereby enables residential load characteristics to be analyzed cluster by cluster.

Power systems have recently developed around the demand side, and various demand response programs are being applied to make use of demand resources.

Residential load accounts for a considerable share of total load: 22% in the United States in 2015 and 20% in Ontario, Canada in 2016.

To make use of residential load as a demand resource, it is therefore essential to design demand response programs suited to residential consumers.

Designing such a program requires reflecting the characteristics of the residential consumers and of the loads in their homes (e.g., home appliances), so these characteristics must be understood first.

Many studies have used clustering and principal component analysis to analyze residential consumer load characteristics. Most of them apply principal component analysis first to reduce the dimensionality of the whole data set and only then perform k-means clustering.

However, when k-means clustering follows principal component analysis in this way, the effect of principal component analysis on individual clusters cannot be examined, so the per-cluster load characteristic analysis becomes somewhat inaccurate. In addition, when questionnaire response data are used for the per-cluster analysis, it is difficult to survey new users as they appear.

A method is therefore needed that first classifies residential users through k-means clustering, then applies principal component analysis to the classified clusters to extract the main consumers that can represent each cluster, and so enables residential load characteristics to be analyzed for each cluster.

The background art of the present invention is disclosed in Korean Patent Publication No. 10-2017-0053193 (published May 16, 2017; "Data compression system and method based on k-means clustering for a wireless image sensor network").

According to one aspect, the present invention was devised to solve the problems described above. Its object is to provide a residential load characteristic analysis method using k-means clustering and principal component analysis that classifies residential users through k-means clustering, applies principal component analysis to the classified clusters to extract the main consumers that can represent each cluster, and enables residential load characteristics to be analyzed for each cluster.

A residential load characteristic analysis method using k-means clustering and principal component analysis according to one aspect of the present invention comprises: classifying, by a control unit of a residential load characteristic analysis apparatus, residential users through k-means clustering; extracting, by the control unit, the main consumers that can represent each cluster by applying principal component analysis to the classified clusters; and analyzing, by the control unit, the characteristics of each cluster using the questionnaire responses that the extracted main consumers gave before data collection.

In the present invention, the k-means clustering is an unsupervised machine learning technique that classifies a multidimensional data set of N data points into K clusters, and each data point is assigned to the cluster, among the plurality of clusters, that is at the shortest distance and gives a small sum of squared error (SSE) value.

In the present invention, the center of each cluster is recalculated from the data assigned to it; the control unit reclassifies the data set based on the recalculated cluster centers and repeats the clustering process until the classified clusters no longer change; and the clustering process proceeds in the direction of minimizing the SSE value, which is the cost function.

In the present invention, the k-means clustering performs classification with whatever number of clusters it is given, regardless of the optimal number, so the optimal number of clusters is selected in advance using the elbow method. In a graph of the cost function value calculated for each total number of clusters (K), increasing K beyond a certain number yields diminishing returns in SSE reduction, so the graph bends; the number of clusters at this bend is selected as the optimal number of clusters.

In the present invention, the principal component analysis is characterized in that the control unit uses Equation 2 below to represent the existing data more compactly, reducing its dimensionality by means of principal components $Z_j$ that reflect the correlations among the variables $X_1, \dots, X_p$.

$$Z_j = \phi_{1j} X_1 + \phi_{2j} X_2 + \cdots + \phi_{pj} X_p \qquad \text{(Equation 2)}$$

Here, $\phi_{ij}$ is an element of the loading vector, that is, the coefficient of the existing variable $X_i$ in the principal component $Z_j$; the larger this value, the more important the variable $X_i$ is to the principal component $Z_j$.

In the present invention, in the step of extracting the main consumers that can represent each cluster, the control unit, in order to extract only the main consumers that exhibit the characteristics of each cluster, sorts the loading values $\phi_{ij}$ of each principal component $Z_j$ in descending order and selects every consumer, that is, every variable $X_i$, that has at least one loading within the reference rank, the reference rank being calculated using Equation 3 below.

(Equation 3)

Here, the reference rank is defined for each principal component within a cluster. It is calculated from the total number of principal components in the cluster and from the proportion of the total variance that the principal component explains. The cumulative-sum term is the maximum number of consumers that would be extracted if the reference rank were computed with a simple graded scheme that does not take magnitude into account.

In the present invention, in the step of analyzing the characteristics of each cluster using the questionnaire responses that the extracted main consumers gave before data collection, the questionnaire items and response options are specified in advance in table form.

According to one aspect, the present invention classifies residential users through k-means clustering, then applies principal component analysis to the classified clusters to extract the main consumers that can represent each cluster, and enables residential load characteristics to be analyzed for each cluster.

FIG. 1 is a flowchart illustrating a residential load characteristic analysis method using k-means clustering and principal component analysis according to an embodiment of the present invention.
FIG. 2 is a graph of the cost function value calculated for each total number of clusters (K) in FIG. 1.
FIG. 3 is a graph of the power usage data of consumers classified into the optimal number of clusters in FIG. 1.
FIG. 4 is an exemplary diagram showing the average pattern of each cluster made up of the consumers extracted by principal component analysis in FIG. 1.
FIG. 5 is an exemplary heat map showing only the questionnaire response ratios that differ markedly between clusters in FIG. 1.

Hereinafter, an embodiment of the residential load characteristic analysis method using k-means clustering and principal component analysis according to the present invention is described with reference to the accompanying drawings.

In the drawings, the thickness of lines and the size of components may be exaggerated for clarity and convenience of explanation. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. These terms should therefore be defined based on the content of this specification as a whole.

The residential load characteristic analysis method according to an embodiment of the present invention is performed by a residential load characteristic analysis apparatus (not shown). Although not illustrated in the drawings, the apparatus may be implemented as a computer or server that includes a control unit (e.g., a CPU or MPU).

FIG. 1 is a flowchart illustrating the residential load characteristic analysis method using k-means clustering and principal component analysis according to an embodiment of the present invention. As shown, this embodiment classifies residential users through k-means clustering (S101), applies principal component analysis to the classified clusters (S102), extracts the main consumers that can represent each cluster (S103), and analyzes the characteristics of each cluster using the questionnaire responses that the extracted consumers gave before data collection.

The k-means clustering is an unsupervised machine learning technique that classifies a multidimensional data set of N data points into K clusters. Equation 1 below is the cost function, the sum of squared error (SSE), which is based on the Euclidean distance between each data point to be classified and the center of its cluster.

More specifically, each data point is first assigned to the cluster that is nearest to it, which is the assignment that gives the smallest SSE value. The cluster centers are then recalculated from the assigned data, the data set is reclassified based on the recalculated centers, and the process is repeated until the classified clusters no longer change. The clustering process thus proceeds in the direction of minimizing the SSE value, which is the cost function.

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \qquad \text{(Equation 1)}$$

Here, $\mu_k$ is the center of cluster $C_k$, and $\lVert x_i - \mu_k \rVert$ is the Euclidean distance between a data point and its cluster center. The smaller the SSE value, the better the clusters are judged to be separated.
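The assign, recompute, repeat loop described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the patent: the function name `kmeans`, the parameters `n_iter` and `seed`, and the random choice of initial centers are all assumptions introduced here.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means. Returns (labels, centers, sse)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    # Cost function: sum of squared distances to the assigned centers.
    sse = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, sse

# Toy data: two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers, sse = kmeans(X, k=2)
```

In the patent's setting each row of `X` would be one consumer's load profile; the toy array here only demonstrates the mechanics.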

Because k-means clustering performs classification with whatever number of clusters it is given, regardless of the optimal number, it is important to determine a suitable number of clusters in advance. The optimal number of clusters is selected using the elbow method, the longest-established method for this purpose.

FIG. 2 is a graph of the cost function value calculated for each total number of clusters (K) in FIG. 1. Once K is increased beyond a certain number, further increases yield diminishing returns in SSE (sum of squared error) reduction and the graph bends sharply; the number of clusters at the bend is selected as the optimal number of clusters.
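The text selects the bend by inspecting the graph. One common way to automate that choice, used here purely as an assumption and not taken from the patent, is to pick the K with the largest second difference of the SSE curve. The function name `elbow_k` and the SSE values are made up for illustration, and the heuristic assumes evenly spaced values of K.

```python
import numpy as np

def elbow_k(ks, sse):
    """Return the K at the sharpest bend of the SSE curve."""
    sse = np.asarray(sse, dtype=float)
    # Second difference at each interior K: large where the decrease in SSE
    # slows down abruptly.
    bend = sse[:-2] - 2 * sse[1:-1] + sse[2:]
    return ks[int(bend.argmax()) + 1]

ks = [1, 2, 3, 4, 5, 6]
sse = [100.0, 60.0, 30.0, 25.0, 22.0, 20.0]  # made-up values
best_k = elbow_k(ks, sse)  # the curve flattens after K = 3
```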

FIG. 3 is a graph of the power usage data of consumers classified into the optimal number of clusters in FIG. 1.

FIG. 3(a) shows the monthly average power usage pattern of all consumers, and FIG. 3(b) shows the average pattern of overall power usage. The x-axis is time in 30-minute units and the y-axis is power usage (kWh).

The average patterns in FIG. 3(b) resemble one another because consumers who show relatively little of their cluster's characteristics are grouped together with the rest. Principal component analysis is therefore used to extract the main consumers.

Principal component analysis is a technique that uses Equation 2 below to represent the existing data more compactly, reducing its dimensionality by means of principal components $Z_j$ that reflect the correlations among the variables $X_1, \dots, X_p$.

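$$Z_j = \phi_{1j} X_1 + \phi_{2j} X_2 + \cdots + \phi_{pj} X_p \qquad \text{(Equation 2)}$$

Here, $\phi_{ij}$ is an element of the loading vector, that is, the coefficient of the existing variable $X_i$ in the principal component $Z_j$; the larger this value, the more important the variable $X_i$ is to the principal component $Z_j$.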
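As a rough illustration of how loading vectors and principal components are obtained, the sketch below computes them with a singular value decomposition of the centered data. The function name `pca_loadings` and the random test matrix are assumptions for the example; the patent does not say which numerical routine is used. Consistent with the text, where each consumer is a variable, columns would be consumers and rows would be time steps.

```python
import numpy as np

def pca_loadings(X):
    """X has shape (observations, variables).

    Returns (loadings, scores, explained_ratio).
    """
    Xc = X - X.mean(axis=0)               # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt.T                       # column j is the loading vector of component j
    scores = Xc @ loadings                # component j = sum_i loading[i, j] * variable i
    explained_ratio = s**2 / np.sum(s**2) # share of total variance per component
    return loadings, scores, explained_ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 5))              # e.g. 48 half-hour steps, 5 consumers (made up)
loadings, scores, ratio = pca_loadings(X)
```

The `explained_ratio` output corresponds to the proportion of total variance explained by each principal component, which the description of the reference rank refers to.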

For load characteristic analysis, only the main consumers that exhibit the characteristics of each cluster are extracted.

For example, to extract only these main consumers, the loading values $\phi_{ij}$ of each principal component $Z_j$ are sorted in descending order, and every consumer, that is, every variable $X_i$, that has at least one loading within the reference rank is selected.

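The reference rank can be calculated using Equation 3 below.

(Equation 3)

Here, the reference rank is defined for each principal component within a cluster. It is calculated from the total number of principal components in the cluster and from the proportion of the total variance that the principal component explains. The cumulative-sum term is the maximum number of consumers that would be extracted if the reference rank were computed with a simple graded scheme that does not take magnitude into account.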
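The selection rule (rank the loadings of each principal component and keep every consumer that falls within the reference rank at least once) can be sketched as below. Because the formula for the reference rank is not reproduced in this text, the cutoff is passed in as a hypothetical `ref_rank` argument instead of being computed. Ranking by absolute loading value is also an assumption; the text only says the values are sorted in descending order.

```python
import numpy as np

def select_main_consumers(loadings, ref_rank):
    """loadings: (consumers, components); ref_rank[j]: cutoff rank for component j.

    Returns the sorted indices of consumers ranked within the cutoff
    for at least one component.
    """
    chosen = set()
    for j, r in enumerate(ref_rank):
        # Rank consumers by loading magnitude, largest first (assumed rule).
        order = np.argsort(-np.abs(loadings[:, j]))
        chosen.update(order[:r].tolist())
    return sorted(chosen)

# Made-up loadings for 4 consumers and 2 principal components.
loadings = np.array([[0.9, 0.1],
                     [0.3, 0.8],
                     [0.2, 0.5],
                     [0.1, 0.3]])
main = select_main_consumers(loadings, ref_rank=[2, 1])
```

With these toy values, consumers 0 and 1 are within the top two of the first component and consumer 1 is the top of the second, so the union contains consumers 0 and 1.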

Table 1 below compares the number of consumers in each cluster before principal component analysis with the number extracted by principal component analysis according to this embodiment.

As Table 1 shows, cluster 6 contains far fewer consumers than the other clusters, so no separate consumer extraction is performed for it; the total of 254 after principal component analysis includes its 14 consumers.

Cluster                               1      2      3      4      5      6      Total
Before principal component analysis   1021   535    624    1313   277    14     3784
After principal component analysis    75     25     51     57     32     -      254

FIG. 4 is an exemplary diagram showing the average pattern of each cluster made up of the consumers extracted by principal component analysis in FIG. 1.

Referring to FIG. 4, whereas the per-cluster average power usage patterns in FIG. 3, obtained with k-means clustering alone, look similar to one another, the per-cluster average patterns after principal component analysis are visibly different from one another. This is because only the main consumers that express each cluster's characteristics are extracted through principal component analysis, and the relatively unimportant consumers are removed from the cluster.

For reference, the Euclidean distance is used as the similarity measure to confirm that, after principal component analysis, each cluster is more sharply characterized than before and is distinguishable from the patterns of the other clusters.

The average power usage of each cluster is first normalized using the min-max normalization of Equation 4 below, and the Euclidean distance is then calculated to compare similarity.

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{(Equation 4)}$$

Here, $x'$ is the average power usage of a cluster after normalization, $x$ is the average power usage of the cluster before normalization, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the average power usage, respectively.
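The normalize-then-compare step can be sketched as follows. The helper names and the toy profiles are assumptions, and so is the choice to take the minimum and maximum separately within each cluster's average profile; the text does not state whether they are taken per cluster or over all clusters.

```python
import numpy as np

def minmax(x):
    """Min-max normalization of one profile to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def pairwise_distances(profiles):
    """Euclidean distance between every pair of rows of `profiles`."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Two made-up average usage profiles (kWh) with very different scales.
raw = [[1.0, 2.0, 3.0],
       [10.0, 30.0, 20.0]]
normalized = np.array([minmax(p) for p in raw])
D = pairwise_distances(normalized)  # D[i, j] is the distance between clusters i and j
```

Normalizing first means the distance reflects the shape of the daily pattern and not the overall consumption level.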

Table 2 below gives the Euclidean distances between the average power usage patterns of each pair of clusters before and after principal component analysis. A higher value means a larger difference between the average patterns, and hence lower similarity. The average Euclidean distance increases from 0.778 before principal component analysis to 1.2058 after it, confirming that similarity between clusters decreases as each cluster's pattern becomes more distinct.

Before principal component analysis (FIG. 3(b))

Cluster   1       2       3       4       5       6
1         -       0.608   1.007   0.347   0.408   0.781
2         0.608   -       1.474   0.676   0.884   1.335
3         1.007   1.474   -       0.821   0.644   0.826
4         0.347   0.676   0.821   -       0.304   0.863
5         0.408   0.884   0.644   0.304   -       0.688
6         0.781   1.335   0.826   0.863   0.688   -

After principal component analysis (FIG. 4)

Cluster   1       2       3       4       5       6
1         -       0.710   1.719   0.665   1.134   1.432
2         0.710   -       2.059   0.699   1.388   1.746
3         1.719   2.059   -       1.491   0.879   1.080
4         0.665   0.699   1.491   -       1.005   1.429
5         1.134   1.388   0.879   1.005   -       0.651
6         1.432   1.746   1.080   1.429   0.651   -
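The two average distances quoted for Table 2 can be reproduced from the table itself by averaging the 15 distinct cluster pairs in each half. The short check below only restates the table values; the variable names are illustrative.

```python
# Upper-triangle entries of Table 2, each cluster pair counted once.
before = [0.608, 1.007, 0.347, 0.408, 0.781,
          1.474, 0.676, 0.884, 1.335,
          0.821, 0.644, 0.826,
          0.304, 0.863,
          0.688]
after = [0.710, 1.719, 0.665, 1.134, 1.432,
         2.059, 0.699, 1.388, 1.746,
         1.491, 0.879, 1.080,
         1.005, 1.429,
         0.651]

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
print(round(mean_before, 3), round(mean_after, 4))  # prints 0.778 1.2058
```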

The characteristics of each cluster can also be analyzed using the questionnaire responses that the consumers extracted by principal component analysis gave before data collection.

The questionnaire items and response options used to analyze the characteristics of each cluster are listed in Table 3 below.

The items in Table 3 are divided into those describing the characteristics of residential consumers and those describing the characteristics of loads in the home.

Residential consumer characteristics
1) Family composition
   I live alone
   All people in my home are over 15
   Both adults and children under 15
2) Chief income earner
   An employee
   Self-employed (w/ or w/o employees)
   Retired
   Unemployed (seeking work or not)
   Carer: looking after relative family
3) Number of bedrooms: 1/2/3/4/5++
4) Internet access: Y/N

Characteristics of loads in the home
1) Heating resource
   Electricity / gas / oil / solid fuel / renewable
2) Heat water resource
   Electricity / gas / central heating / oil / solid fuel
3) Number of appliances: 1/2/3++
   Tumble dryer / dish washer / electric heater / stand alone freezer
   TV (less than or greater than 21 inch)
   Desktop / laptop / games consoles etc.

When analyzing residential consumer characteristics, the proportion of consumers in a cluster who chose each response option, relative to the number of consumers in that cluster, is calculated and used.

For example, in FIG. 5 the consumers in cluster 1 who chose "I live alone" make up 44% of all consumers in cluster 1. For the heating and hot-water fuel questions among the household load characteristics, more than one resource may be used for heating or hot water, so consumers who use several resources are taken into account. For "3) Number of Appliances" in Table 3, the average number of units of each appliance is first calculated for each cluster, and each cluster's share of the total of these averages is then computed.
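The two kinds of ratio described above can be sketched as follows. The function names and all numbers are made up for illustration; the 11-out-of-25 example is chosen only so that the share comes out at 44%, and it does not reproduce the patent's data.

```python
from collections import Counter

def response_shares(answers):
    """Share of a cluster's consumers who chose each option of one question."""
    counts = Counter(answers)
    n = len(answers)
    return {option: count / n for option, count in counts.items()}

def appliance_share(avg_units_by_cluster):
    """Each cluster's share of the summed per-cluster average unit counts."""
    total = sum(avg_units_by_cluster.values())
    return {c: v / total for c, v in avg_units_by_cluster.items()}

# Hypothetical cluster of 25 consumers answering "Family composition".
answers = (["I live alone"] * 11
           + ["All people in my home are over 15"] * 9
           + ["Both adults and children under 15"] * 5)
shares = response_shares(answers)

# Hypothetical average number of tumble dryers owned per household, by cluster.
dryer = appliance_share({1: 0.5, 2: 1.5})
```

For multi-choice questions such as the heating resource, a consumer who ticks several options would contribute to each of them, so the shares for one question may sum to more than 1.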

FIG. 5 is an exemplary heat map showing only the questionnaire response ratios that differ markedly between clusters in FIG. 1.

Referring to FIG. 5, each row is a questionnaire response and each column is a cluster; clusters with a high response ratio are shown in a darker color.

In FIG. 5, cluster 1 has a low proportion of households with children under 15 and a high proportion of single-person households, for which the "Number of appliances" answers are mostly 0 or 1. Because these are single-person households, average appliance ownership is very low, and the share of electricity used for heating (Heating resource) and hot water (Heat water resource) is also low.

In FIG. 5, cluster 2 owns a relatively large number of appliances and has a high share of electricity use for heating and hot water.

In FIG. 5, cluster 3 uses many appliances such as TVs, desktops, and games consoles and has a high share of Internet access. Its proportion of responses reporting children under 15 is similar to that of cluster 2, but its daytime power usage is higher, which suggests households with children who do not attend school.

In FIG. 5, clusters 4 and 5 both have a high proportion of retired consumers (Retired), but the share of electricity as fuel is larger in cluster 5.

In FIG. 5, cluster 6 has low shares of ordinary employees and of the self-employed without employees (Self-employed (w/o employees)), whereas its share of the self-employed with employees (Self-employed (w/ employees)) is very high. Its daytime power usage is especially high because the business is run from home. That is, because of self-employment, these households also own many appliances and record the highest average usage.

As described above, this embodiment extracts only the important users through principal component analysis after k-means clustering and enables the characteristics of each cluster to be analyzed. This reveals the characteristics of the clusters most likely to participate in a demand response program: the more appliances a cluster owns and the higher its share of heating and hot water, the easier it is to shift load.

For example, applying demand response to clusters 2, 3, and 5 is expected to be relatively effective, but because the share of electricity as fuel is low in cluster 3, clusters 2 and 5 are expected to have higher participation rates. In cluster 6, self-employment fixes the hours of electricity use, so although the amount of potentially shiftable consumption is large, shifting load is relatively difficult. In addition, when a new consumer arrives, the consumer's characteristics can be inferred from the cluster to which the consumer is assigned, without additional questionnaire data. Using the consumer characteristics of each cluster makes it possible not only to analyze the likelihood of demand response participation but also to design demand response tailored to those consumers.

The present invention has been described with reference to the embodiment shown in the drawings, but this embodiment is merely illustrative, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. The technical scope of protection of the present invention should therefore be determined by the claims below. The implementations described in this specification may be realized as, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if a feature is discussed only in the context of a single form of implementation (e.g., only as a method), it may also be implemented in other forms (e.g., an apparatus or a program). An apparatus may be implemented in suitable hardware, software, firmware, and the like. A method may be implemented in an apparatus such as a processor, which refers generally to a processing device including a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices that facilitate the communication of information between end users, such as computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices.

Claims

classifying, by the control unit of the residential load characteristic analysis apparatus, residential users through k-means clustering;
extracting, by the control unit, main consumers who can represent each cluster by applying principal component analysis to the classified clusters; and
analyzing, by the control unit, the characteristics of each cluster using the questionnaire responses that the extracted main consumers gave before data collection.

The method of claim 1, wherein the k-means clustering is an unsupervised machine learning technique that classifies N multidimensional data sets into K clusters,
and each data set is assigned, among the plurality of clusters, to the cluster that is the shortest distance away and yields a small sum of squared error (SSE) value.

3. The method of claim 2,
wherein the center of each cluster is recalculated based on the assigned data,
the control unit
classifies the data sets again based on the recalculated cluster centers
and repeats the clustering process until the classified clusters no longer change, and
the clustering process proceeds in the direction of minimizing the SSE value, which is the cost function.
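The assign, recalculate, and repeat loop recited above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the claimed implementation: the helper names (`assign`, `update`, `k_means`), the explicit `initial_centers` argument, the `max_iter` safety cap, and the toy two-dimensional data are all hypothetical, and Euclidean distance is assumed.

```python
import math


def assign(data, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            for x in data]


def update(data, labels, centers):
    """Recalculate each center as the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, lab in zip(data, labels) if lab == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))  # an empty cluster keeps its old center
    return new_centers


def sse(data, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(x, centers[k]) ** 2 for x, k in zip(data, labels))


def k_means(data, initial_centers, max_iter=100):
    centers = [list(c) for c in initial_centers]
    labels = assign(data, centers)
    for _ in range(max_iter):
        centers = update(data, labels, centers)
        new_labels = assign(data, centers)
        if new_labels == labels:  # no change in the classified clusters: stop
            break
        labels = new_labels
    return centers, labels, sse(data, centers, labels)


# Hypothetical toy data: two well-separated groups of three points.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
centers, labels, cost = k_means(data, initial_centers=[data[0], data[3]])
print(labels, round(cost, 2))
```

Each assignment step and each center update can only lower or preserve the SSE, which is why the loop terminates. The result still depends on the initial centers, so a poor starting choice can settle on a partition whose SSE is not the lowest possible.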

The method of claim 1, wherein the k-means clustering
proceeds with classification using the given number of clusters regardless of the optimal number of clusters,
so that the optimal number of clusters is selected in advance using the 'Elbow method', and
in the graph of the cost function value calculated for each total number of clusters (K), when K is increased beyond a certain number, the marginal utility of the decrease in SSE diminishes and the graph bends, and the number of clusters at that bend is selected as the optimal number of clusters.
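The claim describes reading the bend off a graph. One way to locate that bend numerically is sketched below. The rule used here, picking the K where the drop in SSE shrinks the most (the largest discrete second difference), is an assumed heuristic and is not stated in the claim; the function name `elbow_k` and the SSE values are hypothetical.

```python
def elbow_k(sse_by_k):
    """Return the K at which the SSE curve bends most sharply.

    sse_by_k maps consecutive cluster counts K to the SSE obtained with K clusters.
    Returns None when fewer than three K values are given.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]  # gain from reaching K
        drop_after = sse_by_k[k] - sse_by_k[next_k]   # gain from going past K
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical SSE values for K = 1..6.
sse_curve = {1: 1000.0, 2: 600.0, 3: 250.0, 4: 220.0, 5: 200.0, 6: 185.0}
print(elbow_k(sse_curve))
```

With these values the decrease in SSE falls from 350 to 30 after K = 3, so K = 3 is returned. On real load data the bend is often less pronounced, and inspecting the plotted curve remains a useful check.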

The method of claim 1, wherein, in the principal component analysis,
the control unit
extracts, using Equation 2 below, principal components that reflect the interrelationships among the variables, thereby reducing the dimension of the existing data and representing it concisely.
(Equation 2)
Here, each coefficient in Equation 2 is the coefficient of an existing variable that makes up the principal component, and a variable with a larger coefficient is a more important variable in that principal component.
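For orientation, a principal component is conventionally written as a linear combination of the original variables. The form below is the textbook notation, and its symbols ($PC_i$, $a_{ij}$, $x_j$, $p$) are assumed for illustration; it is not copied from Equation 2, whose symbols do not survive in this text.

```latex
PC_i = a_{i1} x_1 + a_{i2} x_2 + \dots + a_{ip} x_p = \sum_{j=1}^{p} a_{ij} x_j
```

In this notation, $a_{ij}$ is the coefficient of the existing variable $x_j$ in the $i$-th principal component, and a variable with a larger $|a_{ij}|$ contributes more to that component.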

The method of claim 1, wherein, in the step of extracting the main consumers who can represent each cluster,
the control unit,
in order to extract only the main consumers representing the characteristics of each cluster,
sorts the values of each principal component in descending order
and adopts the variables, each of which is a consumer, that fall within the standard rank at least once,
wherein the standard rank is calculated using Equation 3 below.
(Equation 3)
Here, the standard rank is the standard rank of the corresponding principal component in the cluster, and it is calculated from the total number of principal components in the cluster and the ratio of the total variance that can be explained by the corresponding principal component.

The method of claim 1,
wherein, in the step of analyzing the characteristics of each cluster using the questionnaire responses that the extracted main consumers gave before data collection,
the questionnaire questions and response options for the questionnaire responses
are specified in advance in table form.