KR20230136946A

KR20230136946A - Machine learning based real estate valuation method and system

Info

Publication number: KR20230136946A
Application number: KR1020220034451A
Authority: KR
Inventors: 송인근; 이호재; 정우일
Original assignee: 주식회사 에이드파트너스
Priority date: 2022-03-21
Filing date: 2022-03-21
Publication date: 2023-10-04

Abstract

The present invention provides a machine learning-based real estate valuation method and system. The method may include: (a) collecting at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties; (b) preprocessing the collected location characteristic data to generate a location characteristic vector, which is a high-dimensional vector of at least three dimensions; (c) training a clustering model using the location characteristic vector; and (d) comparing transaction amounts within a similar cluster using the clustering model and generating an evaluation index.

Description

Machine learning based real estate valuation method and system

The present invention relates to a machine learning-based real estate valuation method and system, and more particularly to a method and system that evaluate the value of real estate by analyzing the location characteristics that affect its price.

The value of real estate is assessed by various methods. These include the officially announced land price standard method, which values the land under appraisal by comparing it with the officially announced price of standard land with a similar use and environment; the sales comparison method, which sets the price based on the transaction amounts of similar land or housing near the building; and the income capitalization method, which values the property from the income it generates.

Recently, research has been conducted on methods that train an artificial intelligence model, using artificial neural networks, on the various factors that affect real estate value as described above, in order to evaluate the value of a property or to predict how that value will change in the future.

Conventional real estate price comparison methods that use these approaches generally classify and compare properties by administrative district, such as province or metropolitan city (si/do), city, county, or district (si/gun/gu), and town, township, or neighborhood (eup/myeon/dong), that is, by zones drawn for administrative convenience.

However, when real estate is classified and compared by administrative district, properties are treated as having the same characteristics simply because they are located in the same area, so the actual location characteristics of a property cannot be clearly reflected. In addition, because of the nature of artificial intelligence, feeding in training data yields only the corresponding output; the criteria applied internally cannot be known, and it is also difficult to evaluate the prediction accuracy.

The present invention is intended to solve various problems, including those described above. Its object is to provide a machine learning-based real estate valuation method and system that, rather than classifying real estate by administrative districts drawn for convenience, clusters properties with similar location characteristics, compares and evaluates the relative value of each property within its similar cluster, and suggests a fair price. However, this object is illustrative and does not limit the scope of the present invention.

According to an embodiment of the present invention, a machine learning-based real estate valuation method is provided. The method may include: (a) collecting at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties; (b) preprocessing the collected location characteristic data to generate a location characteristic vector, which is a high-dimensional vector of at least three dimensions; (c) training a clustering model using the location characteristic vector; and (d) comparing transaction amounts within a similar cluster using the clustering model and generating an evaluation index.

According to an embodiment of the present invention, in step (a), the location characteristic data may include at least one of location data, transaction data, and facility data of the real estate.

According to an embodiment of the present invention, in step (b), generating the location characteristic vector may include: (b1) calculating the number of items of the facility data located within a predetermined radius of the location data; and (b2) deriving the location characteristic vector by considering the calculated number of items of facility data, the location data, and the transaction data.

According to an embodiment of the present invention, the location data may include at least one of latitude and longitude; the transaction data may include at least one of average exclusive-use area (in pyeong), period of use, and actual transaction price; and the facility data may include at least one of the number of subway stations, elementary schools, middle schools, high schools, retail businesses, living service businesses, real estate businesses, tourism, leisure, and entertainment businesses, lodging businesses, sports businesses, food businesses, and academic and education businesses.

According to an embodiment of the present invention, the location characteristic vector may be a 16-dimensional vector that takes into account all of the following: the latitude, the longitude, the average exclusive-use area, the period of use, and the numbers of subway stations, elementary schools, middle schools, high schools, retail businesses, living service businesses, real estate businesses, tourism, leisure, and entertainment businesses, lodging businesses, sports businesses, food businesses, and academic and education businesses.

According to an embodiment of the present invention, in step (c), the clustering model may be a model generated using the K-Means clustering method.

According to an embodiment of the present invention, in step (d), generating the evaluation index may include: (d1) calculating quantiles that divide the distribution of the average price per pyeong of the real estate belonging to the similar cluster into five ranges; and (d2) generating the evaluation index by comparing the average price per pyeong with the ranges defined by the quantiles.

According to an embodiment of the present invention, in step (d1), the quantiles may include: a second quantile at the 25% position of the average price-per-pyeong distribution; a third quantile at the 75% position of that distribution; a first quantile obtained by subtracting, from the second quantile, 1.5 times the difference between the second and third quantiles; and a fourth quantile obtained by adding, to the third quantile, 1.5 times the difference between the second and third quantiles.

In step (d2), the evaluation index may be rated “fair” when the average price per pyeong is at or above the second quantile and below the third quantile; “undervalued” when it is at or above the first quantile and below the second quantile; “overvalued” when it is at or above the third quantile and below the fourth quantile; “very undervalued” when it is at or below the first quantile; and “very overvalued” when it is at or above the fourth quantile.

According to an embodiment of the present invention, the method may further include, after step (d): (e) calculating, from the evaluation index, a fair price level for the real estate belonging to the similar cluster.

According to an embodiment of the present invention, a machine learning-based real estate valuation system is provided. The system may include: a data collection unit that collects at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties; a data preprocessing unit that preprocesses the collected location characteristic data to generate a location characteristic vector, which is a high-dimensional vector of at least three dimensions; a clustering model learning unit that trains a clustering model using the location characteristic vector; and a valuation unit that compares transaction amounts within a similar cluster using the clustering model and generates an evaluation index.

According to an embodiment of the present invention configured as described above, properties with similar location characteristics are clustered, the relative value of each property within its similar cluster is compared and evaluated, and a fair price can be suggested. Conventional methods of predicting future real estate prices are hard for the market to trust and, by themselves, make it hard to judge the fair price level of a property. The present invention, by contrast, can reliably present the fair price level of each property within a group of similar properties at the present point in time.

In addition, because real estate is classified by similarity in its actual location characteristics rather than by zones drawn for administrative convenience, a clear index for evaluating real estate prices can be established. Of course, the scope of the present invention is not limited by these effects.

Figure 1 is a flowchart showing, in order, the steps of a machine learning-based real estate valuation method according to an embodiment of the present invention.
Figure 2 is an image visualizing a clustering model obtained by the machine learning-based real estate valuation method according to an embodiment of the present invention.
Figure 3 is an image showing the step of generating an evaluation index in the machine learning-based real estate valuation method according to an embodiment of the present invention.
Figure 4 is a conceptual diagram schematically showing a machine learning-based real estate valuation system according to an embodiment of the present invention.

Hereinafter, several preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

The embodiments of the present invention are provided to explain the invention more completely to those of ordinary skill in the art. The following embodiments may be modified into various other forms, and the scope of the present invention is not limited to them. Rather, these embodiments are provided to make this disclosure more thorough and complete and to fully convey the spirit of the invention to those skilled in the art. In addition, the thickness and size of each layer in the drawings are exaggerated for convenience and clarity of explanation.

Hereinafter, embodiments of the present invention are described with reference to drawings that schematically illustrate ideal embodiments. In the drawings, variations of the illustrated shapes may be expected depending on, for example, manufacturing techniques and/or tolerances. Accordingly, embodiments of the inventive concept should not be construed as limited to the specific shapes of the regions illustrated herein, and should include, for example, changes in shape resulting from manufacturing.

Figure 1 is a flowchart showing, in order, the steps of a machine learning-based real estate valuation method according to an embodiment of the present invention. As shown in Figure 1, the method may include (a) collecting at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties.

The location characteristic data may include at least one of location data, transaction data, and facility data of the real estate. Here, the location data may include at least one of latitude and longitude; the transaction data may include at least one of average exclusive-use area (in pyeong), period of use, and actual transaction price; and the facility data may include at least one of the number of subway stations, elementary schools, middle schools, high schools, retail businesses, living service businesses, real estate businesses, tourism, leisure, and entertainment businesses, lodging businesses, sports businesses, food businesses, and academic and education businesses.

The method may also include (b) preprocessing the collected location characteristic data to generate a location characteristic vector, which is a high-dimensional vector of at least three dimensions. In step (b), generating the location characteristic vector may include: (b1) calculating the number of items of facility data located within a predetermined radius of the location data; and (b2) deriving the location characteristic vector by considering the calculated number of items of facility data, the location data, and the transaction data.
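
Step (b1) can be illustrated with a minimal sketch. The source does not say how distance is measured or how large the predetermined radius is, so the haversine great-circle distance, the 500 m default, and the names `haversine_m` and `count_facilities` are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_facilities(prop, facilities, radius_m=500.0):
    """Count facilities per category within radius_m of the property.

    prop is a (latitude, longitude) pair; facilities is an iterable of
    (category, latitude, longitude) tuples. The 500 m default is an
    assumed value, not one taken from the source.
    """
    counts = {}
    for category, lat, lon in facilities:
        if haversine_m(prop[0], prop[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts
```

Calling `count_facilities((lat, lon), facilities)` for each property yields the per-category counts that step (b2) combines with the location and transaction data.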

Here, the location characteristic vector may be a 16-dimensional vector that takes into account all of the following: the latitude, the longitude, the average exclusive-use area, the period of use, and the numbers of subway stations, elementary schools, middle schools, high schools, retail businesses, living service businesses, real estate businesses, tourism, leisure, and entertainment businesses, lodging businesses, sports businesses, food businesses, and academic and education businesses.
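
As a rough sketch of step (b2), the 16 components listed above can be placed in a fixed order so that every property maps to a vector of the same layout. The field names below are hypothetical, and the ordering simply follows the order in which the text lists the components. Note that, as listed, the actual transaction price is not one of the 16 components.

```python
# Hypothetical field names; the order follows the listing in the text.
REQUIRED_FIELDS = (
    "latitude", "longitude", "avg_exclusive_area_pyeong", "period_of_use",
)
FACILITY_COUNT_FIELDS = (
    "subway_stations", "elementary_schools", "middle_schools", "high_schools",
    "retail", "living_services", "real_estate", "tourism_leisure_entertainment",
    "lodging", "sports", "food", "academic_education",
)


def to_feature_vector(record):
    """Build the 16-dimensional vector from a dict describing one property.

    Location and transaction fields must be present (KeyError otherwise);
    a facility category with no entry is treated as a count of zero.
    """
    required = [float(record[name]) for name in REQUIRED_FIELDS]
    counts = [float(record.get(name, 0)) for name in FACILITY_COUNT_FIELDS]
    return required + counts
```

Treating a missing facility category as zero is a design choice of this sketch: a category with no facility inside the radius never appears in the counts, whereas a missing coordinate indicates bad input and should fail loudly.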

Accordingly, by counting the various facilities located within a predetermined radius of the position given by the property's latitude and longitude, real estate can be classified in a way that reflects its location characteristics, rather than simply by administrative district.

The method may also include (c) training a clustering model using the location characteristic vector. In step (c), the clustering model may be a model generated using the K-Means clustering method.

The K-Means clustering method groups given data into a preset number K of clusters. Its inputs are the value K, which specifies how many clusters the data should be divided into, and the data to be classified; its output is K clusters. The clustering algorithm of the K-Means method is described in detail below.

First, k data objects are drawn at random from the data set and used as the initial centroids of the clusters. Next, for each data object in the data set, the distance to each of the k centroids is computed, and the data object is assigned to the centroid to which it is most similar, that is, the nearest one. The centroids are then recalculated from the reassigned clusters, and the process is repeated until no data object changes its cluster.
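
The procedure just described can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the source: squared Euclidean distance as the similarity measure, the `max_iter` cap, the fixed `seed`, and the handling of empty clusters are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Draw k data objects at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because the distance is Euclidean, features with large numeric ranges dominate the assignment. The source does not state whether the location characteristic vectors are scaled before clustering.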

A clustering model can be obtained through steps (a) to (c) of the method. As shown in Figure 2, the clustering model may, for example, set K to 5 and classify the data into five clusters: a first cluster (G1), a second cluster (G2), a third cluster (G3), a fourth cluster (G4), and a fifth cluster (G5). The model can be displayed as an image on a coordinate plane with longitude on the horizontal axis and latitude on the vertical axis.

The method may also include (d) comparing transaction amounts within a similar cluster using the clustering model and generating an evaluation index. Generating the evaluation index may include: (d1) calculating quantiles that divide the distribution of the average price per pyeong of the real estate belonging to the similar cluster into five ranges; and (d2) generating the evaluation index by comparing the average price per pyeong with the ranges defined by the quantiles.

In step (d1), as shown in Figure 3, the quantiles may include: a second quantile (Q2) at the 25% position of the average price-per-pyeong distribution; a third quantile (Q3) at the 75% position of that distribution; a first quantile (Q1) obtained by subtracting, from the second quantile (Q2), 1.5 times the difference between the second quantile (Q2) and the third quantile (Q3); and a fourth quantile (Q4) obtained by adding, to the third quantile (Q3), 1.5 times the difference between the second quantile (Q2) and the third quantile (Q3).
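
The four thresholds can be computed as in the sketch below. With Q2 and Q3 at the 25% and 75% positions, Q1 and Q4 coincide with the 1.5 × interquartile-range fences used in box plots. The source does not specify how the 25% and 75% positions are interpolated, so the use of `statistics.quantiles` with `method="inclusive"` is an assumption, as is the function name `price_thresholds`.

```python
from statistics import quantiles


def price_thresholds(prices_per_pyeong):
    """Return (Q1, Q2, Q3, Q4) for one cluster's average prices per pyeong.

    Q2 and Q3 are the 25% and 75% positions of the distribution;
    Q1 and Q4 lie 1.5 * (Q3 - Q2) below Q2 and above Q3 respectively.
    Requires at least two prices.
    """
    q2, _median, q3 = quantiles(prices_per_pyeong, n=4, method="inclusive")
    spread = q3 - q2
    q1 = q2 - 1.5 * spread
    q4 = q3 + 1.5 * spread
    return q1, q2, q3, q4
```

Note that Q1 can fall below zero when a cluster's prices are widely spread, in which case no property in that cluster can land in the lowest range.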

In step (d2), the evaluation index may be rated “fair” when the average price per pyeong is at or above the second quantile (Q2) and below the third quantile (Q3); “undervalued” when it is at or above the first quantile (Q1) and below the second quantile (Q2); “overvalued” when it is at or above the third quantile (Q3) and below the fourth quantile (Q4); “very undervalued” when it is at or below the first quantile (Q1); and “very overvalued” when it is at or above the fourth quantile (Q4).
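
The five ranges translate into a simple chain of comparisons, sketched below with a hypothetical function name `rate`. As worded, a price exactly equal to Q1 satisfies both the “undervalued” and the “very undervalued” conditions; this sketch resolves the tie in favor of “very undervalued”, which is an assumption.

```python
def rate(price_per_pyeong, q1, q2, q3, q4):
    """Map an average price per pyeong to one of the five evaluation labels."""
    if price_per_pyeong <= q1:
        return "very undervalued"  # at or below Q1 (tie at Q1 resolved here)
    if price_per_pyeong < q2:
        return "undervalued"       # above Q1, below Q2
    if price_per_pyeong < q3:
        return "fair"              # at or above Q2, below Q3
    if price_per_pyeong < q4:
        return "overvalued"        # at or above Q3, below Q4
    return "very overvalued"       # at or above Q4
```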

The method may further include, after step (d) of generating the evaluation index, (e) calculating, from the evaluation index, a fair price level for the real estate belonging to the similar cluster. For example, when a property's evaluation index is rated “undervalued,” an average price per pyeong in the range from the second quantile (Q2) up to, but not including, the third quantile (Q3) can be calculated as its fair price.
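
One way to picture step (e) is to compute, for every cluster, the Q2 to Q3 band that serves as the fair price range for its members. The sketch below assumes that cluster labels and prices arrive as two parallel lists and that quantiles are interpolated with the `inclusive` method of `statistics.quantiles`; neither detail comes from the source, and `fair_price_bands` is a hypothetical name.

```python
from collections import defaultdict
from statistics import quantiles


def fair_price_bands(labels, prices_per_pyeong):
    """Return {cluster label: (Q2, Q3)}, the fair price band per cluster."""
    by_cluster = defaultdict(list)
    for label, price in zip(labels, prices_per_pyeong):
        by_cluster[label].append(price)
    bands = {}
    for label, prices in by_cluster.items():
        if len(prices) < 2:  # quantiles() needs at least two data points
            continue
        q2, _median, q3 = quantiles(prices, n=4, method="inclusive")
        bands[label] = (q2, q3)
    return bands
```

A property rated “undervalued” in cluster `c` would then be compared against `bands[c]` to see how far its price sits below the fair range.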

A machine learning-based real estate valuation system (100) that uses the method described above is described in detail below.

As shown in Figure 4, the machine learning-based real estate valuation system (100) according to an embodiment of the present invention may include: a data collection unit (110) that collects at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties; a data preprocessing unit (120) that preprocesses the collected location characteristic data to generate a location characteristic vector, which is a high-dimensional vector of at least three dimensions; a clustering model learning unit (130) that trains a clustering model using the location characteristic vector; and a valuation unit (140) that compares transaction amounts within a similar cluster using the clustering model and generates an evaluation index.
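
The way the four units hand data to one another can be sketched as a small pipeline. The class name, the callable-based interfaces, and the choice to pass the raw data to the valuation unit alongside the model are assumptions made for illustration; the source describes the units only at the block-diagram level.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ValuationSystem:
    """Hypothetical wiring of units 110 to 140 as plain callables."""

    collect: Callable[[], Any]           # data collection unit (110)
    preprocess: Callable[[Any], Any]     # data preprocessing unit (120)
    train: Callable[[Any], Any]          # clustering model learning unit (130)
    evaluate: Callable[[Any, Any], Any]  # valuation unit (140)

    def run(self):
        raw = self.collect()              # location characteristic data
        vectors = self.preprocess(raw)    # location characteristic vectors
        model = self.train(vectors)       # clustering model
        return self.evaluate(model, raw)  # evaluation index
```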

In conclusion, with the machine learning-based real estate valuation method and system described above, properties with similar location characteristics are clustered, the relative value of each property within its similar cluster is compared and evaluated, and a fair price can be suggested. Conventional methods of predicting future real estate prices are hard for the market to trust and, by themselves, make it hard to judge the fair price level of a property. The present invention, by contrast, can reliably present the fair price level of each property within a group of similar properties at the present point in time, and, by classifying real estate according to similarity in its actual location characteristics, can establish a clear index for evaluating real estate prices.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

G1: First cluster
G2: Second cluster
G3: Third cluster
G4: Fourth cluster
G5: Fifth cluster
Q1: First quantile
Q2: Second quantile
Q3: Third quantile
Q4: Fourth quantile
100: Machine learning-based real estate valuation system
110: Data collection unit
120: Data preprocessing unit
130: Clustering model learning unit
140: Valuation unit

Claims

A machine learning-based real estate valuation method, comprising:
(a) collecting at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties;
(b) preprocessing the collected location characteristic data to generate a location characteristic vector that is a high-dimensional vector of at least three dimensions;
(c) training a clustering model using the location characteristic vector; and
(d) comparing transaction amounts within a similar cluster using the clustering model and generating an evaluation index.

The method of claim 1, wherein, in step (a),
the location characteristic data includes at least one of location data, transaction data, and facility data of the real estate.

The method of claim 2, wherein, in step (b),
generating the location characteristic vector comprises:
(b1) counting the facility data items located within a predetermined radius of the position given by the location data; and
(b2) deriving the location characteristic vector from the counted facility data, the location data, and the transaction data.
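
Step (b1) can be illustrated with a minimal sketch. The claim only requires a "predetermined radius"; the 500 m default, the haversine distance, the category names, and the coordinates below are all assumptions made for illustration and do not come from the specification.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_facilities(property_location, facilities, radius_m=500.0):
    """Count facilities per category within radius_m of the property.

    radius_m is a hypothetical default; the claim leaves the radius open.
    """
    counts = {}
    for category, lat, lon in facilities:
        distance = haversine_m(property_location[0], property_location[1], lat, lon)
        if distance <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts


# Made-up coordinates, for illustration only.
property_location = (37.5000, 127.0000)
facilities = [
    ("subway", 37.5010, 127.0000),             # about 111 m north
    ("elementary_school", 37.5030, 127.0000),  # about 334 m north
    ("subway", 37.5100, 127.0000),             # about 1.1 km north, outside the radius
]
print(count_facilities(property_location, facilities))
```

The resulting per-category counts are the facility inputs that step (b2) combines with the location and transaction data.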

The method of claim 2, wherein:
the location data includes at least one of latitude and longitude;
the transaction data includes at least one of average exclusive floor area, period of use, and actual transaction price; and
the facility data includes at least one of the number of subway stations, the number of elementary schools, the number of middle schools, the number of high schools, the number of retail businesses, the number of living-service businesses, the number of real estate businesses, the number of tourism, leisure and entertainment businesses, the number of lodging businesses, the number of sports businesses, the number of food businesses, and the number of academic and education businesses.

The method of claim 4, wherein
the location characteristic vector is a 16-dimensional vector composed of
the latitude, the longitude, the average exclusive floor area, the period of use, the number of subway stations, the number of elementary schools, the number of middle schools, the number of high schools, the number of retail businesses, the number of living-service businesses, the number of real estate businesses, the number of tourism, leisure and entertainment businesses, the number of lodging businesses, the number of sports businesses, the number of food businesses, and the number of academic and education businesses.
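
As a rough sketch of how the sixteen components listed in this claim might be assembled into one vector: the field names, the fixed ordering, and the zero default for missing facility counts are hypothetical choices, and the sample values are invented.

```python
# Hypothetical field names; the order follows the list in the claim.
FEATURE_ORDER = [
    "latitude", "longitude", "average_floor_area", "period_of_use",
    "subway", "elementary_school", "middle_school", "high_school",
    "retail", "living_service", "real_estate", "tourism_leisure_entertainment",
    "lodging", "sports", "food", "academic_education",
]


def build_location_vector(location, transaction, facility_counts):
    """Return the 16 features as a list of floats in FEATURE_ORDER."""
    merged = {**location, **transaction}
    vector = []
    for name in FEATURE_ORDER:
        if name in merged:
            vector.append(float(merged[name]))
        else:
            # Assumption: a facility category with no entry counts as zero.
            vector.append(float(facility_counts.get(name, 0)))
    return vector


# Invented sample values, for illustration only.
location = {"latitude": 37.5, "longitude": 127.0}
transaction = {"average_floor_area": 84.9, "period_of_use": 12}
facility_counts = {"subway": 1, "elementary_school": 2, "food": 35}

vec = build_location_vector(location, transaction, facility_counts)
print(len(vec), vec)
```

The actual transaction price is deliberately absent from the vector, matching the list in the claim; it is used later when prices are compared within a cluster. Whether the features are rescaled before clustering is not stated here, so no scaling is shown.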

The method of claim 1, wherein, in step (c),
the clustering model is generated using the K-Means clustering method.
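
For readers unfamiliar with K-Means, the following is a minimal, dependency-free sketch of the standard Lloyd iteration (assign each vector to its nearest centroid, then move each centroid to the mean of its members). It is not the applicant's implementation. The random initialisation, the `seed` and `max_iterations` parameters, the toy 3-dimensional vectors, and the choice of `k=2` are all assumptions; the claim does not fix the number of clusters.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=100, seed=0):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    rng = random.Random(seed)
    # Initialise centroids with k input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 3-dimensional vectors forming two obvious groups (invented data).
vectors = [
    [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0],
    [10.0, 10.0, 10.0], [10.0, 11.0, 10.0], [11.0, 10.0, 10.0],
]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

In practice the same call would be applied to the location characteristic vectors, with properties that share a label treated as one similar cluster.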

The method of claim 6, wherein, in step (d),
generating the evaluation index comprises:
(d1) calculating quantiles that divide the distribution of the average price per pyeong of the real estate belonging to the similar cluster into five ranges; and
(d2) generating the evaluation index by comparing the average price per pyeong against the ranges bounded by the quantiles.

The method of claim 7, wherein:
in step (d1),
the quantiles are a second quantile located at 25% of the distribution of the average price per pyeong, a third quantile located at 75% of that distribution, a first quantile obtained by subtracting 1.5 times the difference between the second and third quantiles from the second quantile, and a fourth quantile obtained by adding 1.5 times that difference to the third quantile; and
in step (d2),
the evaluation index is “adequate” if the average price per pyeong is at or above the second quantile and below the third quantile, “undervalued” if it is at or above the first quantile and below the second quantile, “overvalued” if it is at or above the third quantile and below the fourth quantile, “very undervalued” if it is below the first quantile, and “very overvalued” if it is at or above the fourth quantile.
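
The quantile construction and the five-way evaluation in this claim can be sketched as follows. The claim does not say how the 25% and 75% points are interpolated, so the use of `statistics.quantiles` with `method="inclusive"` is an assumption, as are the function names, the treatment of boundary values as belonging to the higher range, and the sample prices.

```python
import statistics


def quantile_fences(prices):
    """Return (Q1, Q2, Q3, Q4) as defined in the claim.

    Q2 and Q3 are the 25% and 75% points of the distribution; Q1 and Q4
    lie 1.5 times (Q3 - Q2) below Q2 and above Q3 respectively.
    """
    cuts = statistics.quantiles(prices, n=4, method="inclusive")
    q2, q3 = cuts[0], cuts[2]
    spread = 1.5 * (q3 - q2)
    return q2 - spread, q2, q3, q3 + spread


def evaluate(price, fences):
    """Map an average price per pyeong to one of the five evaluation labels."""
    q1, q2, q3, q4 = fences
    if price < q1:
        return "very undervalued"
    if price < q2:
        return "undervalued"
    if price < q3:
        return "adequate"
    if price < q4:
        return "overvalued"
    return "very overvalued"


# Invented average prices per pyeong for one cluster (arbitrary units).
cluster_prices = [30, 32, 34, 36, 38]
fences = quantile_fences(cluster_prices)
print(fences)  # (26.0, 32.0, 36.0, 42.0)
for price in (25, 30, 34, 40, 50):
    print(price, evaluate(price, fences))
```

With these sample prices the difference between the third and second quantiles is 4, so the outer fences sit 6 units beyond the 25% and 75% points.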

The method of claim 1, further comprising, after step (d):
(e) calculating an appropriate price level for the real estate belonging to the similar cluster using the evaluation index.

A machine learning-based real estate valuation system, comprising:
a data collection unit that collects at least one item of location characteristic data representing the characteristics of each of a plurality of real estate properties;
a data preprocessing unit that preprocesses the collected location characteristic data to generate a location characteristic vector, which is a high-dimensional vector of at least three dimensions;
a clustering model learning unit that trains a clustering model using the location characteristic vector; and
a valuation unit that compares transaction amounts within similar clusters using the clustering model and generates an evaluation index.