KR20210082105A - An apparatus for generating a learning model for predicting real estate transaction price

Info

Publication number: KR20210082105A
Application number: KR1020200182102A
Authority: KR
Inventors: 임현서
Original assignee: 탱커주식회사
Priority date: 2019-12-24
Filing date: 2020-12-23
Publication date: 2021-07-02
Also published as: KR20210082104A; KR20210082106A; KR20210082113A; KR20210082110A; KR20210082108A; KR20210082103A; KR20210082112A; KR20210082107A; KR20210082111A; KR20210082114A; KR20210082109A

Abstract

The present invention relates to an apparatus for generating a learning model for predicting actual real estate transaction prices, which makes it possible to provide asset estimation and related services based on predicted actual transaction prices to user terminals. According to the present invention, the apparatus generates the learning model by performing the following steps: separating a loaded data set into a training set and a test set on the basis of a specific time point; forming instances by extracting or adding features for each data item of the training set; generating a decision tree-based time series trend prediction learning model by performing gradient boosting on the instances of the training set; forming instances for the test set data by extracting or adding features in the same manner as for the training set; generating virtual data in which the actual transaction price is predicted by inputting the instance features of the test set data to the learning model; and measuring the error between the virtual data and the actual transaction prices of the test set data, changing a hyperparameter of the learning model generation step to expand the learning tree when the target performance is not met, and fixing the learning model when the target performance is satisfied.

Description

An apparatus for generating a learning model for predicting real estate transaction price

The present invention relates to an apparatus for generating a learning model for predicting actual real estate transaction prices. More specifically, it relates to an apparatus that builds a decision tree-based time series trend prediction learning model from the training set data of an actual real estate transaction price database, so that a real estate asset estimation service based on predicted actual transaction prices can be provided.

As with the Ministry of Land, Infrastructure and Transport's actual transaction price disclosure system, various data on actual real estate transaction prices have recently been disclosed and shared transparently. Methods that use such data to calculate the expected actual transaction price of a property about to be traded are widely used by financial institutions and individuals alike.

However, the techniques proposed so far for calculating the expected actual transaction price of real estate are still immature and have many problems.

For example, there are methods that predict the actual transaction price from market price indicators or officially assessed price indicators presented by financial institutions. However, these indicators are compiled from actual transaction prices and on-site trading opinions and often depend on the intuition of whoever produces them, so their accuracy is low. In addition, because financial institutions are the parties that levy taxes or execute loans, the indicators are partly swayed by their interests and tend to lack fairness.

Meanwhile, techniques that calculate the expected actual transaction price using appraisal theory present systematic evaluation criteria and calculate the expected price on that basis. However, the criteria and the evaluation results can still be arbitrary, and the calculated price often deviates from the actual transaction price.

Accordingly, the market has recently been moving toward using machine learning to calculate expected actual transaction prices. However, attempts suited to the characteristics of the data to be predicted have not yet been made. For example, related prior patents of Big Value (10-2016-0123722, etc.) describe calculating the expected actual transaction price of real estate using methods such as multiple regression analysis and similarity scores.

However, multiple regression analysis makes sophisticated modeling difficult, and its accuracy depends heavily on the explanatory power of the input features, so it is unsuitable for calculating expected actual transaction prices.

Similarity score analysis, in turn, requires matching each property with a plurality of properties against which similarity is calculated. Unless similarity scores have been computed for all properties before the user's prediction request, the calculation takes a long time, so many difficulties are expected when providing a real-time service.

Furthermore, actual real estate transaction price data can be tabular data, that is, data whose features are clearly distinguishable and can be arranged in a table. For this type of data, decision tree-family learning methods, which partition the training set by the value of a specific feature and learn on each partition, are known to be effective. However, when the model must also predict time series data at a point in time later than the current training set, the performance of a decision tree method alone drops sharply. Calculating the expected actual transaction price at the present time amounts to short-term future prediction, so its accuracy is hard to guarantee.

In addition, actual real estate transaction price data is time series data, so preprocessing and database construction are needed to enable precise estimation. However, loading all data into a database is difficult in terms of storage space and complexity, and a method of building a database that is both suitable for machine learning models and highly usable is required.

An object of the present invention is to provide a method and apparatus for generating a learning model for predicting actual real estate transaction prices, which build a machine learning-based prediction model using an actual transaction price database preprocessed from a public database.

Another object of the present invention is to provide an apparatus for calculating expected actual real estate transaction prices, and an operating method thereof, which calculate expected actual transaction price data for current and future real estate market prices based on the learning model and can thereby provide asset estimation and related services based on predicted actual transaction prices to a user terminal.

An apparatus for generating a learning model for predicting actual real estate transaction prices according to an embodiment of the present invention for solving the above problems generates the learning model by performing a learning model generation method comprising: separating a loaded data set into a training set and a test set based on a specific time point; forming instances by extracting or adding features for each data item of the training set; generating a decision tree-based time series trend prediction learning model by performing gradient boosting on the instances of the training set; forming instances for the test set data by extracting or adding features in the same manner as for the training set; generating virtual data in which the actual transaction price is predicted by inputting the instance features of the test set data to the learning model; and measuring the error between the virtual data and the actual transaction prices of the test set data, changing a hyperparameter of the learning model generation step to further expand the learning tree when the target performance is not met, and fixing the learning model when the target performance is satisfied.
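The steps of this method can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names (`date`, `area`, `price`), the elapsed-months feature, the use of one-split regression stumps as the trees, the MAPE error measure, the 5% target, and the choice of "number of boosting rounds" as the hyperparameter that is enlarged are all hypothetical, and the demo data is synthetic.

```python
from datetime import date

def split_by_time(rows, cutoff):
    """Rows dated before `cutoff` train the model; later rows test it."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test

def to_instance(row, origin):
    """Feature vector: extracted area plus an added elapsed-months feature."""
    months = (row["date"].year - origin.year) * 12 + row["date"].month - origin.month
    return [row["area"], months]

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # candidate thresholds
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_boosting(X, y, n_trees, lr=0.5):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if x[j] <= t else rm) for x, p in zip(X, pred)]
    return base, lr, stumps

def predict(model, x):
    """Predicted price for one instance (one row of the 'virtual data')."""
    base, lr, stumps = model
    return base + lr * sum(lm if x[j] <= t else rm for j, t, lm, rm in stumps)

def mape(model, X, y):
    """Mean absolute percentage error between predicted and actual prices."""
    return sum(abs(predict(model, x) - yi) / yi for x, yi in zip(X, y)) / len(y)

def build_model(rows, cutoff, target=0.05, n_trees=5, max_trees=80):
    """Train, measure test error, and enlarge the ensemble until the target is met."""
    train, test = split_by_time(rows, cutoff)
    origin = min(r["date"] for r in rows)
    X_tr = [to_instance(r, origin) for r in train]
    y_tr = [r["price"] for r in train]
    X_te = [to_instance(r, origin) for r in test]  # same feature recipe as training
    y_te = [r["price"] for r in test]
    while True:
        model = fit_boosting(X_tr, y_tr, n_trees)
        err = mape(model, X_te, y_te)
        if err <= target or n_trees >= max_trees:
            return model, n_trees, err  # the model is "fixed" here
        n_trees *= 2  # target missed: change the hyperparameter and retrain

# Synthetic demo data (invented for illustration only).
rows = [{"date": date(2019, m, 15), "area": a, "price": 3000 + 40 * a + 20 * m}
        for m in range(1, 12) for a in (59, 84)]
model, n_trees, err = build_model(rows, cutoff=date(2019, 9, 1))
print(n_trees, round(err, 4))
```

Because every test row is dated after the cutoff, the measured error reflects how the model behaves on transactions later than anything it was trained on, which is the short-term future prediction setting the description is concerned with.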

A method of operating an apparatus for calculating expected actual real estate transaction prices according to an embodiment of the present invention for solving the above problems comprises: building an actual transaction price database by merging and processing, for each property model, real estate-related data collected from an external real estate information database; constructing from the actual transaction price database a training set corresponding to a preset time or spatial range, and generating a time series trend prediction learning model for predicting actual transaction prices; constructing from the actual transaction price database a test set that is based on actual transaction data and separate from the training set, and verifying the accuracy of the time series trend prediction learning model; reconstructing or fixing the time series trend prediction learning model according to the accuracy verification result; and setting the fixed time series trend prediction learning model as a fixed prediction model, and providing to a user terminal the actual real estate transaction price prediction information obtained by applying feature instance data corresponding to the user terminal's request to the fixed prediction model.

An apparatus for calculating expected actual real estate transaction prices according to an embodiment of the present invention for solving the above problems comprises: an actual transaction price database built by merging and processing, for each property model, real estate-related data collected from an external real estate information database; a time series trend prediction learning model generation unit that constructs from the actual transaction price database a training set corresponding to a preset time or spatial range and generates a time series trend prediction learning model for predicting actual transaction prices; a verification unit that constructs from the actual transaction price database a test set that is based on actual transaction data and separate from the training set, and verifies the accuracy of the time series trend prediction learning model; a model generation unit that reconstructs or fixes the time series trend prediction learning model according to the accuracy verification result; a prediction unit that sets the fixed time series trend prediction learning model as a fixed prediction model and obtains actual real estate transaction price prediction information by applying feature instance data corresponding to a request of a user terminal to the fixed prediction model; and an output unit that provides the obtained prediction information to the user terminal.

Meanwhile, the method according to an embodiment of the present invention for solving the above problems may be implemented as a program for executing the method on a computer and as a recording medium on which the program is recorded.

According to an embodiment of the present invention, an actual transaction price database preprocessed from a public database with learning efficiency and usability in mind is used to build a machine learning-based model for predicting actual real estate transaction prices, and expected actual transaction price data for current and future real estate market prices is calculated from that model. This makes it possible to provide an apparatus for calculating expected actual real estate transaction prices, and an operating method thereof, that can deliver asset estimation and related services based on predicted actual transaction prices to a user terminal.

FIG. 1 is a diagram schematically illustrating the entire system according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram of data stored in the information storage unit according to an embodiment of the present invention.
FIG. 3 is a diagram for describing the learning unit according to an embodiment of the present invention in more detail.
FIG. 4 is an exemplary diagram of prediction model data separated for learning by the real estate transaction data processing unit according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram of virtual data according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the input unit according to an embodiment of the present invention in more detail.
FIG. 7 shows an example of user input data processed by the user input data receiving unit, and of its correction, according to an embodiment of the present invention.
FIGS. 8 and 9 are exemplary diagrams of characteristic data acquisition and correction according to an embodiment of the present invention.
FIG. 10 is a block diagram illustrating the prediction unit according to an embodiment of the present invention in more detail.
FIGS. 11 to 13 are flowcharts sequentially illustrating a method of operating the server according to an embodiment of the present invention.
FIGS. 14 to 19 are diagrams for explaining the step of building the actual transaction price database according to an embodiment of the present invention, the configuration of characteristic data and feature information instances for building a learning model based on it, and the associated process.

The following merely illustrates the principles of the invention. Those skilled in the art will therefore be able to devise various apparatuses which, although not explicitly described or shown herein, embody the principles of the invention and fall within its concept and scope. Furthermore, all conditional terms and embodiments listed in this specification are, in principle, expressly intended only to help the concept of the invention be understood, and should be understood as not limiting the invention to the specifically listed embodiments and conditions.

It should also be understood that every detailed description reciting the principles, aspects, and embodiments of the invention, as well as specific embodiments, is intended to cover their structural and functional equivalents. Such equivalents include not only currently known equivalents but also equivalents developed in the future, that is, all elements invented to perform the same function regardless of structure.

Thus, for example, the block diagrams in this specification should be understood as conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, all flowcharts, state transition diagrams, pseudocode, and the like should be understood as representing various processes that can be substantially represented on a computer-readable medium and performed by a computer or processor, whether or not the computer or processor is explicitly shown.

Moreover, explicit use of terms such as processor, control, or similar concepts should not be construed as referring exclusively to hardware capable of executing software, and should be understood to implicitly include, without limitation, digital signal processor (DSP) hardware and ROM, RAM, and non-volatile memory for storing software. Other well-known hardware may also be included.

The above objects, features, and advantages will become clearer through the following detailed description taken with the accompanying drawings, so that those of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, a detailed description of known technology related to it is omitted where that description could unnecessarily obscure the gist of the invention.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically illustrating the entire system according to an embodiment of the present invention.

Referring to FIG. 1, the entire system according to an embodiment of the present invention includes an external real estate information database (300), a server (100), and a user terminal (200).

More specifically, the external real estate information database (300), the server (100), and the user terminal (200) may be connected by wire or wirelessly through a network.

To communicate with one another over the network, the external real estate information database (300), the server (100), and the user terminal (200) can transmit and receive data through the Internet, a LAN, a WAN, a PSTN (Public Switched Telephone Network), a PSDN (Public Switched Data Network), a cable TV network, Wi-Fi, a mobile communication network, or another wireless communication network. Each of them may also include a communication module for communicating with the protocol corresponding to each communication network.

The user terminal (200) described in this specification may be a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), a navigation device, or the like. The present invention is not limited to these, and the terminal may be any of various other devices capable of accepting user input and displaying information.

First, the external real estate information database (300) is a public database such as a real estate information inquiry system, which stores and manages real estate-related data and provides it at the request of the server (100). One example is the Ministry of Land, Infrastructure and Transport's actual transaction price disclosure system.

In this specification, real estate-related data may include any data that can affect real estate prices and transaction volumes. The modern real estate market is affected by various microeconomic and macroeconomic variables and by the government's real estate policies. That is, real estate prices and transaction volumes are affected not only by data directly related to them, such as actual transaction volume, listing volume (sale/jeonse/monthly rent, etc.), actual transaction prices, asking prices, housing market prices, officially assessed land prices, and housing supply and loss, but also by the base interest rate, price indices, real estate-related taxes such as acquisition tax, stamp duty, and comprehensive real estate tax, and government actions on real estate such as lending regulations (LTV, DTI) and real estate policy.

Accordingly, the server (100) can obtain from the external real estate information database (300) real estate-related data that can affect real estate prices and transaction volumes, covering items directly related to prices and transaction volumes, real estate-related taxes, real estate lending regulations, and real estate policy.

The real estate-related data that the server (100) can obtain may therefore include at least one of the various data that affect the real estate market, such as actual transaction volume, listing volume, actual transaction price, asking price, officially assessed land price, supply volume, loss volume, base interest rate, price index, DTI, LTV, and real estate-related taxes. In addition, since the real estate market generally moves region by region, real estate-related data may further include the location information of the property. Even when location information is not included, the server (100) can perform preprocessing such as geocoding to produce real estate feature instance data that is easy to learn from and predict with.

In this way, the server (100) preprocesses the real estate-related data obtained from the real estate information database (300), such as a public database, to build the actual transaction price database (110). It then drives the expected actual transaction price calculation engine (120), which uses the built database to construct a machine learning-based model for predicting actual real estate transaction prices, and calculates expected actual transaction price data for current and future real estate market prices based on that model.

The server (100) can also drive the service providing engine (130) to provide asset estimation and related services based on predicted actual transaction prices to the connected user terminal (200).

For this processing, the server (100) may first include the actual transaction price database (110). The actual transaction price database (110) includes an information collection unit (111) and an information storage unit (112), and it preprocesses the real estate-related data collected from the external real estate information database (300) and stores and manages it as a database.

Real estate-related data is distributed in various forms depending on the provider and the purpose of provision, and by its nature a single kind of data is not enough to grasp, as a whole, the property that the data describes. To make full use of real estate data, several data sources must therefore be combined and processed into complete real estate data.

Accordingly, the information collection unit (111) according to an embodiment of the present invention can preprocess the real estate-related data collected from the external real estate information database (300) and store and manage it in the information storage unit (112). The data collection cycle can be adjusted according to the characteristics of the real estate-related data to be collected or of the external real estate information database (300) that contains it, and when several kinds of data are collected, a different collection cycle can be set for each kind.
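One way to picture per-data-type collection cycles is a small table of intervals checked against the time of the last collection. The sketch below is only an illustration: the data type names and the cycle lengths are hypothetical and do not come from the specification.

```python
from datetime import datetime, timedelta

# Hypothetical collection cycles per data type (values are illustrative only).
COLLECTION_CYCLES = {
    "actual_transactions": timedelta(days=1),
    "official_land_price": timedelta(days=365),
    "base_interest_rate": timedelta(days=30),
}

def due_for_collection(last_collected, now):
    """Return the data types whose cycle has elapsed or that were never collected."""
    return sorted(
        name for name, cycle in COLLECTION_CYCLES.items()
        if name not in last_collected or now - last_collected[name] >= cycle
    )

now = datetime(2019, 12, 1)
last = {
    "actual_transactions": datetime(2019, 11, 30),
    "base_interest_rate": datetime(2019, 11, 20),
}
print(due_for_collection(last, now))
```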

The information storage unit (112) organizes the information to be compiled from the data collected by the information collection unit (111) and stores it in the form of a database.

For example, when storing data, if several data sources contain duplicate or conflicting items for the same property or the same transaction, the information storage unit (112) can perform preprocessing that merges them into a single record through a verification process. The database built in the information storage unit (112) can later be used as training data and can contain the core data for calculating expected actual real estate transaction prices.
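The specification does not describe the verification process itself, so the following is only a sketch of one possible rule: records that share an assumed transaction key are grouped, and each remaining field takes the most frequently reported non-missing value. The key fields, the field names, and the majority rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed identity of a single transaction (hypothetical choice of fields).
KEY_FIELDS = ("complex", "area", "floor", "date")

def merge_records(records):
    """Merge duplicate or conflicting records into one record per transaction."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in KEY_FIELDS)].append(rec)

    merged = []
    for key, recs in groups.items():
        out = dict(zip(KEY_FIELDS, key))
        other_fields = {f for r in recs for f in r} - set(KEY_FIELDS)
        for field in sorted(other_fields):
            values = [r[field] for r in recs if r.get(field) is not None]
            if values:
                # Majority vote; ties go to the value seen first.
                out[field] = Counter(values).most_common(1)[0][0]
        merged.append(out)
    return merged

records = [
    {"complex": "A", "area": 84, "floor": 5, "date": "2019-03-02", "price": 30000},
    {"complex": "A", "area": 84, "floor": 5, "date": "2019-03-02", "price": 30000,
     "year_built": 1998},
    {"complex": "A", "area": 84, "floor": 5, "date": "2019-03-02", "price": 31000},
    {"complex": "B", "area": 59, "floor": 2, "date": "2019-04-10", "price": 21000},
]
merged = merge_records(records)
```

A real verification step might instead prefer a trusted source or flag conflicts for review; the majority rule is used here only because it is simple to state.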

FIG. 2 is an exemplary diagram of data stored in the information storage unit (112) according to an embodiment of the present invention. FIG. 2 shows a database of actual transaction data, from January 1, 2019 to November 30, 2019, for apartment complexes located in three legal dongs (statutory neighborhoods) of Bupyeong-gu, Incheon. The data was collected from the external real estate information database (300) through the information collection unit (111), then preprocessed and stored by the information storage unit (112).

According to an embodiment of the present invention, in the preprocessing step, the real estate-related data items of the external real estate information database 300, which are specified as {city/county/district, lot address, main lot number, sub lot number, complex name, year built, area, floor, transaction date, transaction price}, may be modified in order to facilitate learning and prediction by the expected real estate transaction price calculation engine 120.

For this modification, in configuring the actual database, the information storage unit 112 according to an embodiment of the present invention may exclude some of the above items or add detailed address items such as {road-name address, building (dong), unit (ho), orientation}. However, in order to predict the actual transaction price of real estate at a specific point in time, it is preferable that the {transaction date} item, which is a characteristic value, and the {transaction price} item, which is the result value, always be included.

FIG. 3 is a diagram for describing the learning unit 121 according to an embodiment of the present invention in more detail.

Referring to FIG. 3, the learning unit 121 according to an embodiment of the present invention may include at least one of a real estate transaction data processing unit 1211, a time series trend prediction learning model generation unit 1212, a real estate prediction data generation unit 1213, a real estate market price classification model generation unit 1214, and a real estate market price verification unit 1215.

The real estate transaction data processing unit 1211 extracts, from the actual transaction price database 110, the training data needed for learning and the separated data used to construct the prediction model data. For example, the real estate transaction data processing unit 1211 may load the actual transaction price database 110, separate some of the data according to a preset region or period range condition and extract it first as prediction model data, and extract the remaining data as the original training data. The data to be separated may be selected at random from the entire data set, or may instead be selected according to a criterion set in view of the target performance.

FIG. 4 is an exemplary diagram of prediction model data separated for learning by the real estate transaction data processing unit 1211. Referring to FIG. 4, on the premise that actual transaction prices on and after November 30, 2019 are to be predicted, the data whose transaction date value falls between September 1, 2019 and November 30, 2019 has been separated for learning from the data in the actual transaction price database 110 shown in FIG. 2.

When data from a certain period preceding the time point to be predicted is separated in this way, the time series trend prediction learning model generation unit 1212 may generate a time series trend prediction learning model using the separated data. Because a model derived from the separated data is not overfitted to the actual transaction data of the same period, its predictions may be more likely to be unbiased. The separation criterion used in this embodiment is only an example; the elements that may serve as the criterion, such as a particular characteristic value item or a data statistic, may vary depending on the target performance.
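
As a minimal sketch of the period-based separation described above, the following Python function divides records into those whose transaction date falls within a given period and the remainder. The record layout and field names are assumptions for illustration; the dates follow the example of FIG. 4.

```python
from datetime import date


def split_by_period(records, start, end):
    """Return (separated, remaining): rows dated within [start, end] and the rest."""
    separated, remaining = [], []
    for rec in records:
        if start <= rec["transaction_date"] <= end:
            separated.append(rec)
        else:
            remaining.append(rec)
    return separated, remaining


# Illustrative records; the prices are made-up values.
records = [
    {"transaction_date": date(2019, 3, 15), "price": 31000},
    {"transaction_date": date(2019, 9, 1), "price": 32500},
    {"transaction_date": date(2019, 11, 30), "price": 33000},
]
separated, remaining = split_by_period(records, date(2019, 9, 1), date(2019, 11, 30))
```

A region-based condition could be added in the same way by testing an additional field inside the loop.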

The time series trend prediction learning model generation unit 1212 generates, from the data separated by the real estate transaction data processing unit 1211, a predictive learning model for predicting a time series trend. The model generated by the time series trend prediction learning model generation unit 1212 may be used directly to calculate the final predicted actual transaction price, but it is preferably used to construct virtual data that allows the separated data region to be learned together with the remaining training data.

Accordingly, the time series trend prediction learning model generation unit 1212 may use the time series trend prediction learning model to generate virtual data that has a tendency similar to, but does not coincide with, the data separated by the real estate transaction data processing unit 1211, so that the virtual data can be used for learning without harming the consistency of the data as a whole.

To this end, the time series trend prediction learning model generation unit 1212 may generate the time series trend prediction learning model by using, in a complementary manner, a machine learning or artificial neural network learning algorithm whose accuracy is relatively low but which is strong at large-scale data processing, and a machine learning or artificial neural network learning algorithm whose accuracy is relatively high but which is suited to local use.

For this predictive learning model, the real estate prediction data generation unit 1213 may run predictions corresponding to the previously separated data interval, based on the model generated by the time series trend prediction learning model generation unit 1212, and may generate the results as virtual data.

FIG. 5 is an exemplary diagram of virtual data according to an embodiment of the present invention. Because the data separated by the real estate transaction data processing unit 1211 is the data whose transaction date value falls between September 1, 2019 and November 30, 2019, actual transaction data for time points after September 1, 2019 is not included in the original data.

Accordingly, the real estate prediction data generation unit 1213 according to an embodiment of the present invention may use the predictive learning model generated by the time series trend prediction learning model generation unit 1212 to predict and generate virtual actual transaction data from September 1, 2019 up to a near-future time point relative to November 30, 2019. In the embodiment of FIG. 5, the data is constructed by holding fixed all characteristic values of the original training data except the transaction date and assigning an arbitrary time point after September 1, 2019 as the transaction date. Virtual data may then be calculated using a time series trend prediction learning model generated so as to predict, as the transaction price, a value obtained by applying an arbitrary increase or decrease to the existing transaction price. The manner of specifying or assigning the characteristic values of the virtual data, including the transaction date, may be varied.
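
The construction of virtual data described above can be illustrated, under stated assumptions, by the following Python sketch. Each original row is copied with its characteristic values unchanged, a date within the target period is assigned, and a model supplies the price. The function `toy_trend_model` is a hypothetical stand-in for the trained time series trend prediction learning model, and its growth rate of 0.1% per day is an arbitrary assumption.

```python
import random
from datetime import date, timedelta


def make_virtual_rows(rows, predict_price, start, end, seed=0):
    """Copy each row, assign a random date in [start, end], and predict its price."""
    rng = random.Random(seed)
    span_days = (end - start).days
    virtual = []
    for row in rows:
        new_row = dict(row)  # all other characteristic values stay fixed
        new_row["transaction_date"] = start + timedelta(days=rng.randint(0, span_days))
        # new_row still holds the original price when the model is called.
        new_row["price"] = predict_price(new_row)
        virtual.append(new_row)
    return virtual


def toy_trend_model(row):
    """Stand-in model (assumption): price grows 0.1% per day after 2019-09-01."""
    days = (row["transaction_date"] - date(2019, 9, 1)).days
    return round(row["price"] * (1 + 0.001 * days))
```

In practice `predict_price` would wrap whatever trained model is in use; the seed parameter only makes the sketch reproducible.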

Meanwhile, the real estate market price classification model generation unit 1214 generates the final actual transaction price prediction model using the training data separated by the real estate transaction data processing unit 1211 and the virtual data generated by the real estate prediction data generation unit 1213.

The real estate market price verification unit 1215 verifies, through testing, the accuracy of the model generated by the real estate market price classification model generation unit 1214 and, depending on the result, determines whether to run the real estate market price classification model generation unit 1214 again.

The real estate market price verification unit 1215 may perform a prediction accuracy test using the separated data that was separated by the real estate transaction data processing unit 1211 and passed to the time series trend prediction learning model generation unit 1212. Various statistical measures, such as the mean absolute percentage error (MAPE), may be used for the accuracy test.
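
For reference, MAPE is the mean of |actual - predicted| / |actual| over all test samples, expressed as a percentage. A minimal Python sketch follows; the function name and the input validation are choices made for this illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

Because each error is divided by the actual price, MAPE weights a given absolute error more heavily on low-priced properties than on high-priced ones.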

If the test result satisfies the target accuracy, the real estate market price verification unit 1215 fixes the tree based on the learning model generated so far as the market price classification model; if not, it may pass correction information to the real estate market price classification model generation unit 1214. According to the correction information, the real estate market price classification model generation unit 1214 may generate an additional learning model tree and combine it with the previously generated learning model tree.

FIG. 6 is a diagram illustrating the input unit 131 according to an embodiment of the present invention in more detail. Referring to FIG. 6, the input unit 131 may include a user input data receiving unit 1311 and a characteristic data transmission unit 1312. The user input data receiving unit 1311 may include a user input data acquisition unit 1311A and a user input data correction unit 1311B, and the characteristic data transmission unit 1312 may include a characteristic data acquisition unit 1312A and a characteristic data correction unit 1312B.

The user input data acquisition unit 1311A may receive from the user terminal 200, as input data, user request data for predicting the price of a specific property. The user terminal 200 may generate the user request data when the user enters the address of the property as text, selects the property on a map, or photographs the property, and may transmit it to the user input data acquisition unit 1311A of the server 100. The type and format of the data that the user enters at the user terminal 200 may be unrestricted, but enough data must be entered to identify the property.

When user input data is received, the user input data correction unit 1311B may perform data correction processing to facilitate processing by the characteristic data acquisition unit 1312A.

The correction processing may be performed after the user has entered the data, that is, after the fact, or it may be performed while the user is entering the data. After correcting the input data, the user input data correction unit 1311B transmits the input data to the characteristic data acquisition unit 1312A of the characteristic data transmission unit 1312.

FIG. 7 shows an example of user input data processed by the user input data receiving unit 1311 and of its correction. Typically, because of imperfect memory, a preference for convenience, momentary laziness, or the like, a user may enter only the data essential for identifying the property rather than a complete address, as in '산곡동 510 103동 1102호' (Sangok-dong 510, Building 103, Unit 1102) shown in FIG. 7.

Accordingly, as shown in FIG. 7, when '산곡동 510 103동 1102호' (Sangok-dong 510, Building 103, Unit 1102) is obtained by the user input data acquisition unit 1311A, the user input data correction unit 1311B may correct the user input data to match the address format of a certified copy of the register.
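
A rough Python sketch of such a correction is given below. It only rewrites the parts that can be derived from the short input itself; the city, district, and complex name that appear in a full register address would have to come from a lookup that is not shown. The regular expression, the output layout, and the rule that the floor equals the unit number divided by 100 are all assumptions made for this illustration.

```python
import re

# Assumed input shape: "<neighborhood>동 <lot> <building>동 <unit>호".
SHORT_ADDRESS = re.compile(
    r"^(?P<dong>\S+동)\s+(?P<lot>\d+(?:-\d+)?)\s+(?P<building>\d+)동\s+(?P<unit>\d+)호$"
)


def to_register_style(text):
    """Rewrite a short address into a register-like form; return None if it does not match."""
    match = SHORT_ADDRESS.match(text.strip())
    if match is None:
        return None
    unit = int(match["unit"])
    floor = unit // 100  # assumption: unit 1102 is on floor 11
    return f"{match['dong']} {match['lot']} 제{match['building']}동 제{floor}층 제{unit}호"
```

Returning `None` for unmatched input lets the caller fall back to asking the user for more detail instead of guessing.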

This applies only to the case in which the characteristic data transmission unit 1312 obtains characteristic data by searching a database of certified copies of the register; the correction processing may be determined according to the type of characteristic data handled by the characteristic data transmission unit 1312.

In addition, according to an embodiment of the present invention, the user input data correction unit 1311B may apply corrections in real time to the information entered into the user input data acquisition unit 1311A, and may immediately output the correction result to the user terminal 200 so that the user can choose whether to submit the corrected data as the input data. However, the principle and content of the input data correction may vary depending on the data format or content required by the characteristic data acquisition unit 1312A.

Meanwhile, referring again to FIG. 6, the characteristic data acquisition unit 1312A receives the optionally corrected user input data from the user input data receiving unit 1311 and may extract the characteristic values of the property identified by the user input data in a form that can be transmitted to the prediction unit 122. A public register database (not shown), holding documents such as certified copies of the register and building ledgers, may mainly be used for extracting the property's characteristic values.

The public register (公簿) database includes databases that store documents relating to real estate that are administered by the executive or judicial branch, such as the certificate of all registered matters for real estate (hereinafter, the certified copy of the register), the building ledger, and the land ledger, and may consist of a plurality of databases, one for each document type. Because of problems in how public registers are drawn up, the information recorded in them may not match the actual facts, but it has the highest reliability short of information gathered by direct observation. It can therefore be actively used to correct characteristic values in the database, to add characteristic values during learning, and to supplement input values during prediction, so the more complete the database, the higher the prediction accuracy. However, since building a public register database in real time for every property in the country may be difficult, approaches such as collecting data for a limited period or region, or collecting data each time a request arrives, may be adopted.

The characteristic data correction unit 1312B corrects the characteristic data into a form that is easy to use for database searches. The characteristic data corrected by the characteristic data correction unit 1312B may be passed to the input unit transfer data acquisition unit 1221 of the prediction unit 122.

FIGS. 8 and 9 are exemplary diagrams of characteristic data acquisition and correction. Referring to FIG. 8, the characteristic data acquisition unit 1312A receives '인천광역시 부평구 산곡동 510외 1필지 푸**오 제103동 제11층 제1102호' (Incheon, Bupyeong-gu, Sangok-dong 510 and one other lot, Pu**o, Building 103, 11th floor, Unit 1102), which is the corrected form of the data entered by the user, and may have a certified copy of the register issued from the public register database and use it to extract characteristic values.

For example, the characteristic data acquisition unit 1312A may extract address information from the title of the certified copy of the register and obtain the exclusive-use area and the floor from the description of the exclusively owned part in the title section, thereby correcting the characteristic values. This embodiment is only one example of the characteristic data acquisition process; the extracted items may vary depending on the type of public register issued for the received user input data, the types of characteristic values required by the model generated by the real estate market price classification model generation unit 1214, and so on.

Referring to FIG. 9, the address extracted in FIG. 9 is a value obtained from the certified copy of the register, so it is useful when searching for certified copies of the register. However, when the actual transaction price database is organized as shown in the lower part of FIG. 9, the formats do not match, which may make it difficult for the present invention to function properly in processes such as database search and model application.

Therefore, according to an embodiment of the present invention, the characteristic data correction unit 1312B may, as shown in FIG. 9, subdivide the {address} item of the acquired characteristic data into {city/county/district, lot address, main lot number, sub lot number, complex name} and correct the content of the {address} item to fit each of these items. In practice, the correction principle and its content may vary depending on the characteristic value items that make up the actual transaction price database and their content, and in some cases the characteristic data may be corrected using an external commercial service such as a map service.
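
The subdivision of the {address} item can be sketched as follows. This Python example assumes that the register address has the shape shown in the example of FIG. 8; the regular expression, the English field names, and the use of "0" for a missing sub lot number are assumptions for illustration, and the actual layout of the database in FIG. 9 may differ.

```python
import re

REGISTER_ADDRESS = re.compile(
    r"^(?P<region>.+?동)\s+"             # city, district and neighborhood
    r"(?P<main>\d+)(?:-(?P<sub>\d+))?"   # main lot number, optional sub lot number
    r"(?:외\s*\d+필지)?\s+"               # "and N other lots" suffix, ignored
    r"(?P<complex>.+?)\s+제\d+동"         # complex name, up to the building number
)


def split_address(address):
    """Split a register-style address into database-style fields, or return None."""
    m = REGISTER_ADDRESS.match(address)
    if m is None:
        return None
    main, sub = m["main"], m["sub"] or "0"
    lot = main if sub == "0" else f"{main}-{sub}"
    return {
        "region": m["region"],    # city/county/district
        "lot": lot,               # lot address
        "main_no": main,          # main lot number
        "sub_no": sub,            # sub lot number
        "complex": m["complex"],  # complex name
    }
```

Addresses that do not follow this shape return `None`, which is where an external service such as a map lookup could take over.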

FIG. 10 is a block diagram illustrating the prediction unit 122 according to an embodiment of the present invention in more detail.

Referring to FIG. 10, the prediction unit 122 according to an embodiment of the present invention includes an input unit transfer data acquisition unit 1221, a real estate transaction data acquisition unit 1222, a real estate market price classification model application unit 1223, and a calculated data transmission unit 1225, and may optionally include a calculated data correction unit 1224.

The input unit transfer data acquisition unit 1221 receives from the input unit 131 the characteristic data of the property that the user wishes to look up, and passes it to the real estate transaction data acquisition unit 1222 and the real estate market price classification model application unit 1223.

The real estate transaction data acquisition unit 1222 searches the actual transaction price database 110 using the property's characteristic data. If actual transaction price data matching the characteristic data exists, it obtains the matching data; if not, it obtains, as the actual transaction price data, data whose characteristics are similar to the characteristic data according to a preset criterion.
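
The match-or-fallback search described above might look like the following Python sketch. The record fields and the similarity criterion (same region, floor area within a tolerance) are assumptions made for illustration; the embodiment leaves the preset criterion open.

```python
def find_transactions(db, query, area_tolerance=5.0):
    """Return (rows, exact): exact matches if any exist, otherwise similar rows."""
    exact = [
        r for r in db
        if r["complex"] == query["complex"]
        and r["area"] == query["area"]
        and r["floor"] == query["floor"]
    ]
    if exact:
        return exact, True
    # Fallback criterion (assumption): same region, area within the tolerance.
    similar = [
        r for r in db
        if r["region"] == query["region"]
        and abs(r["area"] - query["area"]) <= area_tolerance
    ]
    return similar, False
```

The boolean in the return value lets a later correction step treat exact matches and merely similar transactions differently.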

The real estate market price classification model application unit 1223 applies the characteristic data received from the input unit to the real estate market price classification model generated in advance by the learning unit 121, and calculates actual transaction price prediction data.

The calculated data correction unit 1224 corrects the actual transaction price prediction data calculated by the real estate market price classification model application unit 1223 based on the actual transaction price data obtained by the real estate transaction data acquisition unit 1222.

Accordingly, the calculated data transmission unit 1225 transmits the finally calculated actual transaction price prediction data to the output unit 132, and the output unit 132 may process service interface information based on the actual transaction price prediction data and transmit it to the user terminal 200.

FIGS. 11 to 13 are flowcharts sequentially illustrating an operation method of the server 100 according to the embodiment of the present invention described above.

First, referring to FIG. 11, the input unit 131 of the server 100 receives user input data from the user terminal 200 (S101) and corrects the received user input data (S103).

The input unit 131 of the server 100 then accesses the real estate public register database (S105) and searches for real estate public register data associated with the corrected user input data (S107).

Thereafter, the input unit 131 of the server 100 extracts characteristic data using the real estate public register data and the corrected user input data (S109), corrects the extracted characteristic data (S111), and passes the corrected characteristic data to the prediction unit 122 (S113).

Meanwhile, referring to FIG. 12, the learning unit 121 of the server 100 builds the actual transaction price database 110 in advance (S201) and obtains data for the virtual data learning model, in which some of the data has been separated from the actual transaction price data (S203).

The learning unit 121 then uses the data for the virtual data learning model to generate virtual data for the prediction interval according to a machine learning process (S205), generates a data learning tree based on the virtual data (S207), and combines the generated tree with the current learning model (S209).

Thereafter, the learning unit 121 may test the tree based on the current learning model with the data separated in step S203 (S211), and determines through the verification step whether the target accuracy has been reached (S213).

If the target accuracy has not been reached, the process may be repeated from step S207; if the target accuracy has been reached, the learning model based on the currently combined trees is fixed as the final model (S215).
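
The loop of steps S207 to S215 can be illustrated by a deliberately small Python sketch of gradient boosting with squared loss, in which each round fits a one-split regression tree (a stump) to the current residuals. The single numeric input, the stump learner, the learning rate, and the tree limit are simplifying assumptions for illustration and do not represent the embodiment's actual model or hyperparameters.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # only one distinct x value: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def boost_until_target(train, test, target_mape, max_trees=200, rate=0.5):
    """Add a tree (S207, S209), test (S211, S213), and stop at the target (S215)."""
    xs, ys = zip(*train)
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        return base + rate * sum(tree(x) for tree in trees)

    score = None
    while len(trees) < max_trees:
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
        score = mape([y for _, y in test], [predict(x) for x, _ in test])
        if score <= target_mape:
            break
    return predict, len(trees), score
```

The `max_trees` limit is a safeguard added in this sketch so that the loop terminates even when the target accuracy cannot be reached.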

Referring to FIG. 13, the prediction unit 122 of the server 100 obtains the characteristic data passed from the input unit 131 (S301) and searches the actual transaction price database 110 using the characteristic data (S303).

The prediction unit 122 of the server 100 then determines whether actual transaction price data matching the characteristic data exists in the actual transaction price database 110 (S305). If it exists, the prediction unit 122 extracts the actual transaction price data corresponding to the matching address and property (S307); if it does not exist, the prediction unit 122 extracts similar data within a certain criterion, for example the actual transaction price data of nearby properties (S309). The characteristic data is then applied to the real estate market price classification model based on the fixed model generated by the learning unit 121 (S311).

Thereafter, the prediction unit 122 of the server 100 corrects the calculated data (S313) and passes the corrected calculated data to the output unit 132.

The output unit 132 provides the actual transaction price prediction information based on the calculated data to the user terminal 200, which completes the service process.

Meanwhile, FIGS. 14 to 19 are diagrams for describing the step of building the actual transaction price database 110 according to an embodiment of the present invention, the configuration of characteristic data and feature information instances for building a learning model based on the database, and the associated process.

Referring to FIG. 14, the information storage unit 112 of the actual transaction price database 110 according to an embodiment of the present invention includes an actual transaction price data processing and loading unit 1121 and a model-based actual transaction price data merging unit 1122.

More specifically, in periodically collecting and storing time series data to build the actual transaction price database 110, the information storage unit 112 may process and store real estate actual transaction price data in a form suited to machine learning and prediction, so that the database can be built more efficiently.

Accordingly, when time series data is periodically collected and brought up to date, the information storage unit 112 according to an embodiment of the present invention may efficiently build and separately operate a database for loading the data. In particular, few time series data sets are published in a fully processed state that is easy to use for machine learning, and from a service operation standpoint greater external dependence leads directly to greater uncontrollable risk. If the data set or database becomes inaccessible because of, for example, a change in the data provider's internal policy or server instability, service provision may be seriously disrupted.

To address this, the information storage unit 121 according to an embodiment of the present invention includes the actual transaction price data processing and loading unit 1121 and the model-based actual transaction price data merging unit 1122. First, the actual transaction price data processing and loading unit 1121 collects and processes public data and loads it into the database.

Referring to FIG. 15, as of August 28, 2020, the server of the Ministry of Land, Infrastructure and Transport provides actual transaction price data every day, covering reports filed through the previous day, for each building type (apartment, row house/multi-household house, detached/multi-family house, officetel, etc.) and transaction type (sale, jeonse lease, etc.). The data includes fields such as city/county/district (si/gun/gu), lot address, main lot number, sub lot number, complex name, exclusive area, contract year and month, contract day, transaction amount, floor, year of construction, and road name. The data is provided not only on web pages but also as csv or spreadsheet files.

As shown in FIG. 16, the actual transaction price data processing and loading unit 1121 may use an actual transaction price data crawler module to crawl and collect, at a specific time every day, the actual transaction price data uploaded to the actual transaction price disclosure system of the Ministry of Land, Infrastructure and Transport.

The actual transaction price data processing and loading unit 1121 may then combine data serial number information (id), building type information, and transaction type information with each item of actual transaction price data, and finally load the result into the actual transaction price database 110.

Meanwhile, the model-based actual transaction price data merging unit 1122 may perform a process of merging the actual transaction price data corresponding to the same real estate into one model, so that the data for each real estate property can be loaded easily during later training and prediction.

To determine whether records belong to the same real estate, the model-based actual transaction price data merging unit 1122 may select, from the actual transaction price data, the information or combination of information that is unique to each real estate property.

For example, apartments with the complex name 'Woosung' may exist throughout the country, so the complex name alone is insufficient to determine whether two records refer to the same real estate. Even when address information, that is, city/county/district information and lot address information, is combined with the complex name, apartments and officetels may coexist within the same complex, so this combination is also difficult to use as a classification criterion.

Therefore, as shown in FIG. 17, the model-based actual transaction price data merging unit 1122 according to an embodiment of the present invention first classifies the actual transaction price data by the information combination (building type, city/county/district, lot address, complex name). For each such combination, it then creates a real estate property model based on the record with the lowest id and merges the classified data into that model. In this way, real estate actual transaction price data suited to training can be organized, classified, and loaded per real estate property.
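The classification and merge step can be sketched as follows. This is a minimal illustration rather than the implementation of the merging unit 1122: the record layout and the field names (`building_type`, `sigungu`, `lot_address`, `complex_name`) are assumptions introduced here, and the "model" is represented simply as a dictionary keyed by the four-field combination.

```python
from collections import defaultdict

def merge_by_property(records):
    """Group records by (building type, si/gun/gu, lot address, complex name)."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["building_type"], rec["sigungu"],
               rec["lot_address"], rec["complex_name"])
        groups[key].append(rec)

    models = {}
    for key, recs in groups.items():
        recs.sort(key=lambda r: r["id"])
        # The record with the lowest id serves as the base of the property model.
        models[key] = {"base_id": recs[0]["id"], "transactions": recs}
    return models

# Hypothetical records: two apartment sales and one officetel sale at one address.
records = [
    {"id": 3, "building_type": "apartment", "sigungu": "Seoul Gangnam-gu",
     "lot_address": "12-3", "complex_name": "Woosung", "price": 152000},
    {"id": 1, "building_type": "apartment", "sigungu": "Seoul Gangnam-gu",
     "lot_address": "12-3", "complex_name": "Woosung", "price": 150000},
    {"id": 2, "building_type": "officetel", "sigungu": "Seoul Gangnam-gu",
     "lot_address": "12-3", "complex_name": "Woosung", "price": 40000},
]
models = merge_by_property(records)
```

Including the building type in the key is what keeps the officetel record out of the apartment model even though the address and complex name match.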

Meanwhile, referring to FIG. 18, the model-based actual transaction price data merging unit 1122 according to an embodiment of the present invention may further perform a process of adding geocoding (latitude, longitude) information to each item of actual transaction price data.

This is because address information in text form may change when administrative districts change, and the relational information that accompanies it is difficult to implement, whereas geocoding information is absolute location information that can be expressed numerically as a kind of coordinate system, which makes it easier to use for training than text addresses.

Using the actual transaction price database 110 built in this way by the data processing and loading unit 1121 and the model-based actual transaction price data merging unit 1122, the learning unit 121 according to an embodiment of the present invention may perform the learning process described above to generate a tree-based learning model for real estate actual transaction price prediction.

The tree-based learning process is now described in more detail. There has recently been a trend in the market toward using machine learning to calculate expected real estate actual transaction prices, but it is doubtful whether these attempts suit the characteristics of the data to be predicted. For example, the related patents of Big Value (10-2016-0123722 and three others) describe calculating the expected actual transaction price of real estate using methods such as multiple regression analysis and similarity scores. With multiple regression analysis, however, sophisticated modeling is difficult and accuracy depends heavily on the explanatory power of the input feature information, so it is unsuitable for calculating expected actual transaction prices. With similarity scores, each real estate property must be matched against many other properties for which similarity is computed. Unless similarity scores for all real estate are computed before the user's prediction request, the calculation takes a long time, so many difficulties are expected when providing a real-time service.

Meanwhile, real estate actual transaction price data has clearly distinguishable features and can be organized into a table, that is, it is tabular data. For this type of data, decision-tree-based learning models, which learn by splitting the training set on specific values, are known to be effective. At the same time, they have the disadvantage that performance drops sharply when they are asked to predict time series data for points in time later than the training set.

Therefore, when machine learning is used to calculate expected real estate actual transaction prices, the machine learning methods listed above must be supplemented or a different method adopted, which is why the data-separation-based learning processing described earlier is performed. Preferably, for the expected actual transaction price calculation process, gradient boosting, a decision-tree-based ensemble learning method, may be used. To compensate for the weaknesses of the existing techniques described above, additional feature trees may be generated from the actual transaction price database through lighter machine learning processing such as linear regression, and training may proceed further with them.

Hereinafter, with reference to FIG. 19, the detailed learning process that allows this embodiment of the present invention to be carried out more efficiently, and the process of configuring the feature instances of the learning model, are described.

Referring to FIG. 19, the actual transaction price database 110 first collects actual transaction price data from the external real estate information database 300 and generates an individual data model for each real estate property according to identification information, building type, and transaction type (S1101).

The actual transaction price database 110 then merges the actual transaction price data using the individual data model of each real estate property, and maps and adds geocoding information to the actual transaction price data to build the actual transaction price database (S1103).

Thereafter, the learning unit 121 configures, from the actual transaction price database, a plurality of feature information instances as the training set for generating the learning model (S1105).

The learning unit 121 first performs data loading and separation through the real estate actual transaction data processing unit 1211. It loads the actual transaction price database to which geocoding information has already been added and splits the loaded data set on the basis of a specific time point section, so that data before that section becomes the training data set and data after it becomes the test set. Here, the time point section may be, for example, a certain period before the point three months prior to the current time and a certain period after that point, or the period up to the present, and it may vary depending on settings.
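A time-based split of this kind can be sketched as below. The `contract_date` field and the cutoff of roughly three months before the reference date are assumptions chosen for illustration; the text leaves the exact section configurable.

```python
from datetime import date, timedelta

def split_by_cutoff(records, cutoff):
    """Records before the cutoff form the training set, the rest the test set."""
    train = [r for r in records if r["contract_date"] < cutoff]
    test = [r for r in records if r["contract_date"] >= cutoff]
    return train, test

# Hypothetical reference date and a cutoff about three months (90 days) earlier.
reference = date(2020, 8, 28)
cutoff = reference - timedelta(days=90)

records = [
    {"id": 1, "contract_date": date(2019, 11, 5), "price": 15500},
    {"id": 2, "contract_date": date(2020, 3, 14), "price": 15800},
    {"id": 3, "contract_date": date(2020, 7, 2), "price": 16100},
]
train_set, test_set = split_by_cutoff(records, cutoff)
```

Splitting by time rather than at random keeps every test record strictly later than the training data, which matches the situation the model faces when it predicts future prices.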

The learning unit 121 may then configure feature instances from the training data set in order to generate the tree model through training. To this end, the real estate actual transaction data processing unit 1211 may configure a feature information instance for each data item of the training set by extracting, or additionally processing, the feature information needed for gradient boosting.

Here, the feature information instance may take various forms, as exemplified below.

In a first embodiment, the feature information instance may include a first feature information instance that is extracted as-is from the data of a single actual transaction price model in the actual transaction price database.

In this case, the first feature information instance may include building type information, exclusive area information, floor information, year of construction information, and latitude and longitude information. In particular, actual transaction price data for the same property may have the same values for both building type and year of construction.

In a second embodiment, the feature information instance may include a second feature information instance converted from the data of a single actual transaction price model in the actual transaction price database.

For example, the second feature information instance may include date feature information. This allows the position of the contract date within the year to be used in training, and it may include date feature values obtained by applying cosine or sine to the date information given by the contract year and month and the contract day.
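One common way to realize such a date feature is to map the day of the year onto a circle, so that December 31 and January 1 end up close together. The sketch below assumes this mapping; the exact formula is not given in the text, and `date_features` is a hypothetical helper name.

```python
import calendar
import math
from datetime import date

def date_features(contract_date):
    """Return (sin, cos) of the contract date's position within its year."""
    day_of_year = contract_date.timetuple().tm_yday
    days_in_year = 366 if calendar.isleap(contract_date.year) else 365
    angle = 2 * math.pi * (day_of_year - 1) / days_in_year
    return math.sin(angle), math.cos(angle)

sin_jan1, cos_jan1 = date_features(date(2020, 1, 1))    # angle 0
sin_dec31, cos_dec31 = date_features(date(2020, 12, 31))  # just short of a full turn
```

Using both the sine and the cosine is what makes the encoding unambiguous: either value alone maps two different days of the year to the same number.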

In a third embodiment, the feature information instance may include a third feature information instance in which referenced external data is added to the data of a single actual transaction price model in the actual transaction price database.

For example, the third feature information instance may include the top floor of the building. This may be taken as the highest floor among the data in the actual transaction price model, but when building register data has been collected from the external real estate information database 300, it may instead be obtained from the building register data and added to the third feature information instance.

In addition, for example, the third feature information instance may further include the average actual transaction price of the metropolitan-level local government (special city, metropolitan city, province, special self-governing city, or special self-governing province) to which the actual transaction price data belongs, and the average actual transaction price of the basic local government (city, county, or district) to which the data belongs.

Meanwhile, a fourth feature information instance may further include expected actual transaction price data generated by a separate machine learning method. For example, when the learning unit 121 generates a learning model by ridge regression, a plurality of feature information instances, each using a training set with a different temporal and spatial range for the regression, may be configured as the fourth feature information instance.

As an example of the fourth feature information instance, a training set may include the minimum, maximum, and average actual transaction price for units of the same size in the same building within a specific period (1 year, 2 years, or 3 years).

The fourth feature information instance may also be exemplified by a training set that includes the minimum, maximum, average, bottom 5%, and top 5% actual transaction price for units of the same size on the same floor of the same building within a specific period (1 year, 2 years, or 3 years).
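The window statistics described here can be sketched as follows. The sketch assumes the transactions passed in have already been filtered to the same building, unit size, and floor, approximates a year as 365 days, and interprets "bottom 5%" and "top 5%" as the 5th and 95th percentiles; all three are assumptions, as are the field names.

```python
from datetime import date, timedelta
from statistics import mean, quantiles

def window_stats(transactions, as_of, years):
    """Price statistics over the `years` before `as_of` (exclusive of as_of)."""
    start = as_of - timedelta(days=365 * years)
    prices = sorted(t["price"] for t in transactions
                    if start <= t["contract_date"] < as_of)
    if not prices:
        return None
    stats = {"min": prices[0], "max": prices[-1],
             "mean": mean(prices), "count": len(prices)}
    if len(prices) >= 2:
        # 19 cut points for n=20: the first is the 5th percentile, the last the 95th.
        cuts = quantiles(prices, n=20, method="inclusive")
        stats["p05"], stats["p95"] = cuts[0], cuts[-1]
    return stats

transactions = [
    {"contract_date": date(2018, 5, 1), "price": 14000},
    {"contract_date": date(2019, 9, 1), "price": 16000},
    {"contract_date": date(2019, 11, 5), "price": 15500},
    {"contract_date": date(2020, 1, 10), "price": 15000},
]
as_of = date(2020, 8, 28)
stats_1y = window_stats(transactions, as_of, years=1)
stats_3y = window_stats(transactions, as_of, years=3)
```

Restricting the window to dates strictly before `as_of` avoids leaking the price being predicted into its own features.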

The fourth feature information instance may also be exemplified by a training set that includes the ridge regression predicted value (features: exclusive area, timestamp; target: actual transaction price), the coefficient of determination (r squared value), and the number of transactions, computed over the actual transaction data of the same building within a specific period (6 months, 1 year, or 2 years).

The fourth feature information instance may also be exemplified by a training set that includes the ridge regression predicted value (feature: timestamp; target: actual transaction price), the coefficient of determination (r squared value), and the number of transactions, computed over the actual transaction data of units of the same size in the same building within a specific period (6 months, 1 year, or 2 years).

The fourth feature information instance may also be exemplified by a training set that includes the ridge regression predicted value (features: exclusive area, timestamp; target: actual transaction price), the coefficient of determination (r squared value), and the number of transactions, computed over the actual transaction data of buildings within the same administrative dong or administrative ri within a specific period (3 months or 6 months).
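The ridge regression features can be illustrated with the simplest of the variants above, the one whose only feature is the timestamp. The sketch below uses the closed-form solution for one centered feature with an unpenalized intercept. The penalty `alpha`, the use of days as the timestamp unit, and the function names are assumptions; the two-feature variants (exclusive area and timestamp) would need a matrix solver instead.

```python
def ridge_1d(xs, ys, alpha=1.0):
    """Ridge fit y ~ intercept + slope * x, penalizing only the slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)          # shrinks toward 0 as alpha grows
    intercept = y_bar - slope * x_bar
    return slope, intercept

def ridge_features(xs, ys, x_query, alpha=1.0):
    """Predicted value, r squared, and transaction count as features."""
    slope, intercept = ridge_1d(xs, ys, alpha)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return {"prediction": intercept + slope * x_query,
            "r_squared": r_squared,
            "count": len(xs)}

# Hypothetical data: days since the first transaction, prices in 10,000 won.
days = [0, 100, 200, 300]
prices = [15000, 15100, 15200, 15300]
features = ridge_features(days, prices, x_query=400)
```

Passing the r squared value and the transaction count alongside the prediction lets the downstream tree model learn how much to trust each regression.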

Thereafter, the learning unit 121, through the time series trend prediction learning model generation unit 1212, performs gradient-boosting-based machine learning on the feature information instances using preset hyperparameters to generate the time series trend prediction learning model (S1107).

More specifically, the learning unit 121 may generate the tree-based learning model using, for example, the LightGBM algorithm, which uses a leaf-wise tree splitting method. The LightGBM algorithm has the advantage of minimizing tree depth because it splits while keeping the tree as balanced as possible.

The learning unit 121 may preset appropriate hyperparameters. If they have not been set, a validation set may be used.

The validation set may be obtained by separating the data of the last certain period (for example, three months) of the training set, and a model generated by gradient boosting with arbitrary hyperparameters may then be tested against it.

However, to prevent overfitting, that is, to keep the depth of the decision tree from becoming too large or the number of terminal nodes from becoming too many, an early stopping process may be added that blocks the creation of child nodes for some nodes even before a specific depth is reached.

Meanwhile, once the time series trend prediction learning model has been generated, the learning unit 121, through the real estate prediction data generation unit 1213, configures feature information instances for the test data set separated from the actual transaction price database 110 in the same manner, applies them to the time series trend prediction learning model, and processes prediction error correction of the learning model (S1109).

The learning unit 121, through the verification unit 1215, may add trees to the model or fix the model on the basis of the prediction error correction data of the learning model, and may calculate a confidence interval for the actual transaction price (S1111).

That is, the learning unit 121 may configure feature information instances for the separated test data set in the same manner as for the training set, apply them to the current tree-based time series trend prediction learning model, and verify the error between the predicted values and the actual transaction prices in the test data.

When the target performance is not met, the learning unit 121 notifies the time series trend prediction learning model generation unit 1212 so that further learning trees are added. When the target is satisfied, it has the model fixed. The fixed model is passed to the real estate price classification model generation unit 1214 and can thereafter be used by the prediction unit 122.
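The train, test, and expand loop can be sketched with a deliberately small gradient boosting implementation. This is not LightGBM and not the learning unit's actual procedure: it uses squared-error loss with depth-1 trees (stumps), measures error as mean absolute percentage error, and treats "adding learning trees" as raising the tree count. The 5% target, the learning rate, and the toy data are all assumptions.

```python
def fit_stump(X, residuals):
    """Best single-threshold split of the residuals under squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]  # (feature index, threshold, left value, right value)

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= thr else rv)
                      for j, thr, lv, rv in stumps)

def boost(X, y, n_trees, lr=0.3):
    """Each new stump is fitted to the residuals of the current ensemble."""
    model = (sum(y) / len(y), lr, [])
    for _ in range(n_trees):
        residuals = [yi - predict(model, row) for row, yi in zip(X, y)]
        model[2].append(fit_stump(X, residuals))
    return model

def mape(model, X, y):
    return sum(abs(predict(model, r) - yi) / yi for r, yi in zip(X, y)) / len(y)

def train_until_target(X_tr, y_tr, X_te, y_te, target=0.05,
                       start=5, step=5, max_trees=200):
    """Retrain with more trees until the test error meets the target."""
    n_trees = start
    while True:
        model = boost(X_tr, y_tr, n_trees)
        err = mape(model, X_te, y_te)
        if err <= target or n_trees >= max_trees:
            return model, n_trees, err  # the model is "fixed" at this point
        n_trees += step

# Toy instances: [exclusive area, floor] with a synthetic price.
X_train = [[59, 2], [59, 8], [59, 20], [84, 2], [84, 15],
           [84, 20], [114, 8], [114, 15], [114, 20]]
y_train = [200 * a + 50 * f for a, f in X_train]
X_test = [[59, 15], [84, 8], [114, 2]]
y_test = [200 * a + 50 * f for a, f in X_test]

model, n_trees, err = train_until_target(X_train, y_train, X_test, y_test)
```

The `max_trees` cap is a safeguard added in this sketch so that the loop terminates even when the target cannot be reached.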

That is, in response to a prediction request from the user terminal 200, the prediction unit 122 matches the real estate property information that can be derived from address information and public register information (S1113), configures a feature information instance from the matched real estate property information, and applies it to the fixed learning model to calculate the expected actual transaction price (S1115).

Accordingly, the server 100 may provide, through the output unit 132, real estate actual transaction price prediction information that includes the expected actual transaction price and accuracy information based on the confidence interval (S1117).

More specifically, the predicted value calculated when the prediction unit 122 performs prediction after the model is fixed, that is, the expected actual transaction price, may be calculated as the average of the target values, that is, the predicted actual transaction prices, of the data contained in the nodes that contain the feature information instance for prediction. The target values may differ from node to node, but on the premise that their distribution forms a certain probability distribution, a confidence interval for the actual transaction price can be calculated using the mean and standard deviation across the nodes.
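A minimal sketch of that interval calculation follows, assuming the target values are approximately normally distributed and that a two-sided 95% interval is wanted. The text fixes neither the distribution nor the confidence level, so both are assumptions, as is the helper name `price_interval`.

```python
from statistics import NormalDist, mean, stdev

def price_interval(node_targets, confidence=0.95):
    """Expected price and a normal-theory interval from node target values."""
    mu = mean(node_targets)
    sigma = stdev(node_targets)  # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mu, (mu - z * sigma, mu + z * sigma)

# Hypothetical target values (in 10,000 won) gathered from the matching nodes.
expected, (lower, upper) = price_interval([15000, 16000, 17000])
```

A narrower interval indicates that the nodes agree closely, which is the sense in which the interval can serve as accuracy information.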

Conversely, when a prediction is made for a specified price rather than for real estate property information, the target distribution of the nodes shows where that price falls within the distribution. Using this, the output unit 132 according to an embodiment of the present invention may provide the user terminal 200 with information on the likelihood of a transaction at each price level.

For example, suppose there is a real estate property whose expected actual transaction price is calculated as 160 million won, and the actual transaction prices in the data of the node into which the property is classified follow a normal distribution with a standard deviation of 10 million won. Expressed in units of 10,000 won, the probability that the property is traded at 190 million won or more can be calculated as P(x>19000) = P(z>(19000-16000)/1000) = P(z>3) = 0.00135 = 0.135%. Based on this probability, the output unit 132 may provide the user terminal 200, as prediction information based on time series model analysis, with the result that the probability of the property being traded at the requested price of 190 million won or more is low, at 0.135%.
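The arithmetic in this example can be checked with the standard library's normal distribution. The function name `prob_at_least` is introduced here for illustration, and prices are in units of 10,000 won as in the formula above.

```python
from statistics import NormalDist

def prob_at_least(price, expected, std):
    """P(X >= price) for X ~ Normal(expected, std)."""
    return 1.0 - NormalDist(mu=expected, sigma=std).cdf(price)

p = prob_at_least(19000, expected=16000, std=1000)
print(f"{p:.5f} ({p:.3%})")  # 0.00135 (0.135%)
```

The same function evaluated over a range of prices yields the per-price-level transaction likelihood that the output unit is described as providing.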

Accordingly, the server 100 according to an embodiment of the present invention can provide more specific and accurate prediction information on real estate actual transaction prices, including not only the expected actual transaction price but also accuracy information based on the confidence interval.

Meanwhile, the methods according to the various embodiments of the present invention described above may be implemented as programs and provided to each server or device while stored in various non-transitory computer readable media.

A non-transitory readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above may be stored in and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB device, memory card, or ROM.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present invention.

Claims

1. In a device for generating a learning model for real estate actual transaction price prediction,
separating a data set loaded from a real estate actual transaction price database into two on the basis of a specific time point, with data before the time point forming a training set and data after the time point forming a test set;
forming an instance by extracting or adding features for performing gradient boosting for each data item of the training set;
once an instance has been formed for each data item of the training set, generating a decision-tree-based time series trend prediction learning model by setting a hyperparameter and training through gradient boosting;
once the learning model has been generated, forming instances for the test set data by extracting or adding features in the same manner as for the training set;
generating virtual data in which the actual transaction price is predicted by inputting the instance features of the test set data into the learning model; and
measuring the error between the virtual data and the actual transaction prices of the test set data, changing the hyperparameter of the learning model generation step to further expand the learning tree when the target performance is not met, and fixing the learning model when the target performance is satisfied; wherein the device generates a learning model for real estate actual transaction price prediction by executing a learning model generation method for real estate actual transaction price prediction that includes the above steps,
the learning model generating device for real estate actual transaction price prediction.

2. The device of claim 1,
wherein the real estate actual transaction price database is built by:
collecting, by an actual transaction price data crawler, the actual transaction price data uploaded to an external real estate actual transaction price disclosure system at a specific time every day;
combining data serial number information (id), building type information, and transaction type information with the collected actual transaction price data, and finally loading it into the actual transaction price database;
classifying the actual transaction price data by the combination of building type, city/county/district, lot address, and complex name information, creating a model for each real estate property based on the data with the lowest id for each such combination, and merging the classified data into the actual transaction price data model of each real estate property; and
adding geocoding (latitude, longitude) information to each item of actual transaction price data,
the learning model generating device for real estate actual transaction price prediction.

3. The device of claim 1,
wherein the learning model uses gradient boosting, a decision-tree-based ensemble learning method, and proceeds with learning by generating additional features through a linear regression machine learning method that uses the actual transaction price database,
the learning model generating device for real estate actual transaction price prediction.

4. According to claim 1,
forming the instance comprises extracting or adding features including at least one of:
features extracted directly from each single data item in the actual transaction price database, including building type, exclusive area, floor, construction year, latitude, and longitude;
features added by converting information in a single data item of the actual transaction price database, expressed as values obtained by applying cosine and sine transforms to the contract year, contract month, and contract date; and
features added by referencing or processing additional data;
A learning model generating device for real estate transaction price prediction.
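
The cosine and sine conversion of the contract date can be sketched as below. The claim does not state which period is used for each component, so this example assumes a 12-month cycle for the month and the length of the month for the day, and passes the year through unchanged; the function and key names are hypothetical.

```python
import calendar
import math
from datetime import date

def cyclic(value, period):
    """Place a zero-based position within a cycle onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def contract_date_features(contract_date):
    """Encode month and day cyclically so that December sits next to January."""
    month_sin, month_cos = cyclic(contract_date.month - 1, 12)
    days_in_month = calendar.monthrange(contract_date.year, contract_date.month)[1]
    day_sin, day_cos = cyclic(contract_date.day - 1, days_in_month)
    return {
        "contract_year": contract_date.year,  # assumption: the year is kept as-is
        "month_sin": month_sin, "month_cos": month_cos,
        "day_sin": day_sin, "day_cos": day_cos,
    }
```

With this encoding the distance between December and January equals the distance between any other pair of adjacent months, which a raw month number (12 versus 1) would not express.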

5. According to claim 4,
the features added by referencing or processing the additional data include any of:
information on the highest floor of the model to which the data forming the instance belongs, as of a predetermined reference point in time;
the average actual transaction price of the metropolitan or basic local government to which the data belongs; and
one or more expected actual transaction price features generated by machine learning while varying the size (different spatiotemporal ranges) and features of the training set;
A learning model generating device for real estate transaction price prediction.

6. According to claim 5,
the one or more expected actual transaction price features generated by machine learning include at least one of:
the minimum / maximum / average actual transaction price of units of the same size type (pyeong type) in the same building within a specific period (1 year, 2 years, 3 years);
the minimum / maximum / average / bottom 5% / top 5% actual transaction price of the same floor in the same building within a specific period (1 year, 2 years, 3 years);
the ridge regression predicted value (features: exclusive area, timestamp; target: actual transaction price), R-squared value, and number of transactions for the transaction data of the same building within a specific period (6 months, 1 year, 2 years);
the ridge regression predicted value (feature: timestamp; target: actual transaction price), R-squared value, and number of transactions for the transaction data of the same building and same size type within a specific period (6 months, 1 year, 2 years); and
the ridge regression predicted value (features: exclusive area, timestamp; target: actual transaction price), R-squared value, and number of transactions for the actual transaction data of buildings within the same administrative dong or administrative district within a specific period (3 months, 6 months);
A learning model generating device for real estate transaction price prediction.
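
As an illustration of the ridge regression features listed above, the following sketch covers only the single-feature case (feature: timestamp; target: actual transaction price) in closed form. The regularization strength `alpha`, the unpenalized intercept, and the returned key names are assumptions; the caller is assumed to have already filtered the transactions to the relevant building, size type, and period.

```python
def ridge_trend_features(timestamps, prices, predict_at, alpha=1.0):
    """Fit price = intercept + slope * timestamp with an L2 penalty on the slope."""
    n = len(prices)
    mean_t = sum(timestamps) / n
    mean_p = sum(prices) / n
    sxx = sum((t - mean_t) ** 2 for t in timestamps)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(timestamps, prices))
    slope = sxy / (sxx + alpha)            # alpha > 0 also guards against sxx == 0
    intercept = mean_p - slope * mean_t    # intercept is not penalized
    ss_res = sum((p - (intercept + slope * t)) ** 2
                 for t, p in zip(timestamps, prices))
    ss_tot = sum((p - mean_p) ** 2 for p in prices)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return {
        "prediction": intercept + slope * predict_at,
        "r_squared": r_squared,
        "n_transactions": n,
    }
```

The three returned values correspond to the predicted value, R-squared value, and number of transactions named in the claim; the two-feature variants (exclusive area and timestamp) would need a small linear solve instead of this scalar formula.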

7. According to claim 1,
in the learning model generation step, the LightGBM framework is used, hyperparameters are set, and learning is performed by gradient boosting.
A learning model generating device for real estate transaction price prediction.
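
For orientation, a hyperparameter set for LightGBM might look like the fragment below. The keys are keyword arguments accepted by `lightgbm.LGBMRegressor`; the values shown are that library's defaults (apart from the explicit regression objective), used here as placeholders, and are not settings disclosed in this document.

```python
# Placeholder hyperparameters, not values taken from the patent.
lightgbm_params = {
    "objective": "regression",  # predict a continuous transaction price
    "boosting_type": "gbdt",    # gradient-boosted decision trees
    "num_leaves": 31,           # maximum leaves per tree
    "max_depth": -1,            # -1 means no depth limit
    "learning_rate": 0.1,       # shrinkage applied to each boosting round
    "n_estimators": 100,        # number of boosting rounds (trees)
}
```

Raising `n_estimators` or `num_leaves` is one plausible way to "expand the learning tree" when the target performance is not met, at the cost of a higher overfitting risk.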

8. According to claim 1,
in the learning model generation step, when no suitable hyperparameter is available, a valid set is used: the data of the last predetermined period of the training set is separated into the valid set, gradient boosting is performed with arbitrary hyperparameters, and the resulting model is tested on the valid set to determine appropriate hyperparameters.
A learning model generating device for real estate transaction price prediction.
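
The valid-set procedure above can be sketched as follows. The field name `contract_date`, the `valid_days` parameter, and the callables `fit` and `error` are hypothetical placeholders; any gradient boosting trainer and any error measure could be plugged in.

```python
from datetime import date, timedelta

def split_off_valid_set(train, valid_days):
    """Hold out the last `valid_days` days of the training set as the valid set."""
    last_day = max(r["contract_date"] for r in train)
    boundary = last_day - timedelta(days=valid_days)
    fit_rows = [r for r in train if r["contract_date"] <= boundary]
    valid_rows = [r for r in train if r["contract_date"] > boundary]
    return fit_rows, valid_rows

def pick_hyperparameters(train, valid_days, candidates, fit, error):
    """Train once per candidate and keep the one with the lowest valid-set error."""
    fit_rows, valid_rows = split_off_valid_set(train, valid_days)
    best_params, best_error = None, float("inf")
    for params in candidates:
        model = fit(fit_rows, params)     # hypothetical training callable
        e = error(model, valid_rows)      # hypothetical error callable
        if e < best_error:
            best_params, best_error = params, e
    return best_params, best_error
```

Splitting by time rather than at random keeps the valid set strictly later than the rows used for fitting, which mirrors how the test set is separated from the training set.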

9. According to claim 8,
in the learning model generation step, an early stopping technique is applied to prevent overfitting, whereby for some nodes the creation of child nodes is stopped even before the specified depth is reached.
A learning model generating device for real estate transaction price prediction.