KR102566235B1

KR102566235B1 - Method for Used-Car Price Prediction using Machine Learning Ensemble and System Using the Same

Info

Publication number: KR102566235B1
Application number: KR1020220105218A
Authority: KR
Inventors: 이규창; 박영선; 김진웅; 전지원; 현원재; 안정헌; 송재욱; 송정윤; 이재준
Original assignee: 현대글로비스 주식회사
Priority date: 2022-08-23
Filing date: 2022-08-23
Publication date: 2023-08-11

Abstract

The present specification relates to a price prediction model that uses a machine learning ensemble to predict used-car prices efficiently even from a small amount of data, and to a price prediction method and system based on that model.
The used-car price prediction method comprises: receiving first-type price prediction request data for a target used car; mapping the first-type price prediction request data to a flag corresponding to a vehicle classification key of the target used car and to N best algorithms corresponding to the flag, wherein the N best algorithms are derived during prior training of a first-stage ensemble machine learning model using a training set and a validation set extracted from a plurality of databases through preprocessing; deriving a first prediction value corresponding to the first-type price prediction request data through the first-stage ensemble machine learning model; and deriving a second prediction value corresponding to the first prediction value through a second-stage ensemble machine learning model coupled to receive the output of the first-stage ensemble machine learning model as its input.

Description

Method for Used-Car Price Prediction using Machine Learning Ensemble and System Using the Same

This document relates to a used-car price prediction method and a system using the same, and more specifically to a price prediction model that uses a machine learning ensemble to predict used-car prices efficiently even from a small amount of data, and to a price prediction method and system based on that model.

With the development of communication and network technology, used cars are actively traded through electronic commerce. As market interest in AI-related companies has grown sharply in recent years, the used-car industry has also been actively adopting AI-based services. At the same time, as sharing a car rather than owning one has become common among consumers, used-car trading volume has surged and the used-car market is growing quickly. Services that let consumers look up the market price of their own vehicle using only license plate information have also been introduced.

Despite this rapid growth, users of used-car trading do not place high trust in the used-car market. By the nature of the market, vehicle information such as accident history and part replacements is not disclosed transparently. This opacity creates an information asymmetry between consumers who want to buy a used car and sellers who want to sell one, and fraudulent listings cause considerable harm.

In addition, the price of a used car can be determined by many factors, such as model year, mileage, accident history, vehicle condition, options, transmission type, color, purpose of use, trends, and region. It is difficult for a buyer to judge accurately whether the price offered by a seller is appropriate, or to predict changes in market price. Accordingly, there is a need for a method that predicts market price changes based on prices of used cars actually traded in the used-car market, so that consumers can judge whether a price is appropriate.

FIG. 1 is a diagram illustrating a conventional used-car market price prediction scheme that uses artificial intelligence.
Specifically, FIG. 1 shows the used-car market price prediction scheme introduced in Registered Patent Publication No. 10-2218287. The used-car market price prediction system 130 shown in FIG. 1 may be configured to communicate with external devices through a communication network 120. The used-car market price prediction system 130 may communicate with an external device through the communication network 120 to receive data on a plurality of vehicles whose transactions have been completed. Here, the data on the plurality of completed-transaction vehicles may include a data set for those vehicles, and the data set may include new-car price data and performance inspection data for the used cars.


For example, the used-car market price prediction system 130 receives, periodically or aperiodically, a data set for a plurality of completed-transaction vehicles from external devices such as a new-car price DB 110_1 and a performance inspection DB 110_2 through the communication network 120, and stores the data set in a storage unit.

The scheme shown in FIG. 1 uses the new-car price DB 110_1 and the performance inspection DB 110_2 to calculate used-car prices, but this DB information alone is often insufficient for an accurate determination. Moreover, even when a used-car transaction DB is used, it may be difficult to find, in a single DB, prior transaction information that maps exactly to the vehicle the consumer is asking about.
Therefore, there is a need for a technology that accurately predicts the price of the type of used car a consumer wants, even with a small amount of data.
In this regard, Japanese Unexamined Patent Publication No. 2021-144374 discloses a used-car trading system and method based on a machine learning ensemble, and specifically discloses a configuration that calculates a price using a machine learning algorithm capable of generating multiple decision trees from training data.
In addition, Patent Application Publication No. 10-2021-0112299 discloses a configuration that can implement different machine learning methods for determining a vehicle's condition, estimated reconditioning cost, and estimated market value by participating as an autonomous third-party verification source between the seller and the buyer of the vehicle, and in which the system can be integrated with legacy systems that retailers use in their appraisal processes or in online wholesale markets.


Registered Patent Publication No. 10-2218287 (2021.02.22)
Japanese Unexamined Patent Publication No. 2021-144374 (2021.09.24)
Patent Application Publication No. 10-2021-0112299 (2021.09.14)

To solve the problems described above, one aspect of the present invention proposes a price prediction model that uses a machine learning ensemble to predict used-car prices efficiently even from a small amount of data, and a price prediction method and system based on that model.

Another aspect of the present invention proposes applying a wholesale market price prediction algorithm, a retail market price prediction algorithm, and a future market price prediction algorithm separately, according to the type of consumer and the type of the consumer's request.

Specifically, for a market price prediction request of a particular type among the wholesale, retail, and future market price prediction types (for example, the wholesale market price prediction type), it is proposed to improve prediction performance by using, among the various kinds of machine learning ensembles, a model connected so that the output of a first-stage ensemble machine learning model is fed into a second-stage ensemble machine learning model, which then estimates the prediction value.

In addition, for such a stepwise-connected ensemble model structure, a training method is presented that improves price estimation performance even with a small amount of data.

The problems to be solved by the present invention are not limited to the technical problems mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

To solve the problems described above, one aspect of the present invention proposes a used-car price prediction method using a machine learning ensemble, the method comprising: receiving first-type price prediction request data for a target used car; mapping the first-type price prediction request data to a flag corresponding to a vehicle classification key of the target used car and to N best algorithms corresponding to the flag, wherein the N best algorithms are derived during prior training of a first-stage ensemble machine learning model using a training set and a validation set extracted from a plurality of databases through preprocessing; deriving a first prediction value corresponding to the first-type price prediction request data through the first-stage ensemble machine learning model; and deriving a second prediction value corresponding to the first prediction value through a second-stage ensemble machine learning model coupled to receive the output of the first-stage ensemble machine learning model as its input.

The N best algorithms may be derived during the prior training of the first-stage ensemble machine learning model and used to reconstruct the training set used for training the second-stage ensemble machine learning model.

The plurality of databases may include two or more of a first database of new-car factory prices, a second database of new-car grade prices, and a third database of newly generated new-car prices. Data selected from the plurality of databases in the priority order of the first, second, and third databases may be preprocessed to form one or more of the training set and the validation set.
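The priority-based selection described above can be pictured with a minimal sketch. It assumes each database lookup yields either a price or nothing for a given vehicle; the function name `select_new_car_price` and its parameters are hypothetical and are not taken from the specification.

```python
from typing import Optional


def select_new_car_price(
    factory_price: Optional[float],    # from the first database (new-car factory price)
    grade_price: Optional[float],      # from the second database (new-car grade price)
    generated_price: Optional[float],  # from the third database (newly generated price)
) -> Optional[float]:
    """Return the first available price in the priority order DB1, DB2, DB3.

    Hypothetical helper: the specification states only the priority order,
    not how a missing value is represented or handled.
    """
    for price in (factory_price, grade_price, generated_price):
        if price is not None:
            return price
    return None  # no database had a usable price for this vehicle
```

Under these assumptions, a record missing from the first database falls back to the second, and only then to the third.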

The training set and the validation set may be organized by a plurality of flags, including a first flag to which a detailed grade can be mapped, a second flag to which a representative grade can be mapped, a third flag to which a detailed model can be mapped, a fourth flag to which a representative model can be mapped, a fifth flag to which a manufacturer or vehicle type can be mapped, and a sixth flag covering the entire training data.

The number N may be set differently depending on the reliability of the flag.
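One way to read the six flags is as a fallback hierarchy, from the most specific key to the whole data set. The sketch below follows that reading, which is an assumption. The field names, the `known_keys` structure, and the values in `N_BY_FLAG` are all hypothetical; the specification says only that N may differ by flag reliability, not what the values are.

```python
# Flags 1-5 in order of decreasing specificity; flag 6 (all training data) is the fallback.
FLAG_FIELDS = [
    (1, "detailed_grade"),
    (2, "representative_grade"),
    (3, "detailed_model"),
    (4, "representative_model"),
    (5, "maker_or_type"),
]

# Hypothetical number of best algorithms kept per flag (illustrative values only).
N_BY_FLAG = {1: 3, 2: 3, 3: 2, 4: 2, 5: 1, 6: 1}


def resolve_flag(vehicle: dict, known_keys: dict) -> int:
    """Return the most specific flag whose key was seen in the training data."""
    for flag, field in FLAG_FIELDS:
        value = vehicle.get(field)
        if value is not None and value in known_keys.get(flag, set()):
            return flag
    return 6  # nothing more specific matched; use the whole training data


known = {1: {"G-1"}, 4: {"M"}}
flag = resolve_flag({"detailed_grade": "X", "representative_model": "M"}, known)
n_best = N_BY_FLAG[flag]  # flag 4 matched, so two algorithms under these assumptions
```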

The first-stage ensemble machine learning model and the second-stage ensemble machine learning model may each be an ensemble model that includes a plurality of machine learning models or deep learning models.

The first-stage ensemble machine learning model may include a GLM (General Linear Model), a first XGBoost model, and a first DNN (Deep Neural Network) model, and the second-stage ensemble machine learning model may include a second XGBoost model and a second DNN model.

Within the first-stage ensemble machine learning model, the deep learning model may be trained per flag of the training set and validation set, while the machine learning models may be trained per flag and per detailed item of the training set and validation set.

The training set used for training the second-stage ensemble machine learning model may be reconstructed by binding the average value given by the N best algorithms, together with additional information, to the training set of the first-stage ensemble machine learning model.
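The reconstruction of the second-stage training set can be illustrated with a small sketch. It assumes rows are plain dictionaries and that each first-stage algorithm has already produced one prediction per row. The names `build_stage2_rows` and `stage1_mean`, and the example columns, are hypothetical.

```python
from statistics import mean


def build_stage2_rows(stage1_rows, predictions_by_algo, best_algos, extra_info):
    """Bind the mean prediction of the N best algorithms, plus extra columns, to each row."""
    rows = []
    for i, row in enumerate(stage1_rows):
        new_row = dict(row)  # copy so the first-stage training set is left untouched
        new_row["stage1_mean"] = mean(predictions_by_algo[name][i] for name in best_algos)
        new_row.update(extra_info[i])  # "additional information"; its content is not specified here
        rows.append(new_row)
    return rows


stage2 = build_stage2_rows(
    stage1_rows=[{"mileage_km": 50000}],
    predictions_by_algo={"glm": [0.5], "xgboost": [0.75], "dnn": [0.2]},
    best_algos=["glm", "xgboost"],  # N = 2 best algorithms for this row's flag
    extra_info=[{"sale_month": 8}],
)
```

In this sketch only the algorithms listed in `best_algos` contribute to the bound average; the prediction of the remaining algorithm is ignored.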

The first-type price prediction request data may be wholesale price prediction request data.

Meanwhile, when second-type price prediction request data corresponding to retail price prediction request data is received, a prediction value may be derived through a machine learning model trained in advance using reference data for similar vehicle models.

In addition, when third-type price prediction request data corresponding to future market price prediction request data is received, a residual value rate according to mileage or elapsed days may be derived and provided in response to the third-type price prediction request data.

To solve the problems described above, another aspect of the present invention proposes a used-car price prediction system using a machine learning ensemble, the system comprising: a storage medium that stores instruction information; and a processor communicatively connected to the storage medium and configured to execute the instruction information stored in the storage medium, wherein the instruction information, when executed by the processor: receives first-type price prediction request data for a target used car; maps the first-type price prediction request data to a flag corresponding to a vehicle classification key of the target used car and to N best algorithms corresponding to the flag, wherein the N best algorithms are derived during prior training of a first-stage ensemble machine learning model using a training set and a validation set extracted from a plurality of databases through preprocessing; derives a first prediction value corresponding to the first-type price prediction request data through the first-stage ensemble machine learning model; and derives a second prediction value corresponding to the first prediction value through a second-stage ensemble machine learning model coupled to receive the output of the first-stage ensemble machine learning model as its input.

According to the embodiments of the present invention described above, used-car price prediction can be performed efficiently even with a small amount of data by using a machine learning ensemble.

In addition, by distinguishing wholesale market price prediction, retail market price prediction, and future market price prediction according to the type of consumer and the type of the consumer's request, used-car price prediction information suited to the consumer's situation can be provided.

Furthermore, by using ensemble models connected in a stepwise manner, used-car price prediction performance can be improved efficiently even with a small amount of data.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

FIG. 1 is a diagram illustrating a conventional used-car market price prediction scheme that uses artificial intelligence.
FIG. 2 is a diagram illustrating the concept of the machine learning ensemble used in an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of predicting the residual value rate of a used car using the machine learning ensemble of FIG. 2.
FIGS. 4 and 5 are diagrams illustrating, among several types of machine learning ensemble, the stepwise connection model structure according to a preferred embodiment of the present invention.
FIG. 6 is a diagram illustrating the concept of classifying the type of a user's price prediction request data according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a method of performing used-car price prediction according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the scheme of FIG. 7 with a specific example.
FIGS. 9 to 11 are diagrams illustrating in detail the training process of the wholesale prediction algorithm.
FIG. 12 is a diagram illustrating a scheme for preprocessing data from a plurality of DBs based on priority according to an embodiment of the present invention.
FIG. 13 is a diagram illustrating a method of generating a preprocessing reference code according to an embodiment of the present invention.
FIG. 14 is a diagram illustrating in detail a retail price prediction scheme according to an embodiment of the present invention.
FIG. 15 is a diagram illustrating the operation of a future market price prediction algorithm according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention belongs can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 2 is a diagram illustrating the concept of the machine learning ensemble used in an embodiment of the present invention.

The left side of FIG. 2 shows the structure of a typical machine learning model 200. The machine learning model 200 may include various hidden layers between an input layer, which receives input data, and an output layer, which produces output data.

As shown on the right side of FIG. 2, the machine learning ensemble used in the embodiments of the present invention has a structure in which a plurality of machine learning models 200a, 200b, 200n each output a prediction, and a combiner 210 combines these predictions to provide a more accurate prediction.

In general, it is difficult to make accurate predictions from a small amount of data with a single linear model or a single machine learning model, so the use of a machine learning ensemble model as described above is proposed.

FIG. 3 is a diagram illustrating an example of predicting the residual value rate of a used car using the machine learning ensemble of FIG. 2.

FIG. 3 shows an example in which, for instance, ten machine learning models 200 are used to produce residual value rate predictions 1 to 10 (310-1 to 310-10). These predictions 310-1 to 310-10 may be combined by the combiner 210 of FIG. 2 to provide a final residual value rate prediction 320. As shown in FIG. 3, a residual value rate prediction range 330 may additionally be provided and used. Based on the information combined in this way, a vehicle price 340 may be calculated and provided.
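The flow of FIG. 3 can be sketched in a few lines. The specification does not state how the combiner 210 merges the predictions, how the range 330 is computed, or how the vehicle price 340 is derived. The sketch therefore assumes a simple mean for the final rate, a min-max interval for the range, and a price equal to the new-car price multiplied by the residual value rate. All function names are hypothetical.

```python
from statistics import mean


def combine_residual_rates(rates):
    """Combine per-model residual value rates into a final rate and a (low, high) range."""
    final_rate = mean(rates)               # assumed combiner: simple average
    rate_range = (min(rates), max(rates))  # assumed range: spread of the model outputs
    return final_rate, rate_range


def to_vehicle_price(new_car_price, residual_rate):
    """Convert a residual value rate to a price (assumed: new-car price times rate)."""
    return new_car_price * residual_rate


rate, rate_range = combine_residual_rates([0.5, 0.75, 0.625])
price = to_vehicle_price(40_000_000, rate)
```

A weighted average or a trained meta-model could take the place of the mean without changing the overall flow.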

FIGS. 4 and 5 are diagrams illustrating, among several types of machine learning ensemble, the stepwise connection model structure according to a preferred embodiment of the present invention.

Ensemble learning refers to a technique that builds several classifiers and combines their predictions to obtain a more accurate prediction. Instead of relying on one strong model, it combines several weaker models to arrive at a more accurate prediction.

FIG. 4(A) shows the structure of the voting method among the types of ensemble learning, and FIG. 4(B) shows the structure of the bagging method.

In the voting method shown in FIG. 4(A), several classifiers determine the final prediction result by voting. Several different algorithms may be combined in this case; FIG. 4(A) shows an example that combines a linear regression algorithm with an SVM (Support Vector Machine).

Two voting methods are available: hard voting, in which the result predicted by the majority of the classifiers is selected as the final result, and soft voting, in which the class probabilities predicted by all classifiers are averaged for each label and the label with the highest average probability is selected as the final result.
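The difference between the two voting methods can be shown with a minimal sketch, assuming each classifier reports either a single label (hard voting) or a label-to-probability mapping (soft voting). The function names and the example labels are illustrative only.

```python
from collections import Counter


def hard_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each label's probability across classifiers and return the top label."""
    labels = probabilities[0].keys()
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(averaged, key=averaged.get)


# Three classifiers: one is very sure of "high", two lean slightly toward "low".
probs = [
    {"high": 0.9, "low": 0.1},
    {"high": 0.4, "low": 0.6},
    {"high": 0.4, "low": 0.6},
]
hard_result = hard_vote([max(p, key=p.get) for p in probs])  # majority of top labels
soft_result = soft_vote(probs)                               # highest averaged probability
```

In this example the two methods disagree: hard voting follows the two classifiers that lean toward "low", while soft voting is swayed by the one confident classifier and selects "high".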

The bagging method shown in FIG. 4(B) trains models on samples drawn from the data (bootstrap) and aggregates their results. In this case, as shown in FIG. 4(B), all classifiers are based on the same type of algorithm, and overlap may be allowed when the data is split.

Different aggregation methods may be used depending on the type of data. For example, results may be aggregated by majority vote for categorical data and by averaging for continuous data.
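A toy sketch of bootstrap sampling and aggregation follows. The base learner is deliberately trivial (it predicts the mean of its bootstrap sample) so that the example stays self-contained; a real system would train a proper model on each sample. All names are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, so duplicates are allowed."""
    return [rng.choice(data) for _ in data]


def aggregate(outputs):
    """Average continuous outputs; take a majority vote for categorical outputs."""
    if all(isinstance(o, (int, float)) for o in outputs):
        return mean(outputs)
    return Counter(outputs).most_common(1)[0][0]


def bagged_mean_prediction(targets, n_models=10, seed=0):
    """Fit n_models trivial learners (sample mean) on bootstrap samples and aggregate."""
    rng = random.Random(seed)
    outputs = [mean(bootstrap_sample(targets, rng)) for _ in range(n_models)]
    return aggregate(outputs)


prediction = bagged_mean_prediction([0.5, 0.6, 0.7])
```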

Such a bagging method can be effective in preventing overfitting.

Meanwhile, FIG. 5 shows a stepwise connection structure according to a preferred embodiment of the present invention. The stepwise connection structure is characterized in that several classifiers are trained sequentially. The example of FIG. 5 uses, as primary modeling 410, a combination of an ElasticNet-based GLM (General Linear Model) 410a, an XGBoost model 410b, and a DNN (Deep Neural Network) 410c, and, as secondary modeling 420, a combination of an XGBoost model 420a and a DNN 420b. The specific configuration of the primary model 410 and the secondary model 420 need not be limited to this example.
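At inference time, the stepwise connection can be sketched as follows. The sketch assumes each model is a callable that maps a feature dictionary to a number, that outputs within a stage are combined by a simple mean, and that the first-stage result is passed on as an extra feature named `stage1_mean`. These are illustrative assumptions, and the lambdas merely stand in for trained GLM, XGBoost, and DNN models.

```python
from statistics import mean


def predict_two_stage(features, stage1_models, stage2_models):
    """Run the first stage, feed its output into the second stage, return both predictions."""
    first = mean(model(features) for model in stage1_models)
    stage2_input = dict(features, stage1_mean=first)  # first-stage output becomes an input
    second = mean(model(stage2_input) for model in stage2_models)
    return first, second


# Stand-ins for trained models (hypothetical; real models would be fitted beforehand).
stage1 = [lambda f: 0.5, lambda f: 0.75]
stage2 = [
    lambda f: f["stage1_mean"] + 0.125,
    lambda f: f["stage1_mean"] - 0.125,
]
first_pred, second_pred = predict_two_stage({"mileage_km": 50000}, stage1, stage2)
```

The caller's feature dictionary is copied rather than modified, so the same request can be reused for other request types.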

Such a stepwise connection structure is characterized in that training and prediction proceed while weights are passed to the next classifier 420, so that data the previous classifier 410 predicted incorrectly can be predicted correctly.

As one example, the boosting method can be understood as continuing training while repeatedly boosting the weights given to the classifiers. Boosting generally performs better than bagging, but it is slower and may overfit, so it is preferable to use it as the situation warrants.

As described in more detail below, one aspect of the present invention is characterized not by simply calculating a single uniform price for a used car, but by providing wholesale market price, retail market price, and future market price prediction information according to, among other things, the type of user. On this premise, one embodiment of the present invention proposes using the stepwise connection model described above to provide a more accurate prediction when the used-car market price prediction request is of a particular type (for example, the wholesale market price prediction type).

FIG. 6 is a diagram for explaining the concept of classifying the type of a user's price prediction request data according to an embodiment of the present invention.

As shown in FIG. 6, a user or customer may input price prediction request data (510) into the used car market price prediction service providing system according to the present embodiment. The system may be operated so that the information entered by the user/customer is limited to simple information such as the vehicle number (that is, the license plate information) of a specific vehicle. When the user/customer enters only this simple information, the system automatically constructs the price prediction request data (510) from the corresponding vehicle information and the customer type, determines the request type (520), and classifies the request as a wholesale price prediction type (530-1; hereinafter the 'first type'), a retail price prediction type (530-2; hereinafter the 'second type'), or a future market price prediction type (530-3; hereinafter the 'third type'). In the following description, the first, second, and third types are referred to on this basis for convenience, but depending on the context, 'first type' may also be used to refer to any one of these price request types.

Regarding the type determination (520), in one embodiment of the present invention, the request may be classified into one of the first to third types based on the user/customer type identified in advance, for example at the membership registration stage. For example, users/customers may be broadly divided into corporate customers and individual customers, and corporate customers may in turn be divided into a wholesale price prediction type and a future market price prediction type. A corporation that professionally conducts used car transactions may be classified as the wholesale price prediction type. A corporate customer that provides a car-sharing service, on the other hand, may need to predict the future market price of the vehicles it owns and perform accounting accordingly.

In another embodiment of the present invention relating to the type determination (520), in addition to the user/customer type identified in advance at the membership registration stage or the like, a UI may be provided at the stage where the user/customer enters the price request information so that any one of the first to third types can be selected. For example, even a corporate customer providing a car-sharing service may need not only the future market price prediction described above but also a wholesale price prediction, for purposes such as buying or selling its own vehicles; in this case, a UI may be provided so that the required type of price prediction is additionally entered as part of the price prediction request data (510).
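By way of non-limiting illustration only, the type determination (520) of the two embodiments above may be sketched as follows. The category names, the `DEFAULT_TYPE` table, and the mapping of individual customers to the retail type are assumptions made for this sketch; the description itself only states how corporate customers are divided.

```python
from enum import Enum
from typing import Optional


class RequestType(Enum):
    WHOLESALE = 1  # first type (530-1)
    RETAIL = 2     # second type (530-2)
    FUTURE = 3     # third type (530-3)


# Hypothetical customer categories recorded at sign-up (illustrative names).
# Mapping individuals to RETAIL is an assumption of this sketch.
DEFAULT_TYPE = {
    "dealer_corporation": RequestType.WHOLESALE,
    "car_sharing_corporation": RequestType.FUTURE,
    "individual": RequestType.RETAIL,
}


def determine_request_type(customer_category: str,
                           ui_choice: Optional[RequestType] = None) -> RequestType:
    """Use the type chosen in the UI if present, else the sign-up default."""
    if ui_choice is not None:
        return ui_choice
    return DEFAULT_TYPE[customer_category]
```

For instance, a car-sharing corporation defaults to the third type but may select the first type in the UI when it wants a wholesale price for vehicles it intends to sell.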

FIG. 7 is a diagram for explaining a method of performing used car price prediction according to an embodiment of the present invention.

As described above with reference to FIG. 6, the price prediction type may be determined based on the information entered by the user/customer, and FIG. 7 shows the case where first type price prediction request data (530-1) is received. Here, the first type price prediction request data (530-1) is assumed to be wholesale market price prediction request data, but it need not be limited thereto and may be regarded as price prediction request data of any particular type among the several types.

It is proposed that a preprocessing step (S610) be performed on the received first type price prediction request data (530-1) before it is input to the machine learning ensemble model. The preprocessing, described in more detail below, includes processing such as removing abnormal data and supplementing the input data.

The embodiment of FIG. 7 proposes using a machine learning ensemble model as described above with reference to FIG. 5, and in particular a model in which a first machine learning ensemble model (410) and a second machine learning ensemble model (420) are connected in a stepwise manner.

On this assumption, the embodiment of FIG. 7 proposes performing the estimation by mapping the first type price prediction request data (530-1) to a flag corresponding to the vehicle classification key of the target used car and to the N best algorithms corresponding to that flag (S620). Here, the N best algorithms are derived in the pre-training process of the first-stage ensemble machine learning model (410) using a training set and a validation set extracted from a plurality of databases through preprocessing. The training process is described in more detail below with reference to FIGS. 9 to 11; in that training process, the N best algorithms may be used to reconstruct the training set used for training the second-stage ensemble machine learning model (420).

When the first-stage ensemble machine learning model (410) derives a first prediction value corresponding to the first type price prediction request data (530-1), the second-stage ensemble machine learning model (420) derives a second prediction value corresponding to the first prediction value, and this second prediction value may be provided as the final used car price prediction information.
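The flow from the first prediction value to the second prediction value may be sketched, purely as an illustration, as follows. The models are toy callables standing in for trained GLM/XGBoost/DNN models, and two points are assumptions of this sketch rather than statements of the embodiment: that the second stage receives the original features together with the first-stage outputs and their mean, and that the second-stage outputs are averaged into one price.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]


def two_stage_predict(features: Sequence[float],
                      first_stage: Dict[str, Model],
                      second_stage: List[Model]) -> float:
    # Stage 1: every selected algorithm produces a first prediction value.
    first_preds = [model(features) for model in first_stage.values()]
    # Stage 2 input: original features, stage-1 outputs, and their mean.
    stacked = list(features) + first_preds + [mean(first_preds)]
    # Assumption: stage-2 outputs are averaged into the final price.
    return mean(model(stacked) for model in second_stage)


# Toy stand-ins for trained models (not real GLM/XGBoost/DNN fits).
first = {
    "glm": lambda x: 1000.0 - 2.0 * x[0],
    "xgboost": lambda x: 980.0 - 1.5 * x[0],
}
second = [
    lambda s: s[-1],                  # uses the mean of the stage-1 outputs
    lambda s: 0.5 * (s[-2] + s[-3]),  # re-averages the two stage-1 outputs
]

price = two_stage_predict([100.0], first, second)
```

In a real system each callable would wrap a trained model; the structure of passing stage-1 outputs forward is the only point the sketch is meant to show.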

FIG. 8 is a diagram for explaining the method of FIG. 7 by way of a specific example.

FIG. 8 shows, as an example, a case where the first type price prediction request data (530-1), obtained from used car price request information entered by the user/customer, relates to a specific car model. In this example, the vehicle classification keys of the target used car are: manufacturer key 5, vehicle classification key 3, representative model key 112, detailed model key 1571, representative grade key 17089, and detailed grade key 33129. Such vehicle classification keys may be selected in the process of aggregating data from the DBs that hold various used car price information, as described in more detail below with reference to FIG. 12.

In the example shown in FIG. 8, based on the key values of the target vehicle described above, the vehicle model may be flagged as one for which a model can be built starting from the detailed grade level. Accordingly, the estimation may be performed by mapping to the N best algorithms corresponding to that flag (for example, GLM, Random Forest, XGBoost, and DNN) (S620).
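As a non-limiting sketch of the mapping step (S620), the flag for a request may be resolved by checking the key values from the finest level to the coarsest. The field names, the `modelable` registry, and the rule that the finest available flag wins are assumptions introduced for this illustration; the key values are those of the FIG. 8 example.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical field names for the key(s) checked at each flag level,
# finest first. Flag 6 (entire data set) is the fallback.
FLAG_KEYS: Dict[int, Tuple[str, ...]] = {
    1: ("detailed_grade_key",),
    2: ("representative_grade_key",),
    3: ("detailed_model_key",),
    4: ("representative_model_key",),
    5: ("manufacturer_key", "vehicle_class_key"),
}


def resolve_flag(vehicle: Dict[str, int],
                 modelable: Dict[int, Set[Hashable]]) -> int:
    """Return the finest flag for which this vehicle's key was modelable."""
    for flag in sorted(FLAG_KEYS):
        key = tuple(vehicle[name] for name in FLAG_KEYS[flag])
        if key in modelable.get(flag, set()):
            return flag
    return 6  # fall back to the model trained on the entire data set


# Key values from the FIG. 8 example.
vehicle = {
    "manufacturer_key": 5, "vehicle_class_key": 3,
    "representative_model_key": 112, "detailed_model_key": 1571,
    "representative_grade_key": 17089, "detailed_grade_key": 33129,
}
# Hypothetical registry of keys that had enough training data.
modelable = {1: {(33129,)}, 2: {(17089,)}, 5: {(5, 3)}}

flag = resolve_flag(vehicle, modelable)
```

The resolved flag would then index a table of the N best algorithms recorded for that flag during training.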

Once the first prediction value is derived through this process, it may be input to a combination of an ensemble XGBoost and an ensemble DNN serving as the second-stage ensemble machine learning model (420), and the second prediction value output from this combination may be presented as the final estimated price.

FIGS. 9 to 11 are diagrams for explaining in detail the training process of the wholesale prediction algorithm.

First, referring to FIG. 9, data may be loaded in order to train the wholesale prediction algorithm (S911). In the data load (S911), new auction data may be loaded in addition to the existing auction training data, the Karmart master key, and the preprocessing reference code, but the data load need not be limited thereto.

Thereafter, preprocessing may be performed on the new data (S912). Preprocessing of the new data may include outlier handling, missing-value correction, and filtering of the variables actually used for training. After this, in step S913, the existing training data and the new training data may be bound together.

Meanwhile, flag mapping may be performed on the data bound in this way (S920). For example, as the case where detailed grade mapping is possible, flag 1 may be set to 1 for data in which the same detailed grade key value appears 200 or more times and to 0 for the rest; as the case where representative grade mapping is possible, flag 2 may be set to 1 for data in which the same representative grade key value appears 200 or more times and to 0 for the rest. Similarly, as the case where detailed model mapping is possible, flag 3 may be set to 1 for data in which the same detailed model key value appears 200 or more times and to 0 for the rest; as the case where representative model mapping is possible, flag 4 may be set to 1 for data in which the same representative model key value appears 200 or more times and to 0 for the rest. In addition, as the case where manufacturer/vehicle type mapping is possible, flag 5 may be set to 1 for data in which the same manufacturer/vehicle type key value appears 200 or more times and to 0 for the rest. Finally, flag 6 may be set to 1 for the entire training data.
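The flag mapping (S920) may be illustrated, without limitation, by the following sketch. The 200-record threshold is taken from the description; the field names and the toy rows are hypothetical, and the demonstration uses a threshold of 3 only so that a four-row example is meaningful.

```python
from collections import Counter
from typing import Dict, List, Tuple

MIN_COUNT = 200  # threshold stated in the description

# Hypothetical field names for the key(s) counted at each flag level.
FLAG_KEYS: Dict[int, Tuple[str, ...]] = {
    1: ("detailed_grade_key",),
    2: ("representative_grade_key",),
    3: ("detailed_model_key",),
    4: ("representative_model_key",),
    5: ("manufacturer_key", "vehicle_class_key"),
}


def map_flags(rows: List[Dict[str, int]],
              min_count: int = MIN_COUNT) -> List[Dict[int, int]]:
    """Return, for every row, a dict {flag number: 0 or 1}."""
    counts = {
        flag: Counter(tuple(row[name] for name in names) for row in rows)
        for flag, names in FLAG_KEYS.items()
    }
    result = []
    for row in rows:
        flags = {}
        for flag, names in FLAG_KEYS.items():
            key = tuple(row[name] for name in names)
            flags[flag] = int(counts[flag][key] >= min_count)
        flags[6] = 1  # flag 6 is set for the entire training data
        result.append(flags)
    return result


base = {"manufacturer_key": 5, "vehicle_class_key": 3,
        "representative_model_key": 112, "detailed_model_key": 1571,
        "representative_grade_key": 17089}
rows = [dict(base, detailed_grade_key=33129) for _ in range(3)]
rows.append(dict(base, detailed_grade_key=33130))
flags = map_flags(rows, min_count=3)  # small threshold only for this toy data
```

In the toy data, the lone row with detailed grade key 33130 fails flag 1 but still qualifies for flag 2 onward, because its coarser keys are shared with the other rows.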

FIG. 9 illustrates the concept of constructing data sets based on flags 1 to 6 as described above. For the data set corresponding to each flag, training may be performed by, for example, using the new data as the training set and the existing data as the validation set, but the division into training and validation sets need not be limited thereto.

FIG. 10 shows the process starting from the step in which the training set/validation set constructed for each flag as described above is input to the first-stage ensemble machine learning model. As in FIG. 5, the example of FIG. 10 uses a GLM algorithm, an XGBoost algorithm, and a DNN algorithm as the first-stage ensemble machine learning model. In FIG. 10, the GLM algorithm and the XGBoost algorithm take as their target the same grade/model/manufacturer/vehicle type information within the data set for each flag (S931-1, S931-2) and estimate parameters through cross-validation inside the training set (S932-1, S932-2), whereas the DNN algorithm takes the entire data set for each flag as its target (S931-3) and performs parameter estimation through cross-validation inside the training set (S933-3). This difference can be attributed to the fact that, compared with general machine learning models, a deep learning model makes its judgment by extracting features from a large amount of data according to its own criteria.
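Parameter estimation through cross-validation inside the training set may be sketched as below. This is only an illustration of the k-fold idea: a one-dimensional ridge regression is used as a simple stand-in for the GLM, XGBoost, and DNN algorithms, and the function names and the candidate penalties are assumptions of the sketch.

```python
from typing import List, Sequence


def fit_ridge(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form 1-D ridge regression without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs: Sequence[float], ys: Sequence[float],
             lam: float, k: int = 3) -> float:
    """Mean squared error over k held-out folds of the training set."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th sample forms a fold
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge(train_x, train_y, lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in held_out)
    return total / n


def select_penalty(xs: Sequence[float], ys: Sequence[float],
                   candidates: List[float], k: int = 3) -> float:
    """Pick the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, k))


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]  # noise-free toy data
best = select_penalty(xs, ys, [0.0, 1.0, 10.0])
```

On noise-free data the unpenalized fit is exact, so the smallest penalty is selected; with real auction data the preferred setting would depend on the noise level.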

After parameter estimation (S932) has been performed on the data set corresponding to each flag in this way, the training and validation estimates are mapped (S933), and based on this information a higher-level vehicle model binding set of the first predictions may be constructed (S940). The present embodiment proposes reconstructing the training set through this process (S934).

A specific example of reconstructing the training set (S934) is as follows. For flag 1, the prediction data of the five algorithms selected based on the training set and the validation set may be bound, and the average of the predictions of the best five algorithms may be bound. In addition, using an XGBoost based on the representative grade and an XGBoost based on the detailed model of each vehicle model, the final set to be used may be reconstructed as (1) the existing first-stage training set, (2) the average of the top-5 algorithm predictions, (3) the representative-grade-based XGBoost prediction, and (4) the detailed-model-based XGBoost prediction.

Here, the number N of best algorithms selected may be set differently for each flag. For example, N may be set to 5 for flags 1 to 3 and to 3 for flags 4 and 5. This is because flags 1 to 3 yield higher price estimation accuracy than flags 4 and 5.
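The per-flag selection of the N best algorithms and the averaging of their predictions may be sketched as follows. The values of N follow the description; ranking by lowest validation error, the candidate names, and the numbers are assumptions of this illustration, since the description only states that selection is based on the training and validation sets.

```python
from statistics import mean
from typing import Dict, List

# N per flag as given in the description (flags 1-3: 5, flags 4-5: 3).
TOP_N = {1: 5, 2: 5, 3: 5, 4: 3, 5: 3}


def select_best(validation_error: Dict[str, float], flag: int) -> List[str]:
    """Pick the N candidates with the lowest validation error for this flag."""
    ranked = sorted(validation_error, key=validation_error.get)
    return ranked[:TOP_N[flag]]


def mean_of_best(predictions: Dict[str, float], best: List[str]) -> float:
    """Average the predictions of the selected algorithms."""
    return mean(predictions[name] for name in best)


# Hypothetical candidates and validation errors.
errors = {"glm_a": 0.12, "glm_b": 0.10, "xgb_a": 0.07, "xgb_b": 0.08, "dnn": 0.09}
preds = {"glm_a": 1400.0, "glm_b": 1600.0, "xgb_a": 1500.0,
         "xgb_b": 1530.0, "dnn": 1470.0}

best_for_flag4 = select_best(errors, flag=4)
stage2_feature = mean_of_best(preds, best_for_flag4)
```

The resulting average corresponds to item (2) of the reconstructed set described above and would be bound to the original training features.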

Finally, for flag 6, the XGBoost estimated from the entire data of each vehicle and the DNN values of flags 5/6 may be bound, so that the final set to be used is reconstructed as (1) the existing first-stage training set, (2) the XGBoost prediction based on the entire data, and (3) the DNN prediction of flags 5/6.

FIG. 11 shows the process in which the second-stage ensemble machine learning model is trained on the training set reconstructed as described above with reference to FIG. 10. In the example of FIG. 11, an XGBoost algorithm and a DNN algorithm are shown as the second-stage ensemble machine learning model, but the model need not be limited thereto.

In FIG. 11, after the training set has been reconstructed as described above (S950), the reconstructed data set may be represented for each flag. It should be noted that, unlike in FIG. 9, the validation set of the data set for each flag in FIG. 11 is in a reconstructed state.

On the data sets reconstructed in this way, the second-stage ensemble machine learning models may each be trained on a per-flag data set basis (S960, S961), with parameters estimated through cross-validation.

The ensemble machine learning model according to the present embodiment, trained through this process, may be used for price estimation as described above with reference to FIG. 7.

Meanwhile, the stepwise ensemble model described above is preferably used for wholesale market price prediction among the various price request types. As described above with reference to FIGS. 4 and 5, the stepwise scheme offers high prediction accuracy but may be slower than the bagging scheme, and may therefore be unsuitable where an immediate result must be provided to the user/customer, as in retail price prediction. However, this limitation may vary with the circumstances of the user/customer, and the use of the stepwise ensemble model need not be limited to wholesale market price prediction.

FIG. 12 is a diagram for explaining a method of preprocessing data from a plurality of DBs based on priority according to an embodiment of the present invention.

As described above, one of the major problems in used car price prediction is that there is often little existing data that maps exactly to the specific used car whose price is to be predicted. Therefore, one embodiment of the present invention proposes extracting data from a plurality of DBs for training and price estimation, and constructing the data set by assigning a priority to each of these DBs in the preprocessing step (S1210).

In the example of FIG. 12, the first DB is assumed to be a database of new car factory prices, the second DB a database of new car grade prices, and the N-th DB a database of newly generated new car prices; various other DBs may also be used in combination.

Here, the first DB of new car factory prices holds detailed information for each detailed vehicle model, and may therefore take priority over the other DBs in terms of accuracy and reliability. The second DB of new car grade prices may be less accurate and reliable than the first DB, but may take priority over the third DB.

Accordingly, the present embodiment proposes constructing the data set based on these priorities. Here, 'based on priority' means that, when the same data is duplicated across DBs, the information in the first DB is used in preference to the information in the other DBs for the data set used in training and inference.
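The priority rule may be illustrated by the following minimal sketch, in which each DB is represented as a dictionary keyed by a vehicle key and the DBs are passed in order of priority. The record layout, the `source` field, and the price values are hypothetical and serve only to make the effect of the rule visible.

```python
from typing import Dict, Hashable, List


def merge_by_priority(databases: List[Dict[Hashable, dict]]) -> Dict[Hashable, dict]:
    """Merge records; `databases` is ordered from highest to lowest priority."""
    merged: Dict[Hashable, dict] = {}
    for db in databases:
        for key, record in db.items():
            # Keep the record already taken from a higher-priority DB.
            merged.setdefault(key, record)
    return merged


# Hypothetical records keyed by a grade key.
first_db = {33129: {"new_price": 3200, "source": "first_db"}}
second_db = {
    33129: {"new_price": 3150, "source": "second_db"},  # duplicate key
    17089: {"new_price": 2900, "source": "second_db"},
}

data_set = merge_by_priority([first_db, second_db])
```

The duplicate key is resolved in favor of the first DB, while keys found only in a lower-priority DB are still retained.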

FIG. 13 is a diagram for explaining a method of generating a preprocessing reference code according to an embodiment of the present invention.

In the example of FIG. 13, a data load step may first be performed in order to generate the preprocessing reference code (S1210). The data load step may include loading information such as wholesale data, retail data, and the Karmart master key.

Fuel code mapping (S1221), vehicle displacement mapping (S1222), and vehicle type and vehicle size mapping (S1223) may be performed on the data loaded in this way. Fuel code mapping (S1221) may be performed based on the most frequent fuel code for each grade key and may include unifying the fuel codes. Vehicle displacement mapping (S1222) may use the median displacement for each grade key and may remove the displacement of electric vehicles. Vehicle type and vehicle size mapping (S1223) may include unifying the type for each model key and deriving the vehicle size for passenger car and SUV/RV types.
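The fuel code mapping (S1221) and displacement mapping (S1222) may be sketched as follows, using the most frequent fuel code and the median displacement per grade key. The field names, the fuel code strings, and the grade key 20001 are hypothetical; representing a removed electric vehicle displacement as `None` is likewise an assumption of this sketch.

```python
from collections import defaultdict
from statistics import median, mode
from typing import Dict, List


def build_reference(rows: List[dict]) -> Dict[int, dict]:
    """Derive one unified fuel code and displacement per grade key."""
    by_grade = defaultdict(list)
    for row in rows:
        by_grade[row["grade_key"]].append(row)

    reference = {}
    for grade_key, group in by_grade.items():
        fuel = mode([r["fuel"] for r in group])  # most frequent fuel code
        displacements = [r["displacement_cc"] for r in group
                         if r["displacement_cc"] is not None]
        if fuel == "electric" or not displacements:
            displacement = None  # displacement is removed for electric vehicles
        else:
            displacement = median(displacements)
        reference[grade_key] = {"fuel": fuel, "displacement_cc": displacement}
    return reference


rows = [
    {"grade_key": 17089, "fuel": "gasoline", "displacement_cc": 1998},
    {"grade_key": 17089, "fuel": "gasoline", "displacement_cc": 1999},
    {"grade_key": 17089, "fuel": "lpg", "displacement_cc": 1998},
    {"grade_key": 20001, "fuel": "electric", "displacement_cc": 0},
    {"grade_key": 20001, "fuel": "electric", "displacement_cc": None},
]
reference = build_reference(rows)
```

The resulting table could then be joined back onto individual records so that inconsistent fuel codes and displacements within one grade are unified.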

Meanwhile, the process of generating the preprocessing reference code may additionally include residual value rate outlier checking (S1224), new car price outlier checking (S1225), and first production year/month checking (S1226). Residual value rate outlier checking (S1224) may include setting outlier criteria for the residual value rate by mileage and by number of elapsed days. New car price outlier checking (S1225) may include setting outlier criteria for each detailed grade. First production year/month checking (S1226) may mean setting the first production year and month for each model key.
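The description does not state how the outlier criteria are computed. Purely as one possible illustration, the sketch below sets residual value rate bounds per mileage band with an interquartile-range rule. The IQR rule, the band width of 20,000 km, the factor 1.5, the field names, and the definition of the residual value rate as price divided by new car price are all assumptions of this sketch.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List, Tuple


def mileage_band(mileage_km: int, width: int = 20000) -> int:
    """Index of the mileage band a record falls into (assumed band width)."""
    return mileage_km // width


def residual_bounds(rows: List[dict], k: float = 1.5) -> Dict[int, Tuple[float, float]]:
    """Per-band (low, high) bounds; each band needs at least two records."""
    rates = defaultdict(list)
    for r in rows:
        rates[mileage_band(r["mileage_km"])].append(r["price"] / r["new_price"])
    bounds = {}
    for band, values in rates.items():
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        bounds[band] = (q1 - k * iqr, q3 + k * iqr)
    return bounds


def is_outlier(row: dict, bounds: Dict[int, Tuple[float, float]]) -> bool:
    low, high = bounds[mileage_band(row["mileage_km"])]
    rate = row["price"] / row["new_price"]
    return not (low <= rate <= high)


rows = [
    {"mileage_km": 5000, "price": 800, "new_price": 1000},
    {"mileage_km": 7000, "price": 820, "new_price": 1000},
    {"mileage_km": 9000, "price": 780, "new_price": 1000},
    {"mileage_km": 12000, "price": 810, "new_price": 1000},
    {"mileage_km": 15000, "price": 790, "new_price": 1000},
]
bounds = residual_bounds(rows)
```

The same pattern could be repeated with bands of elapsed days, and the resulting bounds stored as part of the preprocessing reference code.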

If data is missing in these checking steps (S1225, S1226), the data may of course be supplemented through a manual update (S1227). The manual update (S1227) may also be performed, following a step of checking whether all values are present (S1229), to fill in any specific information found to be missing. After this, the vehicle type, vehicle size, vehicle production year/month, and outlier criteria are mapped (S1228), and the preprocessing reference code may then be generated. The preprocessing reference code generated in this way may be used for preprocessing data in the training and inference stages.

FIG. 14 is a diagram for explaining in detail a retail price prediction method according to an embodiment of the present invention.

Compared with the wholesale price prediction model, the retail price prediction model may more frequently lack vehicle model data that corresponds exactly to the specific vehicle of the price request. Accordingly, the present embodiment proposes that, when second type price prediction request data corresponding to a retail price prediction request is received, the predicted value be derived through a pre-trained machine learning model while additionally using reference data for similar vehicle models.

Specifically, the upper part of FIG. 14 shows the training process and the lower part shows the price estimation process. Data may be loaded for training the retail prediction algorithm (S1310). At this time, the retail data, the Karmart master key, and the preprocessing reference code may be loaded, and records not found in the master key may be filtered out.

Data preprocessing (S1311) may be performed on the data loaded in this way (S1310). The data preprocessing may include removing outlier data, filtering by the criteria used in the wholesale model, and extracting significant variables. As described above, the present embodiment proposes additionally using similar vehicle model information; to this end, a similar vehicle model reference data set is generated (S1314), the variable importance for similar vehicle models is derived (S1315), and a similar vehicle model search algorithm is built (S1317), which may then be used in the retail price prediction process described below.

Meanwhile, after the data preprocessing (S1311) described above, a neural-network-based model may be generated (S1313) following data normalization and dummy encoding (S1312). An ensemble model may of course also be generated for the retail price prediction algorithm. In addition, an error correction model may be generated (S1316); for example, an XGBoost-based correction model may be generated for vehicle models with 500 or more records within one year. The models generated in this way may be used in the subsequent price estimation process (S1323).
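Data normalization and dummy encoding (S1312) may be sketched as below. The description does not name a particular normalization, so min-max scaling is assumed here; the function names and the sample values are illustrative only.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Dummy-encode a categorical column; returns (rows, category order)."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return rows, categories


mileage_scaled = min_max([0, 50, 100])
fuel_dummies, fuel_categories = one_hot(["gasoline", "diesel", "gasoline"])
```

The scaling range and the category order learned on the training data would need to be reused unchanged when the prediction data is preprocessed.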

The retail market price prediction process may begin with loading the prediction data (S1320). After the prediction data is loaded (S1320), preprocessing (S1321, S1322) may be performed. In the example of FIG. 14, this is shown as two steps: first prediction data preprocessing (S1321), which converts the prediction data into the form expected by the model and performs normalization and dummy encoding, and second prediction data preprocessing (S1322), which uses the similar vehicle model search algorithm (S1317) to replace empty values with the median values of similar vehicle models and, where applicable, substitutes similar vehicle information for a new vehicle.
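The second prediction data preprocessing (S1322) may be illustrated by the following sketch, in which similar vehicle models are found with an importance-weighted distance and empty values are replaced by the median over those models. The distance measure, the choice of k = 3, the field names, and the importance weights are assumptions of this sketch; the description only states that variable importance is derived (S1315) and a search algorithm is built (S1317).

```python
from statistics import median
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def find_similar(target: Record, candidates: List[Record],
                 importance: Dict[str, float], k: int = 3) -> List[Record]:
    """Return the k candidates closest to the target (weighted L1 distance)."""
    def distance(c: Record) -> float:
        return sum(w * abs(target[f] - c[f]) for f, w in importance.items())
    return sorted(candidates, key=distance)[:k]


def fill_missing(target: Record, candidates: List[Record],
                 importance: Dict[str, float], k: int = 3) -> Record:
    """Replace None fields of the target with the median of similar models."""
    similar = find_similar(target, candidates, importance, k)
    filled = dict(target)
    for field, value in target.items():
        if value is None:
            known = [c[field] for c in similar if c[field] is not None]
            if known:
                filled[field] = median(known)
    return filled


candidates = [
    {"displacement_cc": 1998, "new_price": 2950, "curb_weight": 1450},
    {"displacement_cc": 1995, "new_price": 3100, "curb_weight": 1500},
    {"displacement_cc": 2200, "new_price": 3300, "curb_weight": 1550},
    {"displacement_cc": 3500, "new_price": 6000, "curb_weight": 1900},
]
target = {"displacement_cc": 2000, "new_price": 3000, "curb_weight": None}
importance = {"displacement_cc": 0.001, "new_price": 0.001}  # hypothetical weights

filled = fill_missing(target, candidates, importance)
```

The fields used for the distance must be present in the target; only the remaining empty fields are filled from the similar models.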

After these preprocessing steps (S1321, S1322), the final residual value rate prediction may be performed through the pre-trained model (S1323).

The use of similar vehicle information described above with reference to FIG. 14 has been explained for the retail market price prediction type, but it may be applied in a similar manner to wholesale and future market price prediction when data is scarce.

FIG. 15 is a diagram for explaining the operation of a future market price prediction algorithm according to an embodiment of the present invention.

For the future market price prediction algorithm illustrated in FIG. 15 as well, the training process is shown in the upper part and the market price prediction process in the lower part. In the present embodiment, when third type price prediction request data corresponding to a future market price prediction request is received, a residual value rate as a function of mileage or elapsed days is derived and provided in response.

In the training process for future market price prediction, training data may be loaded (S1410), and abnormal data may be removed and significant variables extracted through data preprocessing (S1411). The significant information may include, for example, the change in residual value rate with mileage or elapsed days. A model may be generated (S1413) from the preprocessed data after data normalization (S1412), and an ensemble model may of course be used in this embodiment as well. In addition, as in the embodiment of FIG. 14, the present embodiment proposes generating and using an error correction model (S1414).

To derive a residual value rate table for future market price prediction using the model trained in this way (S1422), prediction data is loaded (S1420), preprocessed (S1421), and then input to the pre-trained model.
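As a rough illustration of the prediction side, a residual value rate table can be built by evaluating a pre-trained model over a grid of mileage and elapsed-day values. The sketch below makes several assumptions that are not stated in the specification: the grid values, the function names, the clamping of negative rates to zero, and the convention that a future price is the new-car price multiplied by the residual value rate. `toy_model` is a hypothetical stand-in for the pre-trained model (S1422).

```python
def build_residual_rate_table(predict, mileages_km, elapsed_days):
    """Evaluate a predictor over a (mileage, elapsed days) grid.

    `predict` is any callable (km, days) -> residual value rate.
    Negative outputs are clamped to 0.0 (an assumption of this sketch).
    """
    table = {}
    for km in mileages_km:
        for days in elapsed_days:
            table[(km, days)] = max(0.0, predict(km, days))
    return table


def future_price(new_car_price, table, km, days):
    """Assumed convention: future price = new-car price x residual value rate."""
    return round(new_car_price * table[(km, days)])


def toy_model(km, days):
    """Hypothetical stand-in for the pre-trained model; coefficients are made up."""
    return 0.9 - 0.000002 * km - 0.0001 * days


table = build_residual_rate_table(toy_model, [0, 50000, 100000], [365, 730])
```

A caller could then look up, for instance, `future_price(30_000_000, table, 50000, 365)` to turn a table entry into a price estimate for a given new-car price.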

The detailed description of the preferred embodiments of the present invention disclosed above is provided to enable those skilled in the art to implement and practice the present invention. Although the invention has been described with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from its scope. For example, those skilled in the art may use the configurations described in the above embodiments in combination with one another.

Accordingly, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The used-car price prediction method and the system using it according to the embodiments of the present invention described above predict used-car prices efficiently even with a small amount of data, and can therefore be used in a variety of ways for wholesale, retail, and future market price prediction.

Claims

A used-car price prediction method using a machine learning ensemble, performed by a processor in a used-car price prediction system that includes a storage medium storing instruction information and the processor configured to execute the instruction information stored in the storage medium, the method comprising:
receiving first-type price prediction request data for a target used car;
mapping the first-type price prediction request data to a flag corresponding to a vehicle identification key of the target used car and to N best algorithms corresponding to the flag,
wherein the N best algorithms are algorithms derived in a pre-training process of a first-stage ensemble machine learning model using a training set and a validation set extracted from a plurality of databases through preprocessing;
deriving a first prediction value corresponding to the first-type price prediction request data through the first-stage ensemble machine learning model; and
deriving a second prediction value corresponding to the first prediction value through a second-stage ensemble machine learning model coupled to receive an output value of the first-stage ensemble machine learning model,
wherein the N best algorithms derived in the pre-training process of the first-stage ensemble machine learning model are used to reconstruct a training set used for training the second-stage ensemble machine learning model.

(Deleted)

The method of claim 1,
wherein the plurality of databases
include at least two of a first database of new-car factory prices, a second database of new-car class prices, and a third database of newly generated new-car prices, and
wherein the preprocessing configures at least one of the training set and the validation set based on data selected according to the priority of the first, second, and third databases among the plurality of databases.

The method of claim 3,
wherein the training set and the validation set are configured so as to be distinguished by a plurality of flags including:
a first flag to which a detailed grade can be mapped;
a second flag to which a representative grade can be mapped;
a third flag to which a detailed model can be mapped;
a fourth flag to which a representative model can be mapped;
a fifth flag to which a manufacturer or vehicle type can be mapped; and
a sixth flag for the entire training data.

The method of claim 1,
wherein the number N is set differently according to the reliability of the flag.

The method of claim 1,
wherein the first-stage ensemble machine learning model and the second-stage ensemble machine learning model are each an ensemble model including a plurality of machine learning models or deep learning models.

The method of claim 6,
wherein the first-stage ensemble machine learning model
includes a general linear model (GLM), a first XGBoost model, and a first deep neural network (DNN) model.

The method of claim 6,
wherein the second-stage ensemble machine learning model
includes a second XGBoost model and a second DNN model.

The method of claim 6,
wherein the deep learning model in the first-stage ensemble machine learning model is trained in units of flags of the training set and the validation set, and
the machine learning model in the first-stage ensemble machine learning model is trained in units of flags and details of the training set and the validation set.

The method of claim 1,
wherein the training set used for training the second-stage ensemble machine learning model
is reconstructed by joining the average value and additional information according to the N best algorithms to the training set of the first-stage ensemble machine learning model.

The method of claim 1,
wherein the first-type price prediction request data
is wholesale market price prediction request data.

The method of claim 1,
wherein, when second-type price prediction request data corresponding to a retail market price prediction request is received,
a predicted value is derived through a pre-trained machine learning model using similar vehicle model data.

The method of claim 1,
wherein, when third-type price prediction request data corresponding to a future market price prediction request is received,
a residual value rate according to mileage or elapsed days is derived and provided in response to the third-type price prediction request data.

A used-car price prediction system using a machine learning ensemble, comprising:
a storage medium storing instruction information; and
a processor communicatively connected to the storage medium and configured to execute the instruction information stored in the storage medium,
wherein the instruction information, when executed by the processor, causes the processor to perform:
receiving first-type price prediction request data for a target used car;
mapping the first-type price prediction request data to a flag corresponding to a vehicle identification key of the target used car and to N best algorithms corresponding to the flag,
wherein the N best algorithms are algorithms derived in a pre-training process of a first-stage ensemble machine learning model using a training set and a validation set extracted from a plurality of databases through preprocessing;
deriving a first prediction value corresponding to the first-type price prediction request data through the first-stage ensemble machine learning model; and
deriving a second prediction value corresponding to the first prediction value through a second-stage ensemble machine learning model coupled to receive an output value of the first-stage ensemble machine learning model,
wherein the N best algorithms derived in the pre-training process of the first-stage ensemble machine learning model are used to reconstruct a training set used for training the second-stage ensemble machine learning model.