KR102218287B1

KR102218287B1 - Method and system for a used car through machine learning

Info

Publication number: KR102218287B1
Application number: KR1020200008202A
Authority: KR
Inventors: 김종서; 김종우; 오신행
Original assignee: (주) 아톤모빌리티
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2021-02-22

Abstract

The present disclosure relates to a method for predicting the market price of a used car through machine learning. The method comprises the steps of: receiving a data set about a plurality of vehicles for which a transaction has been completed; classifying, according to a predetermined condition, the data set about the plurality of vehicles for which a transaction has been completed into a plurality of data subsets; generating a plurality of prediction models configured to infer the price of a used car for the plurality of classified data subsets; receiving a request for predicting the market price of a target used car and extracting one or more prediction models corresponding to the target used car from the plurality of generated prediction models; and calculating the market price of the target used car on the basis of the one or more extracted prediction models in response to the request for predicting the market price of the target used car. The present invention enables a user to easily and quickly find and purchase a desired product at a desired price.

Description

Used car price prediction method and system through machine learning {METHOD AND SYSTEM FOR A USED CAR THROUGH MACHINE LEARNING}

The present disclosure relates to a method and system for predicting used car market prices through machine learning and, more specifically, to a used car market price prediction method and system that calculate the market price of a used car by generating prediction models through ensemble learning.

In general, a used car is a car that has been driven for some period after being delivered as a new car. With advances in communication and network technology, used cars are actively traded through electronic commerce. Recently, as market interest in AI-related companies has surged, the used car industry has also been actively adopting AI-based services. Alongside this, as sharing rather than owning a car has become common among consumers, used car transaction volume has risen sharply and the used car market is growing quickly.

However, despite this rapid growth, users do not place much trust in the used car market. By the nature of the market, vehicle information such as accident history and parts replacement is not disclosed transparently. This opacity creates an information asymmetry between consumers who want to buy a used car and sellers who want to sell one, and fake listings cause considerable harm.

In addition, the price of a used car may be determined by various factors such as model year, mileage, accident history, vehicle condition, options, transmission type, color, purpose of use, trends, and region. It is difficult for a buyer to judge accurately whether the price offered by a seller is appropriate, or to predict changes in the market price. Accordingly, there is a need for a method that predicts market price changes based on the prices at which used cars were actually traded, so that consumers can judge whether a price is appropriate.

The present disclosure provides a used car market price prediction method, a system, and a computer program stored in a recording medium for solving the above problems.

Provided are a used car market price prediction method, a system, and a computer program stored in a recording medium that classify data on a plurality of transaction-completed vehicles into a plurality of data subsets, generate a plurality of prediction models configured to infer used car market prices for the plurality of data subsets, and extract, from the generated prediction models, one or more prediction models corresponding to a target used car in order to calculate a market price for the target used car.

The present disclosure may be implemented in various ways, including as a method, a system, an apparatus, or a computer-readable storage medium storing instructions.

A used car market price prediction method according to one embodiment of the present disclosure includes: receiving data sets for a plurality of transaction-completed vehicles; classifying the data sets for the plurality of transaction-completed vehicles into a plurality of data subsets according to a predetermined condition; generating, for the plurality of classified data subsets, a plurality of prediction models configured to infer the market price of a used car; receiving a market price prediction request for a target used car and extracting, from the plurality of generated prediction models, one or more prediction models corresponding to the target used car; and, in response to the market price prediction request for the target used car, calculating a market price for the target used car based on the one or more extracted prediction models.

According to one embodiment, the predetermined condition includes a plurality of conditions, and each of the plurality of data subsets is stored in association with a respective one of the plurality of conditions. The step of classifying the data sets for the plurality of transaction-completed vehicles into a plurality of data subsets according to the predetermined condition includes, when a data set for a transaction-completed vehicle satisfies at least some of the plurality of conditions, classifying that data set into the data subset corresponding to each of the satisfied conditions.

According to one embodiment, the plurality of conditions include a vehicle-grade-by-model-year condition, a vehicle grade condition, a detailed model condition, a model condition, a manufacturer condition, and an all-listings condition, and each of the plurality of data subsets contains at least a predetermined number of samples.
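The classification described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names (`maker`, `model`, `detail_model`, `grade`, `year`, `price`), the `CONDITION_LEVELS` mapping, and the `classify` helper are all assumptions introduced here. Each record is placed into one subset per condition level it satisfies, and subsets below a minimum sample count are dropped.

```python
from collections import defaultdict

# Hypothetical condition levels, from most specific to most general.
# Each level names the record fields whose values identify a subset.
CONDITION_LEVELS = {
    "grade_by_year": ("maker", "model", "detail_model", "grade", "year"),
    "grade": ("maker", "model", "detail_model", "grade"),
    "detail_model": ("maker", "model", "detail_model"),
    "model": ("maker", "model"),
    "maker": ("maker",),
    "all": (),  # every listing satisfies the all-listings condition
}


def classify(records, min_samples):
    """Group records into subsets keyed by (level, field values).

    A record lands in one subset per condition level, so the same
    transaction can appear in several subsets. Subsets with fewer than
    min_samples records are discarded.
    """
    subsets = defaultdict(list)
    for record in records:
        for level, fields in CONDITION_LEVELS.items():
            key = (level, tuple(record[field] for field in fields))
            subsets[key].append(record)
    return {key: rows for key, rows in subsets.items() if len(rows) >= min_samples}


# Made-up example records (prices in arbitrary units).
records = [
    {"maker": "A", "model": "M1", "detail_model": "M1-2.0", "grade": "Lux", "year": 2018, "price": 1500},
    {"maker": "A", "model": "M1", "detail_model": "M1-2.0", "grade": "Lux", "year": 2019, "price": 1700},
    {"maker": "A", "model": "M2", "detail_model": "M2-1.6", "grade": "Base", "year": 2018, "price": 900},
]
subsets = classify(records, min_samples=2)
```

With `min_samples=2`, the per-year subsets and the single-record `M2` subsets are dropped, while the broader manufacturer and all-listings subsets survive. This is one plausible reason to keep several levels: a rare vehicle can still fall back on a more general subset.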

According to one embodiment, the step of generating, for the plurality of classified data subsets, a plurality of prediction models configured to infer the market price of a used car includes generating, for each of the plurality of classified data subsets, a prediction model configured to infer the market price of a used car through a machine learning algorithm.

According to one embodiment, the step of generating, for the plurality of classified data subsets, a plurality of prediction models configured to infer the market price of a used car includes: generating a plurality of first prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the vehicle-grade-by-model-year condition; generating a plurality of second prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the vehicle grade condition; generating a plurality of third prediction models through ensemble learning using each of the plurality of data subsets corresponding to the detailed model condition; generating a plurality of fourth prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the model condition; generating a plurality of fifth prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the manufacturer condition; and generating a plurality of sixth prediction model candidates through ensemble learning using the data subset corresponding to the all-listings condition.

According to one embodiment, each of the plurality of classified data subsets includes a plurality of training data subsets and a test data subset. The step of generating, for each of the plurality of classified data subsets, a prediction model configured to infer the market price of a used car through a machine learning algorithm includes: extracting, using the plurality of training data subsets, the hyperparameters of each prediction model candidate such that each of the plurality of prediction model candidates corresponding to each of the plurality of data subsets has a minimum error; and selecting, from among the plurality of prediction model candidates corresponding to each of the plurality of data subsets, the prediction model with the smallest error when validated using the test data subset.
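The two steps above (tuning hyperparameters on the training subsets, then choosing the candidate with the smallest test error) can be sketched as follows. This is an illustration under stated assumptions only: the disclosure refers to tree-based ensemble learners, whereas this sketch substitutes a tiny k-nearest-neighbour regressor on a single mileage feature so that it runs with the standard library alone. The names `make_knn`, `tune`, and `select_best`, the RMSE metric, and the data are all hypothetical.

```python
from statistics import mean


def rmse(model, rows):
    """Root-mean-square error of model(feature) against the known prices."""
    return mean((model(x) - y) ** 2 for x, y in rows) ** 0.5


def make_knn(train_rows, k):
    """Stand-in learner (not the patent's): mean price of the k nearest mileages."""
    def predict(x):
        nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def tune(train_folds, grid):
    """Return the hyperparameter value with the lowest mean held-out-fold error."""
    def cv_error(k):
        errors = []
        for i, held_out in enumerate(train_folds):
            rest = [row for j, fold in enumerate(train_folds) if j != i for row in fold]
            errors.append(rmse(make_knn(rest, k), held_out))
        return mean(errors)
    return min(grid, key=cv_error)


def select_best(candidates, test_rows):
    """Return the name of the candidate with the smallest test-set error."""
    return min(candidates, key=lambda name: rmse(candidates[name], test_rows))


# Made-up rows: (mileage in thousand km, price in arbitrary units).
train_folds = [
    [(10, 2000), (50, 1600)],
    [(20, 1900), (60, 1500)],
    [(30, 1800), (70, 1400)],
]
test_rows = [(40, 1700), (15, 1950)]

best_k = tune(train_folds, grid=[1, 2, 4])
all_train = [row for fold in train_folds for row in fold]
overall_mean = mean(y for _, y in all_train)
candidates = {
    "knn": make_knn(all_train, best_k),   # tuned candidate
    "mean": lambda x: overall_mean,       # naive baseline candidate
}
best_name = select_best(candidates, test_rows)
```

Keeping the test subset out of the tuning loop means the final comparison between candidates is made on data that none of them used to choose hyperparameters.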

According to one embodiment, the step of calculating a market price for the target used car includes: selecting at least some of the plurality of conditions that the target used car satisfies; calculating a market price for the target used car through the prediction model selected for each of the selected data subsets; and calculating a final market price for the target used car through ensemble learning by applying a weight to each of the calculated market prices.
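The final combination step can be sketched as a weighted average of the per-condition prices. The text does not state how the weights are determined or how they are applied, so the weighted-average form, the `final_price` helper, and the weight values below are assumptions made purely for illustration.

```python
def final_price(prices_by_level, weights_by_level):
    """Weighted average of the prices produced by the selected models.

    Only levels that actually produced a price contribute, so the
    weights are renormalised over those levels.
    """
    total_weight = sum(weights_by_level[level] for level in prices_by_level)
    if total_weight <= 0:
        raise ValueError("weights of the selected levels must sum to a positive value")
    weighted_sum = sum(
        price * weights_by_level[level] for level, price in prices_by_level.items()
    )
    return weighted_sum / total_weight


# Hypothetical weights: more specific condition levels count for more.
weights = {"grade_by_year": 8, "grade": 5, "detail_model": 4, "model": 3, "maker": 2, "all": 2}

# Prices from the models the target car matched (arbitrary units).
prices = {"grade": 1500, "model": 1600, "all": 2000}

price = final_price(prices, weights)  # (1500*5 + 1600*3 + 2000*2) / 10
```

Renormalising over the matched levels keeps the result on the same scale as the inputs even when the target car has no model at the most specific levels.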

According to one embodiment, the market price prediction request for the target used car includes a future prediction condition for the target used car, and the step of calculating a market price for the target used car includes: selecting at least some of the plurality of conditions that the target used car satisfies; calculating a future market price for the target used car through the prediction model selected for each of the plurality of data subsets corresponding to the selected conditions; and calculating a final future market price for the target used car through ensemble learning by applying a weight to each of the calculated future market prices.

According to one embodiment, the step of receiving data sets for a plurality of transaction-completed vehicles includes filtering out, from the data sets for the plurality of vehicles, at least one of: data sets for identical listings, data sets with a vehicle price error, and data sets with a mileage error.
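A filtering pass of this kind might look like the sketch below. The text does not define what counts as an identical listing, a price error, or a mileage error, so the duplicate key (`plate`, `sold_on`), the range checks, the thresholds, and the field names are all assumptions introduced for illustration.

```python
def filter_records(records, price_range, max_mileage_km):
    """Drop duplicate listings and records with implausible price or mileage."""
    low, high = price_range
    seen = set()
    kept = []
    for record in records:
        listing_key = (record["plate"], record["sold_on"])  # hypothetical duplicate key
        if listing_key in seen:
            continue  # identical listing already processed
        seen.add(listing_key)
        if not low <= record["price"] <= high:
            continue  # price error
        if not 0 <= record["mileage_km"] <= max_mileage_km:
            continue  # mileage error
        kept.append(record)
    return kept


# Made-up records; identifiers and values are illustrative only.
records = [
    {"plate": "12A3456", "sold_on": "2019-05-01", "price": 1500, "mileage_km": 42000},
    {"plate": "12A3456", "sold_on": "2019-05-01", "price": 1500, "mileage_km": 42000},  # duplicate
    {"plate": "34B7890", "sold_on": "2019-06-10", "price": 0, "mileage_km": 30000},     # price error
    {"plate": "56C1234", "sold_on": "2019-07-22", "price": 1200, "mileage_km": -5},     # mileage error
    {"plate": "78D5678", "sold_on": "2019-08-03", "price": 2100, "mileage_km": 15000},
]
clean = filter_records(records, price_range=(100, 100000), max_mileage_km=1_000_000)
```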

According to one embodiment of the present disclosure, a computer program stored in a computer-readable recording medium is provided for executing the above-described used car market price prediction method on a computer.

A used car market price prediction system according to one embodiment of the present disclosure includes: a learning unit configured to receive data sets for a plurality of transaction-completed vehicles, classify the data sets into a plurality of data subsets according to a predetermined condition, and generate, for the plurality of classified data subsets, a plurality of prediction models configured to infer the market price of a used car; and a market price calculation unit configured to receive a market price prediction request for a target used car, extract, from the plurality of generated prediction models, one or more prediction models corresponding to the target used car, and, in response to the request, calculate a market price for the target used car based on the one or more extracted prediction models.

According to some embodiments of the present disclosure, a plurality of prediction models can be generated using data on a plurality of transaction-completed vehicles received from an external device, and an ensemble model can be generated from those prediction models. This allows market price fluctuations of used cars to be predicted more accurately and improves the reliability of the calculated price, so that consumers who want to buy a used car can judge whether its price is appropriate.

According to some embodiments of the present disclosure, in response to a market price prediction request for a target used car, one or more prediction models are extracted from the plurality of prediction models to calculate a market price for the target used car, which makes it possible to provide a market price prediction system whose price fluctuations are continuous.

According to some embodiments of the present disclosure, in response to a market price prediction request for a target used car, a weight is applied to the price from each extracted prediction model and a final market price for the target used car is calculated through ensemble learning, so that market price fluctuations of used cars can be detected quickly.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the claims by a person having ordinary skill in the technical field to which the present disclosure belongs (hereinafter, 'a person skilled in the art').

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but are not limited thereto.
FIG. 1 is a schematic diagram illustrating a configuration in which a used car market price prediction system according to an embodiment of the present disclosure is communicatively connected to external devices.
FIG. 2 is a block diagram showing a detailed configuration of a used car market price prediction system according to an embodiment of the present disclosure.
FIG. 3 is an exemplary diagram illustrating a process of classifying data on a plurality of transaction-completed vehicles into a plurality of data subsets according to an embodiment of the present disclosure.
FIG. 4 is an exemplary diagram illustrating a process of generating a plurality of prediction models capable of inferring the market price of a used car according to an embodiment of the present disclosure.
FIG. 5 is an exemplary diagram illustrating an example of calculating a market price for a target used car in response to a market price prediction request according to an embodiment of the present disclosure.
FIG. 6 is a flowchart illustrating a used car market price prediction method according to an embodiment of the present disclosure.

Hereinafter, specific details for implementing the present disclosure are described with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the subject matter of the present disclosure.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the following description of the embodiments, repeated descriptions of identical or corresponding components may be omitted. Even where the description of a component is omitted, however, this is not intended to mean that the component is excluded from any embodiment.

Advantages and features of the disclosed embodiments, and methods of achieving them, will become apparent with reference to the embodiments described below together with the accompanying drawings. The present disclosure is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the present disclosure is complete and fully conveys the scope of the invention to a person having ordinary skill in the art to which the present disclosure pertains.

The terms used in this specification are described briefly, and then the disclosed embodiments are described in detail. The terms used in this specification are general terms that are currently in wide use, selected with their function in the present disclosure in mind, but they may vary with the intent of those working in the relevant field, with precedent, or with the emergence of new technology. In certain cases there are also terms chosen arbitrarily by the applicant, in which case their meaning is set out in detail in the relevant part of the description. The terms used in the present disclosure should therefore be defined based on their meaning and the overall content of the present disclosure, not simply on their names.

In this specification, singular expressions include plural expressions unless the context clearly specifies the singular. Likewise, plural expressions include singular expressions unless the context clearly specifies the plural. Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

In addition, the term 'module' or 'unit' used in the specification means a software or hardware component, and a 'module' or 'unit' performs certain roles. However, a 'module' or 'unit' is not limited to software or hardware. A 'module' or 'unit' may be configured to reside in an addressable storage medium or may be configured to run on one or more processors. Thus, as an example, a 'module' or 'unit' may include at least one of components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, or variables. The functions provided within components and 'modules' or 'units' may be combined into a smaller number of components and 'modules' or 'units', or further separated into additional components and 'modules' or 'units'.

According to one embodiment of the present disclosure, a 'module' or 'unit' may be implemented with a processor and a memory. 'Processor' should be interpreted broadly to include a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like. In some circumstances, 'processor' may refer to an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), and the like. 'Processor' may also refer to a combination of processing devices, such as a combination of a DSP and a microprocessor, a combination of a plurality of microprocessors, a combination of one or more microprocessors with a DSP core, or any other such combination of configurations. In addition, 'memory' should be interpreted broadly to include any electronic component capable of storing electronic information. 'Memory' may refer to various types of processor-readable media such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, and the like. A memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. Memory integrated in a processor is in electronic communication with the processor.

In the present disclosure, a 'system' may include at least one of a server device and a cloud device, but is not limited thereto. For example, the system may consist of one or more server devices. As another example, the system may consist of one or more cloud devices. As yet another example, the system may be configured and operated with a server device and a cloud device together.

In the present disclosure, a 'prediction model' may refer to a 'prediction model candidate' and, conversely, a 'prediction model candidate' may refer to a 'prediction model'.

In the present disclosure, a 'hyperparameter of a prediction model' or a 'hyperparameter of a prediction model candidate' may refer to a parameter that represents or characterizes the tree represented by the prediction model or prediction model candidate. For example, the parameters may include the depth of a tree, the average number of trees, the number of leaf nodes, the branching criterion of a tree, and the like, but are not limited thereto.

FIG. 1 is a schematic diagram illustrating a configuration in which a used car market price prediction system 130 according to an embodiment of the present disclosure is communicatively connected to external devices 110_1 and 110_2. The used car market price prediction system 130 may be configured to communicate with an external device through a communication network 120. The used car market price prediction system 130 may communicate with the external device through the communication network 120 to receive data on a plurality of transaction-completed vehicles. Here, the data on the plurality of transaction-completed vehicles may include data sets for the plurality of transaction-completed vehicles. These data sets may include new car price data and performance inspection data for used cars, but are not limited thereto. For example, the used car market price prediction system 130 may be included in one or more server devices and/or cloud devices. As another example, the used car market price prediction system 130 may be included in a user terminal.

According to one embodiment, the used car market price prediction system 130 may receive data sets for a plurality of transaction-completed vehicles, periodically or aperiodically, from external devices such as a new car price DB 110_1 and a performance inspection DB 110_2 through the communication network 120, and store them in a storage unit. Although FIG. 1 shows the new car price DB 110_1 and the performance inspection DB 110_2 as examples of external devices, the external device is not limited thereto and may be any computing device that is capable of wired and/or wireless communication with the used car market price prediction system 130 and of receiving data sets for a plurality of transaction-completed vehicles.

The communication network 120 may be configured to enable communication between the used car market price prediction system 130 and an external device. Depending on the installation environment, the communication network 120 may consist of, for example, a wired network such as Ethernet, a wired home network (Power Line Communication), a telephone line communication device, or RS-serial communication; a wireless network such as a mobile communication network, WLAN (Wireless LAN), Wi-Fi, Bluetooth, or ZigBee; or a combination thereof.

According to one embodiment, the used car market price prediction system 130 may include one or more server devices and/or databases, or one or more distributed computing devices and/or distributed databases based on a cloud computing service, capable of storing, providing, and executing computer-executable programs and data related to used car market price prediction. For example, the used car market price prediction system 130 may be a computing device that can communicate with other devices through a wired or wireless network and can perform computation using a processing device such as a central processing unit (CPU), a graphics processing unit (GPU), or a digital signal processor (DSP), but is not limited thereto.

The used car price prediction system 130 may classify the data set on the plurality of completed-transaction vehicles received from the external devices into a plurality of data subsets, and may generate a plurality of prediction models that infer the market price of a used car for the plurality of data subsets. By receiving a price prediction request for a target used car, extracting one or more of the plurality of prediction models, and calculating the market price of the used car, the used car price prediction system 130 can predict fluctuations in used car prices more accurately. Here, the price prediction request for the target used car may be received from one or more user terminals and/or a used car sales system, and the calculated market price may be provided through the communication network 120 to the user terminal and/or used car sales system that made the request. The process of predicting used car prices using data provided from the external devices is described in detail below with reference to FIGS. 2 to 6.

FIG. 2 is a block diagram showing a detailed configuration of the used car price prediction system 130 according to an embodiment of the present disclosure. The following describes in detail a specific way in which the used car price prediction system 130 predicts fluctuations in used car prices more accurately by generating prediction models through ensemble learning using the data set on the plurality of completed-transaction vehicles. In an embodiment, the used car price prediction system 130 may be configured to include a communication unit 210, a storage unit 220, and a processor 230, as shown in FIG. 2.

The communication unit 210 may be configured to receive the data set on the plurality of completed-transaction vehicles from an external device through the communication network. According to an embodiment, the communication unit 210 may receive, at every preset period, a data set on used vehicles whose transactions have been completed from the external device. Here, the data set on a completed-transaction vehicle may include data related to the new car price, the used car performance inspection, the fuel type, the driving distance, the days in operation, and the accident history of the vehicle, but is not limited thereto, and may further include various information related to the traded used car, such as used car transaction price data (for example, actual transaction price data for each detailed vehicle model). In addition, the communication unit 210 may provide the received data to the processor 230, and may be configured to transmit, through the communication network, information on the used car market price predicted by the processor 230 and/or information obtained by analyzing or processing such information. For example, in response to a price prediction request for a target used car, the processor 230 may transmit the calculated market price of the used car, through the communication unit 210, to the user terminal or system that sent the request.

The processor 230 may include at least one of a CPU (central processing unit), a GPU (graphic processing unit), a DSP (digital signal processor), an FPGA (Field Programmable Gate Array), and an ASIC (Application Specific Integrated Circuit) to perform arbitrary computations, and may be configured to manage and process the data set received from the external device and/or a data set obtained by analyzing or processing the received data set, and/or to store it in the storage unit 220. In an embodiment, the processor 230 may be configured to include the communication unit 210, a learning unit 232, and a price calculation unit 234.

The processor 230 may store the data set on the plurality of completed-transaction vehicles received from the external device in the storage unit 220 in the form of a database. According to an embodiment, the processor 230 may filter the data set on the plurality of completed-transaction vehicles and store the filtered data set in the storage unit 220 in the form of a database. For example, the processor 230 may be configured to filter out, from the data set on the plurality of vehicles, at least one of: data sets for the same listing, data sets with a vehicle price error, and data sets with a driving distance error. Although FIG. 2 shows the storage unit 220 as included in the used car price prediction system 130, the configuration is not limited thereto, and the storage unit 220 may be included in an external device with which the used car price prediction system 130 can communicate by wire and/or wirelessly.
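As an illustration of the filtering step, the following is a minimal sketch, not the disclosed implementation. The record field names (`listing_id`, `price`, `mileage_km`) and the thresholds are assumptions introduced here; the disclosure does not specify how a price error or a driving distance error is detected.

```python
def filter_records(records, min_price=1, max_mileage_km=1_000_000):
    """Drop duplicate listings and records with an implausible price or mileage.

    Field names and thresholds are hypothetical, chosen only for illustration.
    """
    seen = set()
    kept = []
    for rec in records:
        listing_id = rec["listing_id"]
        if listing_id in seen:
            continue  # same listing received more than once
        if rec["price"] < min_price:
            continue  # treated here as a vehicle price error
        if not 0 <= rec["mileage_km"] <= max_mileage_km:
            continue  # treated here as a driving distance error
        seen.add(listing_id)
        kept.append(rec)
    return kept


records = [
    {"listing_id": "a1", "price": 1500, "mileage_km": 42_000},
    {"listing_id": "a1", "price": 1500, "mileage_km": 42_000},  # duplicate
    {"listing_id": "b2", "price": 0, "mileage_km": 10_000},     # price error
    {"listing_id": "c3", "price": 900, "mileage_km": -5},       # mileage error
]
clean = filter_records(records)
```

Only the first `a1` record survives in this example; the other three are removed for the reasons noted in the comments.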

The learning unit 232 may classify the data set on the plurality of completed-transaction vehicles into a plurality of data subsets according to a predetermined condition. Here, the predetermined condition may include a plurality of conditions, and each of the classified data subsets may be stored in association with one of the plurality of conditions. According to an embodiment, the plurality of conditions may include a vehicle grade by model year condition, a detailed model condition, a model condition, a manufacturer condition, and an all-listings condition.

When a data set on a completed-transaction vehicle satisfies at least some of the plurality of conditions, the learning unit 232 may be configured to classify that data set into the data subset corresponding to each of the satisfied conditions. In addition, each of the plurality of data subsets may include a training data subset for training the plurality of prediction model candidates generated from that data subset, and a test data subset for evaluating the trained prediction model candidates. According to an embodiment, the learning unit 232 may divide each classified data subset into a training data subset and a test data subset. For example, the learning unit 232 may sample at least part of the data subset corresponding to a specific condition and classify the sample as training data for that data subset, and may classify the remaining, unsampled data as test data. Here, the learning unit 232 may sample the data subset corresponding to a specific condition several times, and may classify and store a plurality of training data subsets together with the corresponding unsampled test data subsets.

Here, the training data may also be used as validation data for extracting the hyperparameters of each of the plurality of prediction models generated from each data subset.

The learning unit 232 may be configured to include a prediction model generation unit 240 and a prediction model verification unit 242. The prediction model generation unit 240 may be configured to generate, for the plurality of classified data subsets, a plurality of prediction models configured to infer the market price of a used car. According to an embodiment, the prediction model generation unit 240 may be configured to generate, for each of the classified data subsets, a plurality of prediction models configured to infer the market price of a used car through a machine learning algorithm. The generated prediction models may also be trained by the prediction model generation unit 240. In this case, the machine learning algorithm may be ensemble learning. For example, the prediction model generation unit 240 may use an ensemble learning algorithm to generate a plurality of prediction models for the data subset corresponding to each of the plurality of conditions, and may generate an ensemble model using the generated prediction models. Here, the ensemble learning algorithm may include a bagging algorithm, a boosting algorithm, and the like. For example, the boosting algorithm may include the XGBoost (Extreme Gradient Boosting) algorithm. The process of generating an ensemble model using the XGBoost algorithm is described in detail below with reference to FIG. 4.

According to an embodiment, the prediction model generation unit 240 may extract features from each of the classified training data subsets to generate feature vectors, and may generate and train the plurality of prediction models using each generated feature vector as an input. In this case, values such as the fuel type, the driving distance, the days in operation, the new car price, and the accident history of the vehicle may be extracted as features.
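A feature vector of this kind could be assembled as in the sketch below. The fuel categories, the field names, and the one-hot encoding of the fuel type are assumptions made for illustration; the disclosure names the features but not their encoding.

```python
# Hypothetical fuel categories; the disclosure does not enumerate them.
FUEL_TYPES = ["gasoline", "diesel", "lpg", "hybrid", "electric"]


def to_feature_vector(rec):
    """Turn one vehicle record into a flat list of numeric features."""
    # One-hot encode the categorical fuel type (an assumed encoding).
    fuel_one_hot = [1.0 if rec["fuel"] == f else 0.0 for f in FUEL_TYPES]
    return fuel_one_hot + [
        float(rec["mileage_km"]),             # driving distance
        float(rec["days_in_use"]),            # days in operation
        float(rec["new_price"]),              # new car price
        1.0 if rec["had_accident"] else 0.0,  # accident history
    ]


vec = to_feature_vector({
    "fuel": "hybrid",
    "mileage_km": 42_000,
    "days_in_use": 730,
    "new_price": 3000,
    "had_accident": False,
})
```

Each training record would be converted this way before being passed to whichever regression model is trained on the subset.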

The prediction model verification unit 242 may be configured to extract the hyperparameters of each prediction model candidate generated using the training data subsets, and to evaluate, using the test data subsets, each prediction model candidate to which the extracted hyperparameters have been applied. According to an embodiment, the prediction model verification unit 242 may be configured to use the plurality of training data subsets to extract the hyperparameters of each prediction model candidate so that each of the prediction model candidates corresponding to each data subset has a minimum error. In addition, the prediction model verification unit 242 may be configured to select, from among the plurality of prediction model candidates corresponding to each data subset, the prediction model with the smallest error when verified using the test data subset. For example, the prediction model verification unit 242 may extract the hyperparameters of each prediction model candidate using Grid Search. Alternatively, the prediction model verification unit 242 may extract the hyperparameters of each prediction model candidate using Random Search.
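The Grid Search option can be sketched generically as follows. This is an illustrative sketch only: the hyperparameter names `max_depth` and `learning_rate`, the grid values, and the toy error function are assumptions. In practice the error function would train a candidate model with the given hyperparameters and return its error on validation data.

```python
from itertools import product


def grid_search(param_grid, error_fn):
    """Try every combination in param_grid and return (best_params, best_error)."""
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        err = error_fn(params)
        if err < best_error:
            best_params, best_error = params, err
    return best_params, best_error


def toy_validation_error(params):
    # Stand-in for "train a candidate with these hyperparameters and
    # measure its validation error"; not a real model.
    return (params["max_depth"] - 6) ** 2 + (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_error = grid_search(grid, toy_validation_error)
```

Random Search would differ only in drawing a fixed number of random combinations instead of enumerating the full Cartesian product.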

The price calculation unit 234 may be configured to receive a price prediction request for a target used car through the communication unit 210 and to extract, from among the plurality of generated prediction models, one or more prediction models corresponding to the target used car. In response to the price prediction request, the price calculation unit 234 may calculate the market price of the target used car based on the one or more extracted prediction models. Here, the price prediction request for the target used car may include any information or condition that can represent or characterize the target used car, for example, at least one of the vehicle grade, the model year, the vehicle model, the detailed vehicle model, and the manufacturer.

According to an embodiment, the price calculation unit 234 may be configured to select, from among the plurality of conditions, at least some conditions that the target used car satisfies, and to calculate a market price for the target used car through the prediction model selected for each of the data subsets corresponding to the selected conditions. The price calculation unit 234 may then be configured to calculate a final market price for the target used car through ensemble learning by applying a weight to each of the calculated market prices. For example, through ensemble learning, among the conditions that represent or characterize the target used car, a higher weight may be applied to the prediction model corresponding to a more specific and detailed condition.
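The weighting idea can be illustrated with a simple weighted average. This is a sketch under stated assumptions: the level names and the fixed weights below are hypothetical and stand in for whatever weights the ensemble step would actually determine; the only property taken from the text is that more specific conditions receive higher weights.

```python
# Hypothetical weights: the more specific the condition, the larger the weight.
SPECIFICITY_WEIGHT = {
    "all": 1.0,
    "manufacturer": 2.0,
    "model": 3.0,
    "detailed_model": 4.0,
    "grade": 5.0,
    "grade_by_year": 6.0,
}


def combine_prices(predictions):
    """Weighted average of per-level predictions.

    predictions maps a condition level to the price predicted by the
    model selected for that level. Levels with no model are simply absent.
    """
    total_weight = sum(SPECIFICITY_WEIGHT[level] for level in predictions)
    weighted_sum = sum(
        SPECIFICITY_WEIGHT[level] * price for level, price in predictions.items()
    )
    return weighted_sum / total_weight


final_price = combine_prices(
    {"all": 1000.0, "manufacturer": 1200.0, "model": 1300.0}
)
```

Here the result is (1000 + 2 × 1200 + 3 × 1300) / 6, which lies closer to the model-level prediction than to the all-listings prediction.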

Alternatively or additionally, the price calculation unit 234 may be configured to calculate, in response to a price prediction request for the target used car, a future market price for the target used car based on the one or more extracted prediction models. Here, the price prediction request may include not only any information or condition that can represent or characterize the target used car, but also a future prediction condition for the target used car. The future prediction condition may indicate any information or condition that can represent or characterize the target used car at some point in the future (for example, after 6 months, 1 year, or 2 years). For example, the future prediction condition may be any time-dependent information or condition on the target used car, such as the expected driving distance after one year, the expected accident rate after one year, or the expected popularity of the target used car after one year.

According to an embodiment, the future prediction condition may be calculated from the current condition of the target used car (for example, the total driving distance to date, the total period of operation, and the total number of accidents). For example, the average daily driving distance may be calculated from the total driving distance and the period of operation of the target used car, and the expected driving distance after one year may be calculated from that daily average. As another example, an accident rate at some point in the future may be calculated based on the driving habits (for example, the accident rate) of the drivers of the target used car during a specific period, and the calculated accident rate may be included in the future prediction condition. As yet another example, the popularity of the target used car at some point in the future may be calculated based on a predetermined formula for popularity, and the calculated popularity may be included in the future prediction condition.
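The driving distance example amounts to a linear extrapolation. The sketch below assumes that the expected driving distance after one year means the cumulative odometer reading one year from now, and that the daily average stays constant; both are assumptions, and the function name is hypothetical.

```python
def projected_mileage_km(total_km, days_in_use, days_ahead=365):
    """Extrapolate the cumulative driving distance days_ahead days from now."""
    if days_in_use <= 0:
        raise ValueError("days_in_use must be positive")
    daily_avg = total_km / days_in_use  # average daily driving distance
    return total_km + daily_avg * days_ahead


# 36,500 km over 730 days is 50 km per day on average.
projected = projected_mileage_km(36_500, 730)
```

With these inputs the projection adds 365 × 50 km to the current 36,500 km.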

According to an embodiment, in response to a price prediction request for the target used car that includes a future prediction condition, the price calculation unit 234 may select, from among the plurality of conditions, at least some conditions that the target used car satisfies, and may calculate a future market price for the target used car through the prediction model selected for each of the data subsets corresponding to the selected conditions. The price calculation unit 234 may then be configured to calculate a final future market price for the target used car through ensemble learning by applying a weight to each of the calculated future market prices. For example, through ensemble learning, among the conditions that represent or characterize the target used car, a higher weight may be applied to the prediction model corresponding to a time-dependent condition. The process by which the price calculation unit 234 calculates the market price of the target used car in response to a price prediction request is described in detail below with reference to FIG. 5.

FIG. 3 is an exemplary diagram showing a process of classifying a data set 310 on a plurality of completed-transaction vehicles into a plurality of data subsets 312, 314, 316, 318, 320, and 322 according to an embodiment of the present disclosure. According to an embodiment, the learning unit 232 may classify the data set on the plurality of completed-transaction vehicles into a vehicle grade by model year data subset 312, a vehicle grade data subset 314, a detailed model data subset 316, a model data subset 318, a manufacturer data subset 320, and an all-listings data subset 322.

According to an embodiment, the learning unit 232 may classify the data set 310 received from the external device into the data subset 322 corresponding to the all-listings condition (for example, data on all domestic and imported vehicles whose transactions have been completed). Because the all-listings data subset 322 includes the data sets on all listings, it may consist of a single data subset.

In addition, the learning unit 232 may classify each data set on the plurality of completed-transaction vehicles into the data subset 320 corresponding to each manufacturer condition, for example, Company A, Company B, or Company C. Because a data subset is classified for each of the plurality of manufacturer conditions, the manufacturer data subset 320 may include a plurality of data subsets, as shown.

In addition, the learning unit 232 may classify the data set 310 into the data subset 318 corresponding to each vehicle model condition of each manufacturer, that is, each model condition, for example, Company A's model A, Company A's model B, Company A's model C, Company B's model D, or Company B's model C. Because a data subset is classified for each of the plurality of model conditions, the model data subset 318 may include a plurality of data subsets, as shown.

In addition, the learning unit 232 may classify the data set 310 into the data subset 316 corresponding to each detailed vehicle model condition of each manufacturer, that is, each detailed model condition, for example, Company A's model A XG, Company A's model A HG, or Company A's model A IG. Because a data subset is classified for each of the plurality of detailed model conditions, the detailed model data subset 316 may include a plurality of data subsets, as shown.

Similarly, the learning unit 232 may classify the data set 310 into the data subset 314 corresponding to each vehicle grade condition, for example, Company A's model A Hybrid Premium or Company A's model A Hybrid Luxury. Because a data subset is classified for each of the plurality of vehicle grade conditions, the vehicle grade data subset 314 may include a plurality of data subsets, as shown.

In addition, the learning unit 232 may classify each data set on the plurality of completed-transaction vehicles into the data subset 312 corresponding to each vehicle grade condition by model year, for example, the 2016 Company A model A Hybrid Premium, the 2015 model A Hybrid Premium, or the 2015 model B Gasoline Luxury. Because a data subset is classified for each of the plurality of vehicle grade conditions by model year, the vehicle grade by model year data subset 312 may include a plurality of data subsets, as shown.

In this case, each of the plurality of data subsets 312 to 322 may contain at least a predetermined number of data records. For example, each classified data subset may contain 100 or more data records.

For example, suppose that one data set in the data set 310 is data on a Company A vehicle listing. This data set may then be classified into the all-listings data subset 322 and into Company A's data subset within the manufacturer data subset 320. However, because this data set does not include information on the model, the detailed model, the vehicle grade, or the vehicle grade by model year, it may not be classified into any of the data subsets 312, 314, 316, and 318.
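The classification described with reference to FIG. 3 can be sketched as follows. The field names, the level names, and the treatment of the 100-record figure as a cutoff (`min_count`) are assumptions made for illustration; the sketch also assumes that each level's condition is built on the fields of the level above it.

```python
from collections import defaultdict

# Condition levels from most general to most specific. Each level's key
# is built from the listed (hypothetical) record fields.
LEVELS = [
    ("all", []),
    ("manufacturer", ["maker"]),
    ("model", ["maker", "model"]),
    ("detailed_model", ["maker", "model", "detailed_model"]),
    ("grade", ["maker", "model", "detailed_model", "grade"]),
    ("grade_by_year", ["maker", "model", "detailed_model", "grade", "year"]),
]


def classify(records, min_count=100):
    """Assign each record to every subset whose condition it satisfies."""
    subsets = defaultdict(list)
    for rec in records:
        for level, fields in LEVELS:
            if any(rec.get(f) is None for f in fields):
                break  # deeper levels need this missing field as well
            key = (level,) + tuple(rec[f] for f in fields)
            subsets[key].append(rec)
    # Keep only subsets with enough records (assumed use of the threshold).
    return {k: v for k, v in subsets.items() if len(v) >= min_count}


subsets = classify(
    [
        {"maker": "A"},  # manufacturer known, nothing more specific
        {"maker": "A", "model": "B", "detailed_model": "HG"},
    ],
    min_count=1,  # lowered so the two-record example is not filtered out
)
```

The first record lands only in the all-listings and Company A subsets, matching the example in the text; the second also reaches the model and detailed model subsets.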

Each of the classified data subsets may include a training data subset and a test data subset. According to an embodiment, the learning unit 232 may sample part of the data subset for all listings and classify the sample as a training data subset, and may classify the remaining, unsampled part as a test data subset. By performing this sampling several times, a plurality of training data subsets 334_a and a corresponding plurality of test data subsets 334_b may be generated. In addition, the learning unit 232 may sample part of each of the data subsets for the manufacturer conditions and classify the sample as a training data subset, and may classify the remaining, unsampled part as a test data subset. By performing this sampling several times, a plurality of training data subsets 332_a and a corresponding plurality of test data subsets 332_b may be generated. For example, part of the data subset for Company A may be sampled several times to produce a plurality of training data subsets and a corresponding plurality of test data subsets.

In addition, the learning unit 232 may sample part of each of the data subsets for the model conditions and classify the sample as a training data subset, and may classify the remaining, unsampled part as a test data subset. By performing this sampling several times, a plurality of training data subsets 330_a and a corresponding plurality of test data subsets 330_b may be generated. For example, part of the data subset for Company A's model B may be sampled several times to produce a plurality of training data subsets and a corresponding plurality of test data subsets.

Similarly, the learning unit 232 may sample part of each of the data subsets for the detailed model conditions and classify the sample as a training data subset, and may classify the remaining, unsampled part as a test data subset. By performing this sampling several times, a plurality of training data subsets 328_a and a corresponding plurality of test data subsets 328_b may be generated. For example, part of the data subset for Company A's model B HG may be sampled several times to produce a plurality of training data subsets and a corresponding plurality of test data subsets.

In addition, the learning unit 232 may sample part of each of the data subsets for the vehicle grade conditions and classify the sample as a training data subset, and may classify the remaining, unsampled part as a test data subset. By performing this sampling several times, a plurality of training data subsets 326_a and a corresponding plurality of test data subsets 326_b may be generated. For example, part of the data subset for Company A's model B HG Hybrid may be sampled several times to produce a plurality of training data subsets and a corresponding plurality of test data subsets.

In addition, the learning unit 232 may sample a portion of each of the plurality of data subsets for the vehicle-class-by-model-year condition and classify the sampled portion as a training data subset, and may classify the remaining, unsampled portion as a test data subset. By performing this sampling multiple times, a plurality of training data subsets 324_a and a corresponding plurality of test data subsets 324_b may be generated. For example, a portion of the data subset for the 2019 model B HG hybrid of Company A may be sampled multiple times to yield a plurality of training data subsets and a corresponding plurality of test data subsets.

According to an embodiment, each data subset may be split so that 70% forms the training data subset and 30% forms the test data subset. Here, each training data subset may be used to determine the hyperparameters of each of the corresponding prediction models so that each prediction model yields a minimal error. In addition, the prediction model generation unit 240 may use each classified training data subset to generate, through a machine learning algorithm, a plurality of prediction models configured to infer the market price of a used car. Here, the machine learning algorithm may include ensemble learning (e.g., bagging, boosting).
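The repeated 70/30 sampling described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `repeated_split`, the number of rounds, the fixed seed, and the use of a plain shuffle are all assumptions made for the example.

```python
import random

def repeated_split(records, n_rounds=3, train_fraction=0.7, seed=0):
    """Return n_rounds (train, test) pairs sampled from one data subset.

    All names and defaults here are hypothetical; the text only states
    that sampling is repeated and that the split may be 70% / 30%.
    """
    rng = random.Random(seed)
    n_train = round(len(records) * train_fraction)
    splits = []
    for _ in range(n_rounds):
        shuffled = records[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        # Sampled part becomes training data, the remainder becomes test data.
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

subset = list(range(10))               # stand-in for ten completed-sale records
splits = repeated_split(subset)
```

Each round produces one training data subset and its matching test data subset, so no record appears on both sides of the same split.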

FIG. 4 is an exemplary diagram illustrating a process of generating a plurality of prediction models capable of inferring the market price of a used car according to an embodiment of the present disclosure. According to an embodiment, the prediction model generation unit 240 may use an ensemble learning algorithm to generate an ensemble model configured to infer the market price of a used car. For example, the ensemble learning algorithm may include a boosting algorithm such as XGBoost (Extreme Gradient Boosting).

The prediction model generation unit 240 may generate a plurality of training data subsets by sampling multiple times from a specific data subset (for example, one of the plurality of data subsets corresponding to a specific condition). For example, as illustrated, the prediction model generation unit 240 may generate a first training data subset 412, a second training data subset 416, ..., and an n-th training data subset 420 from the data subset 410. In the present disclosure, n may be an integer of 3 or more.

The prediction model generation unit 240 may be configured to generate a prediction model from a training data subset through a machine learning algorithm. Here, the machine learning algorithm may be predetermined or selected automatically, and may be one of an artificial neural network, deep learning, and a decision tree, but is not limited thereto. For example, as illustrated, the prediction model generation unit 240 may generate a first prediction model 414, a second prediction model 418, ..., and an n-th prediction model 422 from the first to n-th training data subsets 412, 416, and 420, respectively.

When generating the plurality of prediction models, the prediction model generation unit 240 may adjust the weights of the next training data subset based on the training result of the previous prediction model. For example, as illustrated, the weights to be applied to the second training data subset 416 may be adjusted based on the result of training the first prediction model 414. Likewise, the weights of the third training data subset may be adjusted based on the result of training the second prediction model 418. This process may be repeated until the n-th training data subset and the n-th prediction model 422 are generated. That is, the weights of the next training data are adjusted from the training result of the previous predictor, and the first prediction model 414, the second prediction model 418, ..., and the n-th prediction model 422 may be generated sequentially using the adjusted weights. The prediction model generation unit 240 may then generate a final prediction model 430 based on the generated first prediction model 414, second prediction model 418, ..., and n-th prediction model 422. Here, the final prediction model 430 may be generated using each of the first to n-th training data subsets and the adjusted weights.
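The sequential reweighting described above can be illustrated with a deliberately simplified sketch. It is not XGBoost and not the disclosed implementation: the single-feature regression stump, the error-proportional reweighting rule, the reuse of the same records in every round, and the plain averaging of the models are all assumptions chosen to keep the example short.

```python
def fit_stump(xs, ys, ws):
    """Fit a one-split regression stump that minimises weighted squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:      # both sides are always non-empty
        left = [(y, w) for x, y, w in zip(xs, ys, ws) if x < threshold]
        right = [(y, w) for x, y, w in zip(xs, ys, ws) if x >= threshold]
        left_mean = sum(y * w for y, w in left) / sum(w for _, w in left)
        right_mean = sum(y * w for y, w in right) / sum(w for _, w in right)
        error = sum(
            w * (y - (left_mean if x < threshold else right_mean)) ** 2
            for x, y, w in zip(xs, ys, ws)
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def boost(xs, ys, n_models=3):
    """Train models one after another, reweighting samples between rounds."""
    ws = [1.0 / len(xs)] * len(xs)
    models = []
    for _ in range(n_models):
        model = fit_stump(xs, ys, ws)
        models.append(model)
        # Assumed rule: samples the previous model predicted badly gain weight.
        errors = [abs(y - model(x)) for x, y in zip(xs, ys)]
        max_error = max(errors) or 1.0
        ws = [w * (1.0 + e / max_error) for w, e in zip(ws, errors)]
        total = sum(ws)
        ws = [w / total for w in ws]
    # Assumed combination: the final model averages the sequential models.
    return models, (lambda x: sum(m(x) for m in models) / len(models))

mileage = [10, 20, 30, 40]        # hypothetical mileage, in thousands of km
price = [30, 27, 20, 19]          # hypothetical sale prices
models, final_model = boost(mileage, price)
```

In practice a library such as XGBoost would replace both helper functions; the sketch only shows the order of operations (train, measure error, reweight, train again, combine).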

FIG. 5 is an exemplary diagram illustrating an example of calculating a market price for a target used car in response to a market price prediction request according to an embodiment of the present disclosure. According to an embodiment, the market price calculation unit 234 may extract one or more prediction models corresponding to the target used car based on the received market price prediction request for the target used car. Here, the market price prediction request for the target used car may include any information that indicates or characterizes the target used car, for example, at least one of a vehicle class, a vehicle class by model year, a vehicle detailed model (detailed model), a vehicle model (model), and a manufacturer, but is not limited thereto.

In the present disclosure, similarly to FIG. 3, the data set 310 may be classified into a vehicle-class-by-model-year data subset 312, a vehicle class data subset 314, a detailed model data subset 316, a model data subset 318, a manufacturer data subset 320, and an all-listings data subset 322. In addition, for each of the data subsets 312 to 322, a plurality of training data subsets 324_a, 326_a, 328_a, 330_a, 332_a, and 334_a and a corresponding plurality of test data subsets 324_b, 326_b, 328_b, 330_b, 332_b, and 334_b may be generated.

According to an embodiment, the prediction model generation unit 240 may generate, through a machine learning algorithm, a plurality of prediction models that infer the market price of a used car for the plurality of data subsets 312 to 322. Here, the machine learning algorithm may operate on decision trees trained through ensemble learning such as bagging or boosting. For example, the prediction model generation unit 240 may generate a plurality of first prediction model candidates 534 through ensemble learning using each of the plurality of data subsets corresponding to the vehicle-class-by-model-year condition. In addition, a plurality of second prediction model candidates 536 may be generated through ensemble learning using each of the plurality of data subsets corresponding to the vehicle class condition. In addition, the prediction model generation unit 240 may generate a plurality of third prediction model candidates 538 through ensemble learning using each of the plurality of data subsets corresponding to the detailed model condition. Further, a plurality of fourth prediction model candidates 540 may be generated through ensemble learning using each of the plurality of data subsets corresponding to the model condition. In addition, the prediction model generation unit 240 may generate a plurality of fifth prediction model candidates 542 through ensemble learning using each of the plurality of data subsets corresponding to the manufacturer condition. Further, a plurality of sixth prediction model candidates 544 may be generated through ensemble learning using the data subset corresponding to the all-listings condition.

The prediction model verification unit 242 may determine the hyperparameters of each prediction model candidate using the plurality of training data subsets. According to an embodiment, the hyperparameters of each prediction model candidate may be determined using the plurality of training data subsets 324_a to 334_a so that each of the plurality of prediction model candidates 534 to 544, which correspond to the respective data subsets 312 to 322, has a minimal error. In this case, the prediction model verification unit 242 may use a known hyperparameter search algorithm such as grid search or random search.

For example, the prediction model verification unit 242 may use the plurality of training data subsets 324_a to determine the hyperparameters so that each of the plurality of prediction model candidates 534 corresponding to the vehicle-class-by-model-year data subset 312 has a minimal error. The prediction model verification unit 242 may apply, to each of the other data subsets 314 to 322, a method similar to the hyperparameter determination method applied to the vehicle-class-by-model-year data subset 312.
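A grid search over hyperparameters, as mentioned above, can be sketched in a few lines. The example below is only illustrative: a tiny nearest-neighbour price predictor stands in for the ensemble model, its single hyperparameter `k` and the leave-one-out error on the training data subset are assumptions, and every name is hypothetical.

```python
from itertools import product

def knn_predict(train, x, k):
    """Average the prices of the k training rows closest to x (stand-in model)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(price for _, price in nearest) / len(nearest)

def loo_error(train, k):
    """Mean absolute leave-one-out error on the training data subset."""
    total = 0.0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        total += abs(y - knn_predict(rest, x, k))
    return total / len(train)

def grid_search(train, grid):
    """Try every combination in the grid and keep the one with least error."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = loo_error(train, **params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

# Hypothetical (mileage, price) training rows for one data subset.
train_rows = [(10, 30), (20, 28), (30, 24), (40, 20), (50, 18)]
best_params, best_error = grid_search(train_rows, {"k": [1, 2, 3]})
```

Random search would differ only in how the candidate combinations are drawn; the selection rule (keep the combination with the smallest error) stays the same.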

The prediction model verification unit 242 may then verify the plurality of prediction model candidates 534 to 544, which correspond to the respective data subsets 312 to 322, using the corresponding test data subsets 324_b to 334_b. During verification, the prediction model with the smallest error among the plurality of prediction model candidates 534 to 544 may be selected for the corresponding data subset. For example, the prediction model verification unit 242 may verify the plurality of prediction model candidates 534 corresponding to the vehicle-class-by-model-year data subset 312 using the plurality of test data subsets 324_b, and select a first prediction model 546 from among the plurality of prediction model candidates 534. Here, the first prediction model 546 may be the prediction model with the smallest error or the highest accuracy among the plurality of prediction model candidates 534. In addition, the prediction model verification unit 242 may apply, to each of the other data subsets 314 to 322, a method similar to the prediction model selection method applied to the vehicle-class-by-model-year data subset 312, and thereby select second to sixth prediction models 548, 550, 552, 554, and 556 corresponding to the respective data subsets 314 to 322.
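Selecting the candidate with the smallest test error can be sketched as below. The error metric (mean absolute error), the representation of a model as a Python callable, and all names and numbers are assumptions; the text does not specify which error measure is used.

```python
def mean_abs_error(model, test_rows):
    """Average absolute gap between predicted and actual sale price."""
    return sum(abs(price - model(features)) for features, price in test_rows) / len(test_rows)

def select_best(candidates, test_rows):
    """Return the name of the candidate with the smallest test error."""
    return min(candidates, key=lambda name: mean_abs_error(candidates[name], test_rows))

# Hypothetical candidates for one data subset; each maps features to a price.
candidates = {
    "candidate_a": lambda features: 2000,
    "candidate_b": lambda features: 2400,
}
# Hypothetical held-out test rows: (features, actual price).
test_rows = [({"mileage_km": 30000}, 2350), ({"mileage_km": 20000}, 2450)]

best_name = select_best(candidates, test_rows)
```

Running the same selection once per data subset yields one chosen prediction model per condition.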

In response to a market price prediction request 558 for the target used car, the market price calculation unit 234 may extract one or more prediction models corresponding to the target used car and calculate a market price 560 for the target used car based on the extracted one or more prediction models. For example, the market price calculation unit 234 may select at least some of the plurality of conditions that the target used car satisfies, and calculate the market price for the target used car through the prediction model selected for each of the plurality of data subsets corresponding to the selected conditions. Here, each of the plurality of data subsets 312 to 322 may be stored in association with a respective one of the plurality of conditions.
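One way to picture the model extraction step is a lookup keyed by condition and by the attribute values the request supplies. The sketch below is an assumption-laden illustration: the condition names, the request field names, and the registry layout are hypothetical and do not come from the disclosure.

```python
# Hypothetical condition hierarchy, most specific first: (condition, required fields).
CONDITIONS = [
    ("class_by_year", ("maker", "model", "detail", "vehicle_class", "year")),
    ("vehicle_class", ("maker", "model", "detail", "vehicle_class")),
    ("detail", ("maker", "model", "detail")),
    ("model", ("maker", "model")),
    ("maker", ("maker",)),
    ("all_listings", ()),
]

def extract_models(request, registry):
    """Return (condition, model) pairs for every condition the request satisfies."""
    found = []
    for condition, fields in CONDITIONS:
        if all(field in request for field in fields):
            key = (condition, tuple(request[field] for field in fields))
            if key in registry:                 # a model exists for this subset
                found.append((condition, registry[key]))
    return found

# Hypothetical registry of already selected prediction models (names only).
registry = {
    ("detail", ("A", "A-model", "hybrid")): "third_model",
    ("model", ("A", "A-model")): "fourth_model",
    ("maker", ("A",)): "fifth_model",
    ("all_listings", ()): "sixth_model",
}
request = {"maker": "A", "model": "A-model", "detail": "hybrid"}
extracted = extract_models(request, registry)
```

A request that names only a detailed model therefore pulls the detailed-model, model, manufacturer, and all-listings models, and skips the more specific conditions it cannot satisfy.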

According to an embodiment, the market price calculation unit 234 may apply weights (for example, A, B, C, D, E, and F) to the used car market prices calculated by the extracted one or more prediction models, and calculate a final market price for the target used car through ensemble learning. These weights may vary depending on the number of extracted prediction models. For example, the more prediction models are extracted, the smaller the weights for lower-priority prediction models (for example, the fifth prediction model and the sixth prediction model) may become. Conversely, the fewer prediction models are extracted, the relatively larger the weights for lower-priority prediction models may become.

For example, suppose that the market price prediction request for the target used car includes the 2019 model A HG hybrid of Company A, which is a vehicle class by model year. In this case, the market price calculation unit 234 may extract a first prediction model generated and trained from the data subset classified as the 2019 model A HG hybrid of Company A among the plurality of data subsets for the vehicle detailed model, a second prediction model generated and trained from the data subset classified as the model A HG hybrid of Company A, a third prediction model generated and trained from the data subset classified as the model A hybrid of Company A, a fourth prediction model generated and trained from the data subset classified as model A, a fifth prediction model generated and trained from the data subset classified as Company A, and a sixth prediction model generated and trained from the data subset for all listings. Here, each of the first to sixth prediction models may be the prediction model with the smallest error and high accuracy, selected based on the results evaluated on the respective test data subset. Accordingly, the market price calculation unit 234 may calculate a market price for the 2019 model A HG hybrid of Company A through each of the first to sixth prediction models extracted based on the 2019 model A HG hybrid of Company A included in the market price prediction request for the target used car. The market price calculation unit 234 may then apply to the calculated market prices, for example, a weight A of 35% for the first prediction model, a weight B of 20% for the second prediction model, a weight C of 15% for the third prediction model, a weight D of 10% for the fourth prediction model, a weight E of 10% for the fifth prediction model, and a weight F of 5% for the sixth prediction model, calculate a final market price for the 2019 model A HG hybrid of Company A through ensemble learning, and provide the calculated market price to the user terminal that transmitted the market price prediction request for the target used car.

As another example, suppose that the market price prediction request for the target used car includes the model A hybrid of Company A, which is a vehicle detailed model (detailed model). In this case, the market price calculation unit 234 may extract a third prediction model generated and trained from the data subset classified as the model A hybrid of Company A among the plurality of data subsets for the detailed model, a fourth prediction model generated and trained from the data subset classified as model A, a fifth prediction model generated and trained from the data subset classified as Company A, and a sixth prediction model generated and trained from the data subset for all listings. Here, each of the third to sixth prediction models may be the prediction model with the smallest error and high accuracy, selected based on the results evaluated on the respective test data subset. Accordingly, the market price calculation unit 234 may calculate a market price for the model A hybrid of Company A through each of the third to sixth prediction models extracted based on the model A hybrid of Company A included in the market price prediction request for the target used car. The market price calculation unit 234 may apply to the calculated market prices, for example, a weight C of 45% for the third prediction model, a weight D of 30% for the fourth prediction model, a weight E of 15% for the fifth prediction model, and a weight F of 10% for the sixth prediction model, calculate a final market price for the model A hybrid of Company A through ensemble learning, and provide the calculated market price to the user terminal that transmitted the market price prediction request 558 for the target used car.
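The two weighting examples above can be sketched as a weighted average. The per-model prices below are invented for illustration. The six weights listed in the first example add up to 95%, so this sketch divides by the weight total; that normalization is an assumption, not something the text states.

```python
def final_price(prices, weights):
    """Weighted average of per-model prices; divides by the weight total."""
    total_weight = sum(weights[name] for name in prices)
    return sum(prices[name] * weights[name] for name in prices) / total_weight

# Weights from the two examples in the text, keyed by prediction model number.
WEIGHTS_SIX = {1: 0.35, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.10, 6: 0.05}
WEIGHTS_FOUR = {3: 0.45, 4: 0.30, 5: 0.15, 6: 0.10}

# Hypothetical prices returned by each extracted prediction model.
prices_six = {1: 2500, 2: 2450, 3: 2400, 4: 2300, 5: 2200, 6: 2000}
prices_four = {number: prices_six[number] for number in WEIGHTS_FOUR}

price_with_six_models = final_price(prices_six, WEIGHTS_SIX)
price_with_four_models = final_price(prices_four, WEIGHTS_FOUR)
```

With fewer extracted models, the lower-priority models (the fifth and sixth) carry a larger share of the total weight, which matches the behaviour described for the weights.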

FIG. 6 is a flowchart illustrating a used car market price prediction method 600 according to an embodiment of the present disclosure. In an embodiment, the used car market price prediction method 600 may be performed by a used car market price prediction system. As illustrated, the used car market price prediction method 600 may begin by receiving a data set for a plurality of vehicles for which transactions have been completed (S610). Here, the data set for the vehicles for which transactions have been completed may include data related to the new car price, the used car performance inspection, the fuel type of the vehicle, the vehicle mileage, the vehicle operation date, whether the vehicle has been in an accident, and the like, but is not limited thereto. According to an embodiment, the used car market price prediction system may receive, from an external device at preset intervals, a data set for used cars for which transactions have been completed.

According to an embodiment, the used car market price prediction system may filter out, from the received vehicle data sets, at least one of data sets for duplicate listings, data sets with a vehicle price error, and data sets with a mileage error, and store the result in a storage unit in the form of a database.
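The filtering step can be sketched as follows. The text does not say how duplicates or errors are detected, so the record fields (`listing_id`, `price`, `mileage_km`) and the plausibility limits used here are purely hypothetical.

```python
def filter_records(records, max_price=100_000, max_mileage_km=1_000_000):
    """Drop duplicate listings and rows with implausible price or mileage."""
    seen_ids = set()
    kept = []
    for record in records:
        if record["listing_id"] in seen_ids:
            continue                            # same listing received twice
        if not 0 < record["price"] <= max_price:
            continue                            # treated as a price error
        if not 0 <= record["mileage_km"] <= max_mileage_km:
            continue                            # treated as a mileage error
        seen_ids.add(record["listing_id"])
        kept.append(record)
    return kept

raw_records = [
    {"listing_id": 1, "price": 2400, "mileage_km": 30_000},
    {"listing_id": 1, "price": 2400, "mileage_km": 30_000},    # duplicate
    {"listing_id": 2, "price": 0, "mileage_km": 50_000},       # price error
    {"listing_id": 3, "price": 1800, "mileage_km": -5},        # mileage error
    {"listing_id": 4, "price": 3100, "mileage_km": 12_000},
]
clean_records = filter_records(raw_records)
```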

Then, the data set for the plurality of vehicles for which transactions have been completed may be classified into a plurality of data subsets according to predetermined conditions (S620). Here, the predetermined conditions may include a plurality of conditions. In an embodiment, when a data set for one of the vehicles for which transactions have been completed satisfies at least some of the plurality of conditions, the used car market price prediction system may classify that data set into the data subsets corresponding to each of the satisfied conditions.

For example, the plurality of conditions may include a vehicle-class-by-model-year condition, a vehicle class condition, a detailed model condition, a model condition, a manufacturer condition, and an all-listings condition, and the learning unit may classify the data set for the plurality of vehicles for which transactions have been completed into one or more data subsets according to all listings, manufacturer, model, detailed model, vehicle class, and vehicle class by model year. Here, each of the classified data subsets may contain at least a predetermined number of samples, and may be stored in association with a respective one of the plurality of conditions.
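The classification into condition-specific data subsets can be sketched with a dictionary keyed by condition and attribute values. The field names, the condition names, and the minimum sample count of 2 are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical conditions and the record fields that define each subset.
CONDITION_FIELDS = {
    "all_listings": (),
    "maker": ("maker",),
    "model": ("maker", "model"),
    "detail": ("maker", "model", "detail"),
    "vehicle_class": ("maker", "model", "detail", "vehicle_class"),
    "class_by_year": ("maker", "model", "detail", "vehicle_class", "year"),
}

def classify(records, min_count=2):
    """Place each record in one subset per condition; drop small subsets."""
    subsets = defaultdict(list)
    for record in records:
        for condition, fields in CONDITION_FIELDS.items():
            key = (condition, tuple(record[field] for field in fields))
            subsets[key].append(record)
    # Keep only subsets that reach the assumed minimum number of samples.
    return {key: rows for key, rows in subsets.items() if len(rows) >= min_count}

sales = [
    {"maker": "A", "model": "B", "detail": "HG", "vehicle_class": "hybrid", "year": 2019},
    {"maker": "A", "model": "B", "detail": "HG", "vehicle_class": "hybrid", "year": 2018},
    {"maker": "A", "model": "C", "detail": "LX", "vehicle_class": "base", "year": 2019},
]
data_subsets = classify(sales)
```

Every record lands in six subsets, one per condition, and sparse subsets such as a single 2019 sale are discarded before any model is trained on them.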

Thereafter, in step S630, a plurality of prediction models configured to infer the market price of a used car may be generated for the plurality of classified data subsets. According to an embodiment, the used car market price prediction system may generate, for each of the plurality of classified data subsets, a prediction model configured to infer the market price of a used car through a machine learning algorithm.

For example, the used car market price prediction system may generate a plurality of first prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the vehicle-class-by-model-year condition, may generate a plurality of second prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the vehicle class condition, and may generate a plurality of third prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the detailed model condition.

In addition, the used car market price prediction system may generate a plurality of fourth prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the model condition, may generate a plurality of fifth prediction model candidates through ensemble learning using each of the plurality of data subsets corresponding to the manufacturer condition, and may generate a plurality of sixth prediction model candidates through ensemble learning using the data subset corresponding to the all-listings condition. Here, the ensemble learning may be learning that uses the XGBoost (Extreme Gradient Boosting) algorithm.

In an embodiment, each of the plurality of data subsets may include a plurality of training data subsets and test data subsets. The used car market price prediction system may use the plurality of training data subsets to determine the hyperparameters of each prediction model candidate so that each of the plurality of prediction model candidates corresponding to the respective data subsets has a minimal error, apply the determined hyperparameters to each prediction model candidate, and verify each prediction model candidate using the test data subsets.

For each of the plurality of data subsets, the used car market price prediction system may select, from among the corresponding plurality of prediction model candidates, the prediction model with the smallest error based on the evaluation result of each candidate, as the prediction model for that classified data subset.

Next, the used car market price prediction system may receive a market price prediction request for a target used car and extract, from among the plurality of generated prediction models, one or more prediction models corresponding to the target used car (S640). According to an embodiment, the used car market price prediction system may receive the market price prediction request for the target used car from a user terminal. Here, the market price prediction request for the target used car may include at least one of a vehicle model year, a vehicle class, a vehicle detailed model, a vehicle model, and a manufacturer, but is not limited thereto.

Finally, in response to the market price prediction request for the target used car, the used car market price prediction system may calculate a market price for the target used car based on the extracted one or more prediction models (S650). According to an embodiment, a market price for the target used car may be calculated through each of the extracted prediction models. The used car market price prediction system may select at least some of the plurality of conditions that the target used car satisfies, and calculate the market price for the target used car through the prediction model selected for each of the plurality of data subsets corresponding to the selected conditions. In this case, the used car market price prediction system may apply a weight to each of the calculated market prices and calculate a final market price for the target used car through ensemble learning.

The system may apply a weight to each calculated used car market price, calculate the final market price for the target used car through ensemble learning, and provide the calculated final market price to the user terminal through the communication network. Here, the weight for each market price may be applied according to the prediction model that calculated that price.
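As a minimal sketch of the weighting step, the final market price can be formed as a weighted average of the per-model prices. The disclosure does not specify how the weights are chosen or normalized, so the normalization by the sum of the weights and the name `final_price` are assumptions made for illustration.

```python
from typing import Sequence


def final_price(prices: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-model prices into one price using per-model weights.

    Assumption: weights are non-negative and are normalized by their sum.
    """
    if not prices or len(prices) != len(weights):
        raise ValueError("prices and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(prices, weights)) / total


# Example: a more specific model is trusted three times as much as a broad one.
print(final_price([1000.0, 1200.0], [1.0, 3.0]))  # 1150.0
```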

The above-described used car market price prediction method and system may also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, the computer-readable recording medium may be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the above-described embodiments can be easily inferred by programmers in the technical field to which the present invention belongs.

The methods, operations, or techniques of this disclosure may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. Those skilled in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with this disclosure may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or as software depends on the particular application and the design requirements imposed on the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementations should not be interpreted as departing from the scope of this disclosure.

In a hardware implementation, the processing units used to perform the techniques may be implemented within one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other electronic units designed to perform the functions described in this disclosure, a computer, or a combination thereof.

Accordingly, the various illustrative logical blocks, modules, and circuits described in connection with this disclosure may be implemented or performed with a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of those designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

In a firmware and/or software implementation, the techniques may be implemented as instructions stored on a computer-readable medium such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, a compact disc (CD), or a magnetic or optical data storage device. The instructions may be executable by one or more processors and may cause the processor(s) to perform certain aspects of the functionality described in this disclosure.

When implemented in software, the techniques may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a computer. By way of non-limiting example, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.

For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, digital subscriber line, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include CD, laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, whereas discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other known form of storage medium. An exemplary storage medium may be coupled to a processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.

Although the embodiments described above have been described as utilizing aspects of the presently disclosed subject matter in one or more standalone computer systems, this disclosure is not limited thereto and may be implemented in connection with any computing environment, such as a network or distributed computing environment. Furthermore, aspects of the subject matter of this disclosure may be implemented in a plurality of processing chips or devices, and storage may similarly be affected across a plurality of devices. Such devices may include PCs, network servers, and portable devices.

Although this disclosure has been described herein in connection with some embodiments, various modifications and changes may be made without departing from the scope of this disclosure as understood by those of ordinary skill in the art to which the invention belongs. Such modifications and changes should be considered to fall within the scope of the claims appended hereto.

As described above, by actively using other users' payment information to process reservation purchase requests, a user can easily make a purchase without continuously investing time to buy a desired product at a desired price. That is, other users' payment information can be used to find the lowest-priced product that is actually available for transaction, rather than a false listing. In addition, sellers find it difficult to predict buyer demand and must put effort into inventory management. By providing sellers with statistical information on the reservation purchase requests registered by users, sellers can easily identify the demand for a product and the price at which it is demanded, and can use this information strategically.

110_1: new car price DB
110_2: performance inspection DB
120: communication network
130: used car market price prediction system
210: communication unit
220: storage unit
230: processor
232: learning unit
234: market price calculation unit
240: prediction model generation unit
242: prediction model verification unit

Claims

A used car market price prediction method through machine learning, the method comprising:
receiving data sets for a plurality of vehicles for which transactions have been completed;
classifying the data sets for the plurality of vehicles for which transactions have been completed into a plurality of data subsets according to predetermined conditions;
generating, for the classified plurality of data subsets, a plurality of prediction models configured to infer a market price of a used car, the plurality of prediction models including a first prediction model and a second prediction model;
receiving a market price prediction request for a target used car, and extracting a plurality of prediction models corresponding to the target used car from among the generated plurality of prediction models; and
calculating, in response to the market price prediction request for the target used car, a plurality of market prices for the target used car based on the extracted plurality of prediction models,
wherein generating the plurality of prediction models comprises generating, for each of the classified plurality of data subsets, a plurality of prediction models configured to infer a market price of a used car through a machine learning algorithm, and
wherein calculating the plurality of market prices for the target used car comprises:
selecting at least some of the plurality of conditions that the target used car satisfies;
calculating a plurality of market prices for the target used car through a plurality of prediction models selected for each of a plurality of data subsets corresponding to the selected at least some conditions; and
calculating a final market price for the target used car through ensemble learning by applying a weight to each of the calculated plurality of market prices for the target used car.

delete

The used car market price prediction method of claim 1,
wherein the plurality of conditions include a vehicle grade by model year condition, a vehicle grade condition, a detailed model condition, a model condition, a manufacturer condition, and an all-listings condition, and
wherein each of the plurality of data subsets has a parameter greater than or equal to a predetermined value.

delete

The used car market price prediction method of claim 1,
wherein generating, for the classified plurality of data subsets, the plurality of prediction models configured to infer a market price of a used car comprises:
generating a plurality of first prediction model candidates through ensemble learning using each of a plurality of data subsets corresponding to a vehicle grade by model year condition;
generating a plurality of second prediction model candidates through ensemble learning using each of a plurality of data subsets corresponding to a vehicle grade condition;
generating a plurality of third prediction model candidates through ensemble learning using each of a plurality of data subsets corresponding to a detailed model condition;
generating a plurality of fourth prediction model candidates through ensemble learning using each of a plurality of data subsets corresponding to a model condition;
generating a plurality of fifth prediction model candidates through ensemble learning using each of a plurality of data subsets corresponding to a manufacturer condition; and
generating a plurality of sixth prediction model candidates through ensemble learning using a data subset corresponding to an all-listings condition.

The used car market price prediction method of claim 1,
wherein each of the classified plurality of data subsets includes a plurality of training data subsets and a test data subset, and
wherein generating, for each of the classified plurality of data subsets, a prediction model configured to infer a market price of a used car through a machine learning algorithm comprises:
extracting hyperparameters of each prediction model candidate using the plurality of training data subsets, such that each of a plurality of prediction model candidates corresponding to each of the plurality of data subsets has a minimum error; and
selecting, from among the plurality of prediction model candidates corresponding to each of the plurality of data subsets, the prediction model having the smallest error when verified using the test data subset.
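The two steps recited above, tuning each candidate on the training subsets and then keeping the candidate with the smallest test error, can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the error metric (mean absolute error), the fold-wise validation scheme, the toy candidates `fit_knn` and `fit_mean`, and the single mileage feature are all hypothetical.

```python
from statistics import mean


def mae(predict, rows):
    """Mean absolute error of a predictor over (feature, price) rows."""
    return mean(abs(predict(x) - y) for x, y in rows)


def fit_knn(rows, k):
    """Toy candidate: average price of the k rows nearest in mileage."""
    def predict(x):
        nearest = sorted(rows, key=lambda r: abs(r[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def fit_mean(rows, _hp):
    """Toy candidate: always predict the average training price."""
    avg = mean(y for _, y in rows)
    return lambda x: avg


def tune(fit, grid, train_folds):
    """Pick the hyperparameter with the lowest average held-out-fold error."""
    def cv_error(hp):
        errors = []
        for i, held_out in enumerate(train_folds):
            rest = [r for j, fold in enumerate(train_folds) if j != i for r in fold]
            errors.append(mae(fit(rest, hp), held_out))
        return mean(errors)
    return min(grid, key=cv_error)


def select_model(candidates, train_folds, test_rows):
    """Tune every candidate, then keep the one with the smallest test error."""
    all_train = [r for fold in train_folds for r in fold]
    best = None
    for name, (fit, grid) in candidates.items():
        hp = tune(fit, grid, train_folds)
        model = fit(all_train, hp)
        err = mae(model, test_rows)
        if best is None or err < best[2]:
            best = (name, hp, err, model)
    return best
```

Here `candidates` maps a candidate name to a `(fit, grid)` pair, and `select_model` returns the winning name, its tuned hyperparameter, its test error, and the fitted predictor.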

delete

The used car market price prediction method of claim 6,
wherein the market price prediction request for the target used car includes a future prediction condition for the target used car, and
wherein calculating the market price for the target used car comprises:
calculating future market prices for the target used car through a plurality of prediction models selected for each of a plurality of data subsets corresponding to the selected at least some conditions; and
calculating a final future market price for the target used car through ensemble learning by applying a weight to each of the calculated future market prices for the target used car.

The used car market price prediction method of claim 1,
wherein receiving the data sets for the plurality of vehicles for which transactions have been completed comprises:
filtering out, from among the data sets for the plurality of vehicles, at least one of a data set for a duplicate listing, a data set having a vehicle price error, and a data set having a mileage error.
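The filtering step recited above can be sketched as a single pass over the received records. The claim does not define what counts as a price error or a mileage error, so the range checks, the thresholds `max_price` and `max_mileage_km`, and the record field names are assumptions for illustration only.

```python
def filter_listings(rows, max_price, max_mileage_km):
    """Drop duplicate listings and rows with implausible price or mileage.

    Assumptions: each row is a dict with "listing_id", "price", and
    "mileage_km"; a price error means a price outside (0, max_price];
    a mileage error means a mileage outside [0, max_mileage_km].
    """
    seen = set()
    kept = []
    for row in rows:
        if row["listing_id"] in seen:
            continue  # duplicate of a listing that was already kept
        if not (0 < row["price"] <= max_price):
            continue  # vehicle price error
        if not (0 <= row["mileage_km"] <= max_mileage_km):
            continue  # mileage error
        seen.add(row["listing_id"])
        kept.append(row)
    return kept
```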

A computer program stored in a computer-readable recording medium to execute, on a computer, the used car market price prediction method according to any one of claims 1, 3, 5, 6, 8, and 9.

A used car market price prediction system through machine learning, the system comprising:
a learning unit configured to receive data sets for a plurality of vehicles for which transactions have been completed, classify the data sets for the plurality of vehicles for which transactions have been completed into a plurality of data subsets according to predetermined conditions, and generate, for the classified plurality of data subsets, a plurality of prediction models configured to infer a market price of a used car, the plurality of prediction models including a first prediction model and a second prediction model; and
a market price calculation unit configured to receive a market price prediction request for a target used car, extract a plurality of prediction models corresponding to the target used car from among the generated plurality of prediction models, and calculate, in response to the market price prediction request for the target used car, a plurality of market prices for the target used car based on the extracted plurality of prediction models,
wherein the learning unit is further configured to generate, for each of the classified plurality of data subsets, a prediction model configured to infer a market price of a used car through a machine learning algorithm, and
wherein the market price calculation unit is further configured to select at least some of the plurality of conditions that the target used car satisfies, calculate a plurality of market prices for the target used car through a plurality of prediction models selected for each of a plurality of data subsets corresponding to the selected at least some conditions, and calculate a final market price for the target used car through ensemble learning by applying a weight to each of the calculated plurality of market prices for the target used car.