CN117114775A - Rapid pricing method and system for second-hand vehicle based on LightGBM model - Google Patents

Rapid pricing method and system for second-hand vehicle based on LightGBM model

Publication number
CN117114775A
Authority
CN
China
Prior art keywords
vehicle
model
brand
price
regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310373660.3A
Other languages
Chinese (zh)
Inventor
郭建广
宫伟来
关志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinbao Botong E Commerce Co ltd
Original Assignee
Shanghai Xinbao Botong E Commerce Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinbao Botong E Commerce Co ltd
Priority to CN202310373660.3A
Publication of CN117114775A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors

Abstract

The application relates to a quick pricing method and system for a second-hand vehicle based on a LightGBM model. The method comprises: preprocessing second-hand vehicle historical data and performing feature analysis on the preprocessed data to obtain vehicle features; inputting the vehicle features into a pre-constructed LightGBM model and obtaining a vehicle estimation model through training; estimating the price of the vehicle to be estimated with the vehicle estimation model; and calibrating the estimated price output by the model to obtain the vehicle estimation result. The scheme addresses two problems: traditional methods rely on manual experience and are hard to replicate quickly, and a pure tree model cannot perceive market dynamics, which leads to large prediction deviations. Architecturally, training and computation run offline while prediction runs online, which improves prediction efficiency.

Description

Rapid pricing method and system for second-hand vehicle based on LightGBM model
Technical Field
The application belongs to the technical field of computers, and particularly relates to a quick pricing method and system for a second-hand vehicle based on a LightGBM model.
Background
1. Replacement cost method: the current price of the vehicle under evaluation is taken as the total cost of buying the same vehicle new under current conditions (the full replacement cost, or full replacement price for short) minus the various forms of accumulated depreciation of the vehicle under evaluation. As a rule of thumb, a second-hand car loses about 20% of its new-car price within the first year and about 10% per year after that. Because it is quick to apply, this formula is used by most second-hand car buyers, and experienced dealers rely on it often. The method is coarse: it ignores the effect of market share on the value retention rate of each brand and vehicle series, and it ignores differences in the condition of individual second-hand vehicles.
2. Current market price method: the price is estimated from the average price of vehicles of the same model, model year and service life in the second-hand market, multiplied by a coefficient that reflects the current technical condition of the vehicle under evaluation. This approach comes closest to the real market price, but it presupposes a large sample of real market transactions so that the average price is representative. Both the KBB model in the United States (based on data for more than 3.5 million vehicles auctioned per year on the Manheim platform) and the equity big data linear weighted model, as common methods for second-hand vehicle acquisition assessment, can provide a relatively transparent and feasible assessment method for the second-hand vehicle market. Abnormal transaction records also need to be identified and rejected.
3. Expert pricing method: this method suits second-hand car dealers operating at low frequency, in small volumes, within a small area and in limited scenarios. It can hardly meet the need for fast, efficient pricing at high frequency, at large scale, nationwide and across multiple scenarios, and it makes operating cost and risk management hard to control.
4. Automatic estimation with a model built on second-hand vehicle price data: tree models (random forest, gradient boosting) place high demands on data quality; machine-learning-based second-hand car valuation models in the related art cannot dynamically perceive changes in market conditions, which leads to large deviations between predicted and actual results.
In summary, the above prior art solutions have the following drawbacks:
1) Insufficient accuracy: expert pricing is strongly influenced by individual experience, and a pure machine learning model perceives market dynamics poorly, so deviations are large.
2) High data requirements: both data quality and the amount of real transaction data must be high.
Disclosure of Invention
To overcome the above defects, the application provides a quick pricing method and system for a second-hand vehicle based on a LightGBM model. It addresses the problem that traditional methods rely on manual experience and are hard to replicate quickly, and the problem that a pure tree model produces large prediction deviations because it cannot perceive market dynamics. Architecturally, training and computation run offline while prediction runs online, which improves prediction efficiency.
The application aims at adopting the following technical scheme:
a method for quick pricing of a second-hand vehicle based on a LightGBM model, the method comprising:
preprocessing the history data of the second hand vehicle, and carrying out feature analysis on the preprocessed data to obtain vehicle features;
inputting the vehicle characteristics to a pre-constructed LightGBM model, and obtaining a vehicle estimation model through training;
estimating the price of the vehicle to be estimated through a vehicle estimation model;
and calibrating the estimated price output by the model to obtain a vehicle estimated value result.
Preferably, before the preprocessing of the second-hand vehicle historical data, the method further comprises: collecting second-hand vehicle historical data within a unit time period, wherein the second-hand vehicle historical data comprises vehicle basic information, usage information and transaction information.
Preferably, the preprocessing of the second-hand vehicle historical data and the feature analysis of the preprocessed data to obtain the vehicle features include:
acquiring vehicle information and vehicle condition information for a historical unit time period;
processing abnormal data in the vehicle information and the vehicle condition information by null filling, outlier correction and data type conversion;
establishing feature engineering information based on the vehicle information and vehicle condition information after abnormal data processing; the feature engineering information comprises brand transaction volume, regional transaction volume, and brand transaction price mean/variance;
dividing the vehicle information, the vehicle condition information and the feature engineering information into continuous columns and categorical columns, processing each type of column accordingly, and taking the processed data as the vehicle features.
Further, the vehicle information includes: manufacturer, brand, vehicle series, model, model year, emission standard, new vehicle guidance price, displacement, horsepower, fuel consumption, vehicle type, joint fuel type, fuel supply mode, gearbox type, combined fuel consumption and body type;
the vehicle condition information includes: mileage, license plate region, body color, vehicle age, usage nature, number of ownership transfers, optional equipment, vehicle condition star rating, vehicle source, maintenance records, and insurance claim records.
Preferably, the calibrating of the estimated price output by the model includes:
obtaining the regression value retention rate of the vehicle according to the relationship between vehicle age and the value retention rate at the brand vehicle series dimension;
calculating a regression valuation based on the regression value retention rate; wherein regression valuation = regression value retention rate × new vehicle guidance price;
obtaining the deviation between the regression valuation and the model estimate by comparison, and determining the vehicle estimate according to the size of the deviation.
Further, the obtaining of the regression value retention rate of the vehicle according to the relationship between vehicle age and the value retention rate at the brand vehicle series dimension comprises:
determining the brand model dimension value retention rate from the transaction information in the second-hand vehicle historical data;
obtaining the regression value retention rate of the vehicle from the brand model dimension value retention rate curve.
Further, the determining of the brand model dimension value retention rate includes:
calculating the vehicle age and the value retention rate of each vehicle from the data of vehicles of any brand model transacted within the unit time period; wherein the vehicle age is the difference in months between the transaction date and the registration date, and the value retention rate is the ratio of the transaction price to the new vehicle guidance price;
if the number of vehicles of the brand model transacted within the unit time period is smaller than a preset count, and the vehicle age and the value retention rate cannot be fitted to an exponential function, selecting the data at the brand vehicle series dimension corresponding to the vehicle and fitting again;
if the brand vehicle series dimension cannot be fitted to an exponential function either, using the value retention rate at the brand dimension;
if the brand dimension still cannot be fitted to an exponential function, marking the brand as a niche brand;
merging the transaction records of all niche brands, obtaining the value retention rates corresponding to different vehicle ages by linear fitting based on statistical analysis, and taking them as the brand model dimension value retention rates.
Further, the obtaining of the regression value retention rate of the vehicle from the brand model dimension value retention rate curve includes:
inputting the vehicle age of each vehicle into the brand model dimension value retention rate curve to obtain the regression value retention rate of the vehicle;
the obtaining of the regression value retention rate further comprises: handling as abnormal transactions those vehicles whose transaction value retention rate is low relative to the regression value retention rate;
wherein vehicles with a low transaction value retention rate comprise:
at the consignor and dealer dimension, vehicles in transaction data where the proportion of vehicles whose transaction value retention rate is below a first threshold exceeds a second threshold;
at the vehicle dimension, vehicles screened using price and vehicle condition as indicators whose transaction value retention rate is low by more than a third threshold.
Further, the determining of the vehicle estimate according to the size of the deviation comprises:
if the deviation between the regression valuation and the model estimate is lower than a threshold d, taking the model estimate as the final estimate of the vehicle;
if the deviation between the regression valuation and the model estimate is higher than the threshold d, performing the following processing:
finding, in historical transaction data within a preset time range, transaction records of vehicles of the same model as the vehicle to be estimated with similar vehicle condition and similar mileage;
if the number of records is higher than N, calculating the average price;
if the deviation between the average price and the regression valuation is smaller than the deviation between the average price and the model estimate, taking the regression valuation as the final estimate of the vehicle; otherwise, taking the model estimate as the final estimate of the vehicle;
if the number of records is lower than the preset number threshold, taking the model estimate as the final estimate of the vehicle.
A LightGBM model-based second-hand vehicle rapid pricing system comprising:
the feature acquisition module is used for preprocessing the history data of the second hand vehicle, and carrying out feature analysis on the preprocessed data to obtain vehicle features;
the model training module is used for inputting the vehicle characteristics into a pre-constructed LightGBM model and obtaining a vehicle estimated model through training;
the evaluation module is used for estimating the price of the vehicle to be estimated through the vehicle estimation model;
and the calibration module is used for calibrating the estimated price output by the model to obtain a vehicle estimated result.
The beneficial effects achieved by the application are as follows:
The quick pricing method and system for a second-hand vehicle based on the LightGBM model provided by the application preprocess the second-hand vehicle historical data and perform feature analysis on the preprocessed data to obtain vehicle features; input the vehicle features into a pre-constructed LightGBM model and obtain a vehicle estimation model through training; estimate the price of the vehicle to be estimated with the vehicle estimation model; and calibrate the estimated price output by the model to obtain the vehicle estimation result. Architecturally, training and computation run offline while prediction runs online, which improves prediction efficiency. The scheme addresses the problem that traditional methods depend on manual experience and are hard to replicate quickly, and the problem that a pure tree model produces large prediction deviations because it cannot perceive market dynamics.
Extensive experimental comparison shows that the scheme calibrates better and runs faster, and that it makes up for the difficulty of quickly replicating calibration based on manual experience.
Drawings
FIG. 1 is a flow chart of the quick pricing method for a second-hand vehicle based on a LightGBM model provided by the application;
FIG. 2 is a schematic diagram of the operation process of the second-hand vehicle rapid pricing method based on the LightGBM model provided by the application;
FIG. 3 is a schematic diagram of the module structure of the second-hand vehicle rapid pricing system based on the LightGBM model.
Detailed Description
The following describes the embodiments of the present application in further detail with reference to the drawings.
As shown in fig. 1, the application provides a quick pricing method for a second-hand vehicle based on a LightGBM model. The method is a technical scheme for estimating second-hand vehicle prices with machine learning and statistical analysis, and can be applied in the following 5 areas:
a. New vehicle trade-in: a second-hand price reference for a user or dealer preparing to sell a vehicle;
b. Buying: consumers can learn the market situation and quickly pick a suitable brand and model within their budget; dealers can quickly determine the purchase price, which makes acquisition profit easier to account for and manage;
c. Financial credit: a reference for the loan amount granted to a user, based on the vehicle estimate;
d. Auction platforms: a reference for setting the reserve price (the sale can proceed once the reserve price is exceeded) and the starting price (the price at which the auction opens);
e. 4S stores or distributors: a reference for the acquisition business, covering the vehicle acquisition price, the wholesale price (trade between dealers) and the retail price (sales to end consumers).
Referring to fig. 1, the method includes:
S1, preprocessing historical data of a second-hand vehicle, and performing feature analysis on the preprocessed data to obtain vehicle features;
S2, inputting the vehicle features into a pre-constructed LightGBM model, and obtaining a vehicle estimation model through training;
S3, estimating the price of the vehicle to be estimated through the vehicle estimation model;
S4, calibrating the estimated price output by the model to obtain a vehicle estimation result.
In step S1, before the preprocessing of the second-hand vehicle historical data, the method further includes: collecting second-hand vehicle historical data within a unit time period, wherein the second-hand vehicle historical data comprises vehicle basic information, usage information and transaction information.
Preprocessing the second-hand vehicle historical data and performing feature analysis on the preprocessed data to obtain vehicle features comprises the following steps:
acquiring vehicle information and vehicle condition information for a historical unit time period;
processing abnormal data in the vehicle information and the vehicle condition information by null filling, outlier correction and data type conversion; vehicles priced clearly below the market price are identified from the basic attributes and usage attributes of the vehicle and handled accordingly;
establishing feature engineering information based on the vehicle information and vehicle condition information after abnormal data processing; the feature engineering information comprises brand transaction volume, regional transaction volume, and brand transaction price mean/variance;
dividing the vehicle information, the vehicle condition information and the feature engineering information into continuous columns and categorical columns, processing each type of column accordingly, and taking the processed data as the vehicle features.
Wherein the vehicle information includes: manufacturer, brand, vehicle series, model, model year, emission standard, new vehicle guidance price, displacement, horsepower, fuel consumption, vehicle type, joint fuel type, fuel supply mode, gearbox type, combined fuel consumption and body type;
the vehicle condition information includes: mileage, license plate region, body color, vehicle age, usage nature, number of ownership transfers, optional equipment, vehicle condition star rating (including but not limited to appearance, interior trim, frame, operating condition, electrical equipment), vehicle source, maintenance records, and insurance claim records.
In steps S2 and S3, the specific workflow of the model training and estimation phase includes:
a. Arranging the historical vehicle information and vehicle condition information for a period of time (such as 3 months) into a wide table
b. Abnormal data processing: null filling (by the mode), outlier correction, data type conversion
c. Constructing feature engineering from the vehicle information and vehicle condition information, such as: brand transaction volume, regional transaction volume, brand transaction price mean/variance, etc.
d. Dividing the vehicle information, vehicle condition information and feature engineering information into two types: continuous columns (e.g., mileage) and categorical columns (e.g., gearbox type, which is manual or automatic).
e. Processing of continuous columns:
i. where the values show a pronounced long tail, applying a scale transformation or standardization, such as taking the logarithm, to weaken the influence of the differences
ii. where a continuous column takes too many distinct values, bucketing it; for example, mileage is expressed in units of 10,000 km and rounded to the nearest 0.5, so 34,509 km is rounded to 3.5
f. Processing of categorical columns: one-hot encoding over the enumerated values, e.g., manual and automatic for the gearbox type;
g. Using the processed data as the training input and training a LightGBM model to obtain the vehicle estimation model;
h. Given the vehicle information and vehicle condition information of a particular vehicle, obtaining the model estimate for that vehicle.
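Steps e through g above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the record field names, the use of `log1p`, the two-entry gearbox list and the LightGBM hyperparameters are all hypothetical. Only the log transform, the 0.5 bucketing of mileage, the one-hot encoding and the choice of LightGBM come from the description.

```python
import math

# Hypothetical category list; the description only names manual and automatic.
GEARBOX_TYPES = ["manual", "automatic"]


def log_scale(value):
    """Step e.i: compress a long-tailed continuous value with a logarithm."""
    return math.log1p(value)


def bucket_mileage(km, step=0.5):
    """Step e.ii: mileage in units of 10,000 km, rounded to the nearest step."""
    return round(km / 10_000 / step) * step


def one_hot(value, categories):
    """Step f: one-hot encode a categorical value over its enumerated values."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_features(record):
    """Turn one wide-table row (a dict with assumed keys) into a feature vector."""
    return [
        log_scale(record["guide_price"]),
        bucket_mileage(record["mileage_km"]),
        float(record["age_months"]),
        *one_hot(record["gearbox"], GEARBOX_TYPES),
    ]


def train_model(records, prices):
    """Step g: fit a LightGBM regressor (needs numpy and lightgbm installed)."""
    import numpy as np
    from lightgbm import LGBMRegressor

    X = np.asarray([build_features(r) for r in records], dtype=float)
    y = np.asarray(prices, dtype=float)
    model = LGBMRegressor(n_estimators=200, learning_rate=0.05)  # assumed values
    return model.fit(X, y)


row = {"guide_price": 150_000, "mileage_km": 34_509,
       "age_months": 30, "gearbox": "automatic"}
features = build_features(row)
```

For step h, the trained model's `predict` method would be called on the feature vector built the same way for the vehicle to be estimated.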
In step S4, calibrating the estimated price output by the model to obtain the vehicle estimation result includes:
S41, obtaining the regression value retention rate of the vehicle according to the relationship between vehicle age and the value retention rate at the brand vehicle series dimension;
S42, calculating a regression valuation based on the regression value retention rate; wherein regression valuation = regression value retention rate × new vehicle guidance price;
S43, obtaining the deviation between the regression valuation and the model estimate by comparison, and determining the vehicle estimate according to the size of the deviation.
In step S41, obtaining the regression value retention rate of the vehicle according to the relationship between vehicle age and the value retention rate at the brand vehicle series dimension comprises:
determining the brand model dimension value retention rate from the transaction information in the second-hand vehicle historical data;
obtaining the regression value retention rate of the vehicle from the brand model dimension value retention rate curve.
Wherein, determining the brand model dimension value retention rate includes:
calculating the vehicle age and the value retention rate of each vehicle from the data of vehicles of any brand model transacted within the unit time period; wherein the vehicle age is the difference in months between the transaction date and the registration date, and the value retention rate is the ratio of the transaction price to the new vehicle guidance price; for example, the vehicle age and value retention rate of each vehicle are calculated from the data of vehicles of a given brand model transacted in the last 3 months;
if the number of vehicles of the brand model transacted within the unit time period is smaller than a preset count, and the vehicle age and the value retention rate cannot be fitted to an exponential function, selecting the data at the brand vehicle series dimension corresponding to the vehicle and fitting again;
if the brand vehicle series dimension cannot be fitted to an exponential function either, using the value retention rate at the brand dimension;
if the brand dimension still cannot be fitted to an exponential function, marking the brand as a niche brand;
merging the transaction records of all niche brands, obtaining the value retention rates corresponding to different vehicle ages by linear fitting based on statistical analysis, and taking them as the brand model dimension value retention rates.
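The fallback from model to vehicle series to brand described above might look like the following sketch. It fits rate = a · exp(b · age) by least squares on the logarithm of the value retention rate. The description does not say when a fit counts as failed, so the minimum sample count, the R² floor and the requirement of a decaying curve are assumptions made for illustration, as are all function and key names.

```python
import math


def fit_exponential(ages, rates, min_samples=30, min_r2=0.5):
    """Fit rate = a * exp(b * age); return (a, b), or None if the fit is rejected.

    The rejection criteria (sample count, R^2, non-decaying slope) are
    assumptions for this sketch, not taken from the description.
    """
    n = len(ages)
    if n < min_samples:
        return None
    logs = [math.log(r) for r in rates]
    mean_x = sum(ages) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    syy = sum((y - mean_y) ** 2 for y in logs)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    if r2 < min_r2 or slope >= 0:
        return None
    return math.exp(mean_y - slope * mean_x), slope


def retention_curve(data_by_level):
    """Try the model, series and brand dimensions in turn.

    `data_by_level` maps a level name to (ages_in_months, retention_rates).
    Returns (level, (a, b)), or None when even the brand level fails, in
    which case the caller would treat the brand as niche and fall back to
    the pooled linear fit.
    """
    for level in ("model", "series", "brand"):
        ages, rates = data_by_level.get(level, ([], []))
        params = fit_exponential(ages, rates)
        if params is not None:
            return level, params
    return None


def regression_retention(age_months, params):
    """Evaluate the fitted curve at a given vehicle age."""
    a, b = params
    return a * math.exp(b * age_months)
```

Fitting in log space keeps the sketch dependency-free; a nonlinear least-squares routine would weight the points differently and could give slightly different parameters.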
Obtaining the regression value retention rate of the vehicle includes:
inputting the vehicle age of each vehicle into the brand model dimension value retention rate curve to obtain the regression value retention rate of the vehicle;
the obtaining of the regression value retention rate further comprises: handling as abnormal transactions those vehicles whose transaction value retention rate is low relative to the regression value retention rate;
wherein vehicles with a low transaction value retention rate comprise:
at the consignor and dealer dimension, vehicles in transaction data where the proportion of vehicles whose transaction value retention rate is below a first threshold exceeds a second threshold;
at the vehicle dimension, vehicles screened using price and vehicle condition as indicators whose transaction value retention rate is low by more than a third threshold. The first threshold, the second threshold and the third threshold are predefined according to statistical data.
a. At the consignor and dealer dimension: among the vehicles priced above 10,000 acquired by a given dealer, the proportion whose transaction value retention rate is low by more than threshold a exceeds threshold b;
b. At the vehicle dimension: a vehicle priced above 10,000 and in good condition whose transaction value retention rate is low by more than threshold c.
Notes: 1) Consignor: the selling party in a transaction
2) Dealer: the buying party in a transaction
3) Transaction value retention rate = transaction price / new vehicle guidance price
4) Low transaction value retention rate: regression value retention rate minus transaction value retention rate exceeds the threshold
5) The thresholds are derived from statistical analysis.
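Rules a and b can be illustrated with the short sketch below. The threshold values for a, b and c, the star-rating cutoff standing in for "good condition", and the record field names are all hypothetical; the description only says that the thresholds come from statistical analysis.

```python
from collections import defaultdict


def retention_gap(t):
    """Regression value retention rate minus transaction value retention rate."""
    return t["regression_rate"] - t["price"] / t["guide_price"]


def flag_dealers(transactions, a=0.15, b=0.3, min_price=10_000):
    """Rule a: dealers whose share of low-retention purchases exceeds b."""
    total = defaultdict(int)
    low = defaultdict(int)
    for t in transactions:
        if t["price"] <= min_price:
            continue  # only transactions priced above min_price are counted
        total[t["dealer"]] += 1
        if retention_gap(t) > a:
            low[t["dealer"]] += 1
    return {dealer for dealer, n in total.items() if low[dealer] / n > b}


def flag_vehicle(t, c=0.25, min_price=10_000, min_stars=4):
    """Rule b: a well-rated vehicle above min_price sold far below its curve."""
    return (t["price"] > min_price
            and t["stars"] >= min_stars
            and retention_gap(t) > c)


def tx(dealer, price, stars):
    """Build a made-up transaction with a fixed guide price and curve value."""
    return {"dealer": dealer, "price": price, "stars": stars,
            "guide_price": 100_000, "regression_rate": 0.7}


transactions = [tx("D1", 50_000, 4), tx("D1", 40_000, 5),
                tx("D1", 68_000, 4), tx("D2", 69_000, 4)]
suspicious_dealers = flag_dealers(transactions)
suspicious_vehicles = [t for t in transactions if flag_vehicle(t)]
```

Flagged records would then be excluded or corrected before the value retention curves are fitted; what exactly happens to them is not specified in the description.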
In step S43, determining the vehicle estimate according to the size of the deviation includes:
if the deviation between the regression valuation and the model estimate is lower than a threshold d, taking the model estimate as the final estimate of the vehicle;
if the deviation between the regression valuation and the model estimate is higher than the threshold d, performing the following processing:
finding, in historical transaction data within a preset time range, transaction records of vehicles of the same model as the vehicle to be estimated with similar vehicle condition and similar mileage;
if the number of records is higher than N, calculating the average price;
if the deviation between the average price and the regression valuation is smaller than the deviation between the average price and the model estimate, taking the regression valuation as the final estimate of the vehicle; otherwise, taking the model estimate as the final estimate of the vehicle;
if the number of records is lower than the preset number threshold, taking the model estimate as the final estimate of the vehicle.
The calibration procedure is as follows:
a. Obtain the regression value retention rate of the vehicle according to the relationship between vehicle age and the value retention rate at the brand vehicle series dimension, then calculate regression valuation = regression value retention rate × new vehicle guidance price
b. If the deviation between the regression valuation and the model estimate is lower than the threshold d, the model estimate is taken as the final estimate of the vehicle
c. If the deviation between the regression valuation and the model estimate is higher than the threshold d, the following processing is performed:
i. find, in the transaction data of the last three months, transaction records of vehicles matching the vehicle to be estimated (same brand and vehicle series, same model year, same province) with similar mileage
ii. if the number of records is higher than N, calculate the average price: if the deviation between the average price and the regression valuation is smaller than the deviation between the average price and the model estimate, the regression valuation is used as the final estimate of the vehicle; otherwise the model estimate is used as the final estimate of the vehicle
iii. if the number of records is lower than N, the model estimate is used as the final estimate of the vehicle
N can be obtained from statistical analysis.
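The decision logic of the calibration procedure can be written out as the sketch below. The description does not define "deviation", so it is taken here as the absolute difference relative to the model estimate; the default values of d and N and the function names are likewise assumptions. The boundary cases that the text leaves open (deviation equal to d, record count equal to N) are resolved arbitrarily in the comments.

```python
def relative_deviation(value, reference):
    """Assumed deviation measure: absolute difference relative to the reference."""
    return abs(value - reference) / reference


def final_estimate(model_value, regression_rate, guide_price,
                   similar_prices, d=0.1, n_min=5):
    """Choose between the model estimate and the regression valuation."""
    # Step a: regression valuation = regression value retention rate x guide price.
    regression_value = regression_rate * guide_price

    # Step b: the two agree closely, so keep the model estimate.
    if relative_deviation(regression_value, model_value) < d:
        return model_value

    # Step c.iii: too few comparable transactions (a count equal to N is
    # treated as too few here), so keep the model estimate.
    if len(similar_prices) <= n_min:
        return model_value

    # Step c.ii: pick whichever value lies closer to the comparable average.
    average = sum(similar_prices) / len(similar_prices)
    if abs(average - regression_value) < abs(average - model_value):
        return regression_value
    return model_value
```

For example, with a model estimate of 100,000, a regression value retention rate of 0.58 and a guidance price of 150,000, the regression valuation is 87,000; if six comparable vehicles sold at 88,000 each, the regression valuation is returned because it lies closer to that average.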
In summary, the experimental results obtained with the above method are as follows:
1) Overall estimation accuracy: close to 90% (MAPE below 10%), where MAPE = mean(abs(valuation - actual price) / actual price)
2) Concurrency: on an 8-core CPU with 32 GB of memory, 300 concurrent requests complete in 1 s.
Comparison with peer products, taking the wholesale price as an example:
Transaction price segment    Method of this patent    Widely used product on the market
0-40,000                     16.8%                    27.3%
40,000-80,000                7.3%                     9.3%
80,000-150,000               6.9%                     7.9%
150,000-200,000              6.9%                     7.0%
Over 200,000                 9.3%                     9.0%
Overall                      9.9%                     21.3%
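The accuracy figure in item 1) is stated in terms of MAPE. A small sketch of how that metric is computed follows; the function name and the sample numbers are made up for illustration, and reading "accuracy" as 1 minus MAPE is an interpretation of the text rather than something it states outright.

```python
def mape(valuations, actual_prices):
    """Mean absolute percentage error of valuations against actual prices."""
    errors = [abs(v - p) / p for v, p in zip(valuations, actual_prices)]
    return sum(errors) / len(errors)


# Made-up example: two vehicles, each valued 10% away from its sale price.
example_mape = mape([110_000, 45_000], [100_000, 50_000])
example_accuracy = 1 - example_mape
```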
Example 1: As shown in Fig. 2, building on the above embodiment, the operating procedure of the LightGBM-based rapid pricing method for second-hand vehicles is further described from the offline and online perspectives, and includes:
1. Cleaning and updating the vehicle model library data.
2. Cleaning and updating the historical transaction data.
3. Computing the value retention rate at the brand and vehicle model dimension from the historical transaction data.
4. Standard feature engineering: the feature engineering of the historical transaction vehicles and the feature engineering of the vehicle model library are combined as the input for model training. The feature-engineered vehicle model library data is also saved as a resource file for online use.
5. Model training: a tree model may be selected for training, and the resulting model file is saved as a resource file for online use.
6. The operating flow of the main function is as follows:
acquiring the information and vehicle condition of the vehicle to be estimated, as input by the business side, which triggers the run;
looking up the feature engineering information of the vehicle in the standard feature engineering information of the vehicle model library, according to the information of the vehicle to be estimated;
performing feature engineering on the vehicle condition information;
combining the vehicle features and the vehicle condition features as the input for model prediction to obtain the model estimate;
performing post-model calibration;
and outputting the estimation result for the development interface to call.
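The main operating flow in step 6 can be sketched as a single function. This is a hedged illustration, not the patent's implementation: `estimate_online`, the `cond_` prefix and the two callables are hypothetical, and a plain Python callable stands in for the trained LightGBM model so the sketch runs without external libraries.

```python
from typing import Callable, Dict, Mapping

Features = Dict[str, float]


def estimate_online(
    vehicle_id: str,
    condition: Mapping[str, float],
    model_library_features: Mapping[str, Features],
    predict: Callable[[Features], float],
    calibrate: Callable[[float, Features], float],
) -> float:
    """Look up vehicle features, add condition features, predict, then calibrate."""
    # Look up the precomputed feature engineering of the vehicle model library.
    if vehicle_id not in model_library_features:
        raise KeyError(f"unknown vehicle model: {vehicle_id}")
    vehicle_features = model_library_features[vehicle_id]

    # Feature engineering on the vehicle condition (placeholder: cast and prefix).
    condition_features = {f"cond_{name}": float(value) for name, value in condition.items()}

    # Combine both feature groups as the model input and predict.
    merged: Features = {**vehicle_features, **condition_features}
    model_estimate = predict(merged)

    # Post-model calibration produces the final estimate.
    return calibrate(model_estimate, merged)


# Stand-ins for the resource files and the trained model (all values hypothetical).
library = {"brand-series-model-2020": {"guide_price": 15.0}}
price = estimate_online(
    "brand-series-model-2020",
    {"mileage": 3.0},
    library,
    predict=lambda f: f["guide_price"] * 0.6 - f["cond_mileage"] * 0.2,
    calibrate=lambda estimate, f: min(estimate, f["guide_price"]),
)
print(price)
```

Passing the predictor and the calibrator as arguments keeps the offline artifacts (feature file, model file) separate from the online flow.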
Example 2: based on the same technical concept, the present embodiment further provides a second-hand vehicle rapid pricing system based on a LightGBM model, as shown in fig. 3, which includes:
the feature acquisition module 101 is configured to preprocess the second-hand vehicle history data, and perform feature analysis on the preprocessed data to obtain vehicle features;
the model training module 102 is configured to input the vehicle features into a pre-constructed LightGBM model, and obtain a vehicle estimation model through training;
the evaluation module 103 is used for estimating the price of the vehicle to be estimated through the vehicle estimation model;
and the calibration module 104 is used for calibrating the estimated price output by the model to obtain a vehicle estimated result.
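The calibration module compares the model estimate with a regression valuation, which the claims define as the regression value retention rate multiplied by the new vehicle guide price, with the retention rate fitted as an exponential function of vehicle age. A minimal sketch follows, assuming a log-linear least-squares fit (the source does not state the fitting procedure) and hypothetical function names. The fallback from vehicle model to vehicle series to brand is not covered here.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_retention(
    ages_months: Sequence[float], retention_rates: Sequence[float]
) -> Tuple[float, float]:
    """Fit rate = a * exp(b * age) by least squares on ln(rate)."""
    if len(ages_months) != len(retention_rates) or len(ages_months) < 2:
        raise ValueError("need at least two (age, rate) pairs of equal length")
    logs = [math.log(rate) for rate in retention_rates]
    n = len(ages_months)
    mean_x = sum(ages_months) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ages_months)
    if sxx == 0:
        raise ValueError("vehicle ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages_months, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def regression_valuation(age_months: float, a: float, b: float, guide_price: float) -> float:
    """Regression valuation = regression value retention rate * new vehicle guide price."""
    return a * math.exp(b * age_months) * guide_price


# Hypothetical (age in months, retention rate) pairs for one brand and vehicle model.
a, b = fit_exponential_retention([12, 24, 36, 48], [0.80, 0.70, 0.62, 0.55])
print(regression_valuation(30, a, b, guide_price=15.0))
```

Here each retention rate is the transaction price divided by the new vehicle guide price, and the vehicle age is the number of months between registration and transaction.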
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application and not for limiting the scope of protection thereof, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: various alterations, modifications, and equivalents may occur to others skilled in the art upon reading the present disclosure, and are within the scope of the appended claims.

Claims (10)

1. A method for rapid pricing of a second-hand vehicle based on a LightGBM model, the method comprising:
preprocessing the history data of the second hand vehicle, and carrying out feature analysis on the preprocessed data to obtain vehicle features;
inputting the vehicle characteristics to a pre-constructed LightGBM model, and obtaining a vehicle estimation model through training;
estimating the price of the vehicle to be estimated through a vehicle estimation model;
and calibrating the estimated price output by the model to obtain a vehicle estimated value result.
2. The method of claim 1, wherein, before preprocessing the second-hand vehicle history data, the method further comprises: collecting second-hand vehicle history data within a unit time period, the second-hand vehicle history data comprising basic vehicle information, usage information and transaction information.
3. The method of claim 1, wherein preprocessing the second-hand vehicle history data and performing feature analysis on the preprocessed data to obtain the vehicle features comprises:
acquiring vehicle information and vehicle condition information for a historical unit time period;
processing abnormal information in the vehicle information and the vehicle condition information by null filling, abnormal value correction and data type conversion;
establishing feature engineering information based on the vehicle information and the vehicle condition information after the abnormal data processing; the feature engineering information comprises brand yield, regional yield and brand yield price average/variance;
dividing the vehicle information, the vehicle condition information and the feature engineering information into continuous columns and categorical columns, processing the data of each type, and taking the processed data as the vehicle features.
4. The method of claim 3, wherein the vehicle information comprises: manufacturer, brand, vehicle series, model, model year, emission standard, new vehicle guide price, displacement, horsepower, fuel consumption, vehicle type, joint fuel type, fuel supply mode, gearbox type, comprehensive fuel consumption and vehicle body type;
the vehicle condition information comprises: mileage, license plate region, vehicle body color, vehicle age, operating nature, number of ownership transfers, optional components, vehicle condition star rating, vehicle source, maintenance record, and insurance claim record.
5. The method of claim 1, wherein calibrating the estimated price output by the model to obtain the vehicle estimation result comprises:
obtaining a regression value retention rate of the vehicle according to the relation between the vehicle age and the value retention rate at the brand and vehicle series dimension;
calculating a regression valuation based on the regression value retention rate, wherein regression valuation = regression value retention rate × new vehicle guide price;
and comparing the regression valuation with the model estimate to obtain their deviation, and determining the vehicle estimate according to the deviation.
6. The method of claim 5, wherein obtaining the regression value retention rate of the vehicle according to the relation between the vehicle age and the value retention rate at the brand and vehicle series dimension comprises:
determining the value retention rate at the brand and vehicle model dimension according to the transaction information in the second-hand vehicle history data;
and obtaining the regression value retention rate of the vehicle according to the value retention rate curve at the brand and vehicle model dimension.
7. The method of claim 6, wherein determining the value retention rate at the brand and vehicle model dimension comprises:
calculating the vehicle age and the value retention rate of each vehicle from the data of vehicles of any brand and model transacted within the unit time period; wherein the vehicle age is the difference in months between the transaction date and the registration date, and the value retention rate is the ratio of the transaction price to the new vehicle guide price;
if the number of vehicles of the brand and model transacted within the unit time period is smaller than a preset count, so that the vehicle age and the value retention rate cannot be fitted to an exponential function, selecting the data at the brand and vehicle series dimension corresponding to the vehicle and fitting again;
if the data at the brand and vehicle series dimension cannot be fitted to an exponential function, using the value retention rate at the brand dimension;
if the data at the brand dimension still cannot be fitted to an exponential function, marking the brand as a niche brand;
merging the transaction records of all niche brands, obtaining the value retention rates corresponding to different vehicle ages by linear fitting based on statistical analysis, and using them as the value retention rate at the brand and vehicle model dimension.
8. The method of claim 6, wherein obtaining the regression value retention rate of the vehicle according to the value retention rate curve at the brand and vehicle model dimension comprises:
inputting the vehicle age of each vehicle into the value retention rate curve at the brand and vehicle model dimension to obtain the regression value retention rate of the vehicle;
obtaining the regression value retention rate of the vehicle further comprises: performing abnormal transaction processing, according to the regression value retention rate, on vehicles with a low transaction value retention rate;
wherein the vehicles with a low transaction value retention rate comprise:
vehicles of consignors and dealers in whose transaction data the proportion of vehicles with a transaction value retention rate below a first threshold exceeds a second threshold;
and vehicles screened at the vehicle dimension, using vehicle price and vehicle condition as indicators, whose low price and low value retention rate exceed a third threshold.
9. The method of claim 5, wherein determining the vehicle estimate according to the deviation comprises:
if the deviation between the regression valuation and the model estimate is lower than a threshold d, using the model estimate as the final vehicle estimate;
if the deviation between the regression valuation and the model estimate is higher than the threshold d, performing the following processing:
finding, in the historical transaction data within a preset time range, the transactions of the same vehicle as the vehicle to be estimated with similar vehicle condition and similar mileage;
if the number of records is higher than a preset number threshold N, calculating the average price;
if the deviation between the average price and the regression valuation is smaller than the deviation between the average price and the model estimate, using the regression valuation as the final vehicle estimate, and otherwise using the model estimate as the final vehicle estimate;
and if the number of records is lower than the preset number threshold N, using the model estimate as the final vehicle estimate.
10. A rapid pricing system for second-hand vehicles based on a LightGBM model, comprising:
the feature acquisition module is used for preprocessing the history data of the second hand vehicle, and carrying out feature analysis on the preprocessed data to obtain vehicle features;
the model training module is used for inputting the vehicle characteristics into a pre-constructed LightGBM model and obtaining a vehicle estimated model through training;
the evaluation module is used for estimating the price of the vehicle to be estimated through the vehicle estimation model;
and the calibration module is used for calibrating the estimated price output by the model to obtain a vehicle estimated result.
CN202310373660.3A 2023-04-10 2023-04-10 Rapid pricing method and system for second-hand vehicle based on LightGBM model Pending CN117114775A (en)

Publications (1)

Publication Number Publication Date
CN117114775A true CN117114775A (en) 2023-11-24




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination