CN110288364A - A kind of used car pricing method based on XGBoost model, apparatus and system - Google Patents

A used car pricing method, apparatus and system based on an XGBoost model

Info

Publication number
CN110288364A
Authority
CN
China
Prior art keywords
data
type configuration
xgboost model
unit
used car
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810224028.1A
Other languages
Chinese (zh)
Inventor
黄洁申
伊凡
石玉明
邱慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Youxuan (Beijing) Information Technology Co., Ltd
Original Assignee
Mdt Infotech Ltd (shanghai) Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mdt Infotech Ltd (shanghai) Mdt Infotech Ltd filed Critical Mdt Infotech Ltd (shanghai) Mdt Infotech Ltd
Priority to CN201810224028.1A priority Critical patent/CN110288364A/en
Publication of CN110288364A publication Critical patent/CN110288364A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a used car pricing method, apparatus and system based on an XGBoost model. In building the target XGBoost model, the method fuses the transaction data of used cars with vehicle model configuration data, and assigns ordinal meaning to the configuration data during preprocessing. Using an XGBoost model also effectively helps to avoid overfitting. The target XGBoost model built by the present application is suitable for the price evaluation of all vehicles: even for a modified used car, it is only necessary to input the relevant information of the used car, its vehicle model configuration data, and the post-modification information to obtain the price of the modified used car.

Description

A used car pricing method, apparatus and system based on an XGBoost model
Technical field
The present invention relates to the field of computer technology, and in particular to a used car pricing method, apparatus and system based on an XGBoost model.
Background
With the development of the economy and the improvement of urban living standards, owning a vehicle has become a basic household need. In recent years, with the rapid development of China's economy and the rapid growth of the vehicle population, the used car trade has become increasingly active: the number of used car brokers and dealerships in each provincial capital generally exceeds 5,000, and 100,000 to 200,000 used cars are traded each year. As the trade grows, the used car market faces a series of challenges. For example, the used car market is still a market with information asymmetry, and consumers find it difficult to know the value of a used car. As a result, the market struggles to win consumer trust and loses potential customers. How to evaluate and price used cars is therefore particularly important.
In recent years, machine learning algorithms have become increasingly widespread, and more used car trading platforms apply them to used car price evaluation. When machine learning is used to model used car prices, the usual model granularity is the car series level, with a separate model built for each car series. For example, in the model-building process of the machine learning algorithms shown in the prior art, the data in the database is first grouped by car series. The data includes transaction data, and the transaction data includes the transaction price of the used car and the relevant information of the used car. The relevant information of the used car includes vehicle age, mileage and other data that quantitatively reflect the condition of the used car. A linear model is built from the transaction price and the relevant information of the used car, and the price of a used car is then evaluated based on that linear model.
The linear model shown in the prior art places high demands on the completeness of car series in the database, and cannot produce a valuation when the database contains no corresponding car series. In addition, as people's expectations of the driving experience rise, many owners choose to install additional configurations, and conventional models cannot capture this detailed add-on configuration information.
Summary of the invention
The object of the present invention is to provide a used car pricing method, apparatus and system based on an XGBoost model, to solve the technical problem that the linear model shown in the prior art places high demands on the completeness of car series in the database and cannot produce a valuation when the database contains no corresponding car series.
A first aspect of the embodiments of the present application shows a used car pricing method based on an XGBoost model, the method comprising:
Obtaining historical data of used cars, the historical data comprising transaction data and vehicle model configuration data, the transaction data comprising the transaction price and the relevant information of the used car;
Preprocessing the historical data to generate preprocessed historical data;
Building a target XGBoost model according to the preprocessed historical data;
Evaluating the price of the used car based on the target XGBoost model.
Optionally, the step of preprocessing the historical data to generate preprocessed historical data comprises:
Cleaning the transaction data and the vehicle model configuration data to obtain cleaned transaction data and cleaned vehicle model configuration data;
Assigning ordinal meaning to the cleaned vehicle model configuration data according to a preset assignment rule, to generate processed vehicle model configuration data;
Fusing the processed vehicle model configuration data with the cleaned transaction data to generate the preprocessed historical data.
Optionally, the step of preprocessing the historical data to generate preprocessed historical data comprises:
Assigning ordinal meaning to the vehicle model configuration data according to a preset assignment rule, to generate assigned vehicle model configuration data;
Cleaning the transaction data and the assigned vehicle model configuration data to obtain processed transaction data and processed vehicle model configuration data;
Fusing the processed transaction data with the processed vehicle model configuration data to generate the preprocessed historical data.
Optionally, the step of cleaning the transaction data and the assigned vehicle model configuration data to obtain processed transaction data and processed vehicle model configuration data comprises:
Determining the problem data in the transaction data and in the assigned vehicle model configuration data, the problem data comprising one or a combination of: missing values, outliers, and noise values;
Determining the type of the problem data and processing the problem data according to its type, to obtain the processed transaction data and the processed vehicle model configuration data.
Optionally, the step of processing the problem data according to its type comprises:
If the type is an outlier, determining the target feature corresponding to the outlier;
Calculating the average value of the target feature based on the historical data;
Replacing the value of the target feature with the average value of the target feature.
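The outlier handling above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the applicant's implementation: the field name `vehicle_age`, the 20-year validity bound, and the choice to compute the average over the non-outlier records only are all hypothetical.

```python
from statistics import mean

def replace_outliers_with_mean(records, feature, is_outlier):
    """Return a copy of records where outlier values of `feature` are replaced by the mean."""
    # Assumption: the average is taken over records that are not themselves outliers.
    valid = [r[feature] for r in records if not is_outlier(r[feature])]
    avg = mean(valid)
    return [
        {**r, feature: avg} if is_outlier(r[feature]) else dict(r)
        for r in records
    ]

history = [
    {"vehicle_age": 3, "price": 120000},
    {"vehicle_age": 5, "price": 90000},
    {"vehicle_age": 50, "price": 80000},  # implausible age, treated as an outlier
    {"vehicle_age": 7, "price": 60000},
]
cleaned = replace_outliers_with_mean(history, "vehicle_age", lambda age: age > 20)
```

The input list is left unmodified, so the raw historical data remains available for later error analysis.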
Optionally, the step of building the target XGBoost model according to the preprocessed historical data comprises:
Dividing the preprocessed historical data to generate training data and test data;
Building an XGBoost model based on the training data, to generate a first XGBoost model;
Calculating, based on the first XGBoost model, the predicted price of each test used car in the test data;
Judging whether the difference between the predicted price and the transaction price of the test used car falls within a preset range;
If it falls within the preset range, determining that the first XGBoost model is the target XGBoost model;
If it does not fall within the preset range, adjusting the parameters of the first XGBoost model to generate the target XGBoost model.
Optionally, the method further comprises:
Counting, in the target XGBoost model, the number of times each feature appears as a child node;
Ranking the importance of the features based on that number.
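The counting and ranking step can be illustrated on a toy set of trees. The sketch below assumes that "appears as a child node" means "is used as a split in a tree", and it uses two hand-written trees with hypothetical feature names, shaped like a nested JSON tree dump in which internal nodes carry a `split` key and leaves carry a `leaf` key. In the XGBoost Python package, `Booster.get_score(importance_type="weight")` reports a comparable split count for a trained model.

```python
from collections import Counter

def count_splits(node, counts):
    """Recursively count how often each feature is used as a split in one tree."""
    if "split" in node:  # internal node; leaves carry a "leaf" value instead
        counts[node["split"]] += 1
        for child in node.get("children", []):
            count_splits(child, counts)
    return counts

# Two hand-written toy trees (hypothetical), not the output of a trained model.
trees = [
    {"split": "vehicle_age", "children": [
        {"split": "mileage", "children": [{"leaf": 1.2}, {"leaf": 0.4}]},
        {"leaf": -0.8},
    ]},
    {"split": "vehicle_age", "children": [
        {"leaf": 0.3},
        {"split": "headlight", "children": [{"leaf": -0.1}, {"leaf": -0.5}]},
    ]},
]

counts = Counter()
for tree in trees:
    count_splits(tree, counts)

# Features ordered from most to least frequently used.
ranking = [feature for feature, _ in counts.most_common()]
```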
A second aspect of the embodiments of the present application shows a used car pricing device based on an XGBoost model, the device comprising:
An acquiring unit, for obtaining historical data of used cars, the historical data comprising transaction data and vehicle model configuration data, the transaction data comprising the transaction price and the relevant information of the used car;
A preprocessing unit, for preprocessing the historical data to generate preprocessed historical data;
A construction unit, for building a target XGBoost model according to the preprocessed historical data;
An assessment unit, for evaluating the price of the used car based on the target XGBoost model.
Optionally, the preprocessing unit comprises:
A first cleaning unit, for cleaning the transaction data and the vehicle model configuration data to obtain cleaned transaction data and cleaned vehicle model configuration data;
A first assignment unit, for assigning ordinal meaning to the cleaned vehicle model configuration data according to a preset assignment rule, to generate processed vehicle model configuration data;
A first fusion unit, for fusing the processed vehicle model configuration data with the cleaned transaction data to generate the preprocessed historical data.
Optionally, the preprocessing unit comprises:
A second assignment unit, for assigning ordinal meaning to the vehicle model configuration data according to a preset assignment rule, to generate assigned vehicle model configuration data;
A second cleaning unit, for cleaning the transaction data and the assigned vehicle model configuration data to obtain processed transaction data and processed vehicle model configuration data;
A second fusion unit, for fusing the processed transaction data with the processed vehicle model configuration data to generate the preprocessed historical data.
Optionally, the second cleaning unit comprises:
A problem data determination unit, for determining the problem data in the transaction data and in the assigned vehicle model configuration data, the problem data comprising one or a combination of: missing values, outliers, and noise values;
A processing unit, for determining the type of the problem data and processing the problem data according to its type, to obtain the processed transaction data and the processed vehicle model configuration data.
Optionally, the processing unit comprises:
An outlier determination unit, for determining the target feature corresponding to the outlier if the type is an outlier;
An average calculation unit, for calculating the average value of the target feature based on the historical data;
A replacement unit, for replacing the value of the target feature with the average value of the target feature.
Optionally, the construction unit comprises:
A division unit, for dividing the preprocessed historical data to generate training data and test data;
A first generation unit, for building an XGBoost model based on the training data, to generate a first XGBoost model;
A calculation sub-unit, for calculating, based on the first XGBoost model, the predicted price of each test used car in the test data;
A judging unit, for judging whether the difference between the predicted price and the transaction price of the test used car falls within a preset range;
A determination sub-unit, for determining that the first XGBoost model is the target XGBoost model if the difference falls within the preset range;
An adjustment unit, for adjusting the parameters of the first XGBoost model to generate the target XGBoost model if the difference does not fall within the preset range.
Optionally, the device further comprises:
A statistics unit, for counting, in the target XGBoost model, the number of times each feature appears as a child node;
A ranking unit, for ranking the importance of the features based on that number.
A third aspect of the embodiments of the present application shows a used car pricing system based on an XGBoost model, the system comprising: an application platform server and a data storage server connected to the application platform server, the data storage server being either arranged inside the application platform server or arranged independently, and the application platform server being connected to a terminal through the internet;
The terminal is used to display the price of the target used car;
The data storage server is used to store the relevant data;
The application platform server is used to implement the method shown in the embodiments of the present application.
As can be seen from the above technical solutions, the embodiments of the present application show a used car pricing method, apparatus and system based on an XGBoost model. The method comprises: obtaining historical data of used cars, the historical data comprising transaction data and vehicle model configuration data, the transaction data comprising the transaction price and the relevant information of the used car; preprocessing the historical data to generate preprocessed historical data; building a target XGBoost model according to the preprocessed historical data; and evaluating the price of the used car based on the target XGBoost model. In building the target XGBoost model, the method fuses the transaction data of used cars with vehicle model configuration data, and assigns ordinal meaning to the configuration data during preprocessing. Using an XGBoost model also effectively helps to avoid overfitting. The target XGBoost model built by the present application is suitable for the price evaluation of all vehicles: even for a modified used car, it is only necessary to input the relevant information of the used car, its vehicle model configuration data, and the post-modification information to obtain the price of the modified used car.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1-1 is a structural block diagram of a used car pricing method based on an XGBoost model according to a preferred embodiment;
Fig. 1-2 is a structural block diagram of a used car pricing method based on an XGBoost model according to another preferred embodiment;
Fig. 2 is a flowchart of a used car pricing method based on an XGBoost model according to a preferred embodiment;
Fig. 3 is a detailed flowchart of step S102 according to a preferred embodiment;
Fig. 4 is a detailed flowchart of step S102 according to another preferred embodiment;
Fig. 5 is a detailed flowchart of step S10222 according to a preferred embodiment;
Fig. 6 is a detailed flowchart of step S102222 according to a preferred embodiment;
Fig. 7 is a detailed flowchart of step S103 according to a preferred embodiment;
Fig. 8 is a detailed flowchart of a method for determining important influencing factors according to a preferred embodiment;
Fig. 9 is a structural block diagram of a used car pricing device based on an XGBoost model according to a preferred embodiment;
Fig. 10 is a structural block diagram of the preprocessing unit according to a preferred embodiment;
Fig. 11 is a structural block diagram of the preprocessing unit according to another preferred embodiment;
Fig. 12 is a structural block diagram of the second cleaning unit according to a preferred embodiment;
Fig. 13 is a structural block diagram of the processing unit according to a preferred embodiment;
Fig. 14 is a structural block diagram of the construction unit according to a preferred embodiment;
Fig. 15 is a structural block diagram of the further units included in the device according to a preferred embodiment;
Fig. 16 is a structural block diagram of a server according to a preferred embodiment.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1:
The linear model shown in the prior art places high demands on the completeness of car series in the database, and cannot produce a valuation when the database contains no corresponding car series. In addition, as people's expectations of the driving experience rise, many owners choose to install additional configurations, and conventional models cannot capture this detailed add-on configuration information.
To solve the above problems, the embodiments of the present application show a used car pricing system based on an XGBoost model. Specifically, referring to Fig. 1-1 and Fig. 1-2, the system comprises: an application platform server 31 and a data storage server 32 connected to the application platform server 31, the data storage server 32 being either arranged inside the application platform server 31 or arranged independently, and the application platform server 31 being connected to a terminal 33 through the internet;
The terminal 33 is used to display the price of the target used car;
The terminal 33 shown in the embodiments of the present application is a device at the outermost edge of a computer network, and is mainly used for the input of user information and the output of processing results. Mobile terminals known in the prior art, such as mobile phones and tablets (PADs), fall within the protection scope of the embodiments of the present application.
The data storage server 32 is used to store the relevant data;
The application platform server 31 is used to implement the method shown in the embodiments of the present application;
The application platform server 31 shown in the embodiments of the present application provides web applications with a simple and manageable mechanism for accessing system resources. It also provides low-level services, such as an implementation of the HTTP protocol and database connection management. A Servlet container is only one part of an application server. In addition to the Servlet container, the application platform server 31 may also provide other Java EE (Enterprise Edition) components, such as an EJB (Enterprise JavaBeans) container, a JNDI server and a JMS server.
The application platform server 31 in the system shown in the embodiments of the present application obtains historical data of used cars, the historical data comprising transaction data and vehicle model configuration data, the transaction data comprising the transaction price and the relevant information of the used car; preprocesses the historical data to generate preprocessed historical data; builds a target XGBoost model according to the preprocessed historical data; and evaluates the price of the used car based on the target XGBoost model.
During data acquisition, unlike with conventional prediction data sets, the application platform server 31 obtains detailed vehicle model configuration data in addition to the transaction data of used cars.
Ordinal meaning is then assigned to the vehicle model configuration data in the historical data, so that the vehicle model configuration data can be used for XGBoost modeling.
The method shown in the embodiments of the present application feeds the historical data into a machine learning model. Because the number of input values is very large, overfitting can easily occur; building an XGBoost model, as in the method shown in the embodiments of the present application, can effectively avoid overfitting.
Once the target XGBoost model has been built, it can be used directly for used car price prediction. The price of a used car can also be predicted when there is no car series information and only the detailed vehicle model configuration data of the vehicle is available. Likewise, when predicting the price of a modified or retrofitted used car, it is only necessary to replace the relevant configuration information directly. Moreover, ranking the features by importance can guide later modeling iterations.
During error analysis, analyzing the records with high error can guide data collection, data cleaning and feature engineering in the next modeling iteration.
Embodiment 2:
Referring to Fig. 2, the second aspect of the embodiments of the present application shows a used car pricing method based on an XGBoost model, the method comprising:
S101: obtaining historical data of used cars, the historical data comprising transaction data and vehicle model configuration data, the transaction data comprising the transaction price and the relevant information of the used car;
The transaction data includes the transaction price of the used car and the relevant information of the used car. The relevant information of the used car includes vehicle age, mileage and other data that quantitatively reflect the condition of the used car;
The vehicle model configuration data of a used car is its configuration information; each used car typically has 200 to 300 configuration items. The vehicle model configuration data includes configurations and the attribute corresponding to each configuration;
For example:
The headlights of a used car may be halogen lamps, xenon lamps or LED lamps;
Here, the headlight is the configuration; halogen lamp, xenon lamp and LED lamp are the attributes.
S102: preprocessing the historical data to generate preprocessed historical data;
Vehicle model configuration data usually cannot be interpreted directly by a computer. The method shown in the embodiments of the present application therefore processes the historical data in advance, so that the vehicle model configuration data in the historical data can be interpreted by the computer and used to build the model.
The method shown in the embodiments of the present application feeds the historical data into a machine learning model. Because the number of input values is very large, overfitting can easily occur; building an XGBoost model, as in the method shown in the embodiments of the present application, can effectively avoid overfitting.
S103: building a target XGBoost model according to the preprocessed historical data;
S104: evaluating the price of the used car based on the target XGBoost model.
Once the target XGBoost model has been built, it can be used directly for used car price prediction. The price of a used car can also be predicted when there is no car series information and only the detailed vehicle model configuration data of the vehicle is available. Likewise, when predicting the price of a modified or retrofitted used car, it is only necessary to replace the relevant configuration information directly.
By using an XGBoost model, the method shown in the embodiments of the present application can effectively avoid overfitting. The target XGBoost model built by the present application is suitable for the price evaluation of all vehicles: even for a modified used car, it is only necessary to input the relevant information of the used car, its vehicle model configuration data, and the post-modification information to obtain the price of the modified used car.
Embodiment 3:
To ensure that the target XGBoost model built by the method shown in the embodiments of the present application can be used accurately for used car price evaluation, the embodiments of the present application show a preprocessing method for the historical data. Specifically, referring to Fig. 3:
Embodiment 3 has steps similar to those of Embodiment 2; the only difference is that step S102 in the technical solution shown in Embodiment 2 comprises the following steps:
S10211: cleaning the transaction data and the vehicle model configuration data to obtain cleaned transaction data and cleaned vehicle model configuration data;
Cleaning the historical data mainly means handling the missing values, outliers and noise items in the transaction data and the vehicle model configuration data;
A missing value means that an item is missing from the historical transaction data:
For example:
The headlight attribute of a certain used car was not collected; the headlight configuration of that used car is a missing value.
Outliers:
For example, a typical vehicle approaches scrapping or replacement after 20 years of service. If a historical record shows that a certain used car has a vehicle age of 50 years, that record is determined to be an outlier.
In practice, any historical data that contradicts the actual condition of the used car can be defined as an outlier.
For noise:
A linear model is built from the transaction price and the relevant information of the used car. The ordinate of the linear model is the transaction price and the abscissa is the relevant information of the used car. A safe range is set on the linear model, the safe range being a certain region around the linear model;
Then, based on the safe range, it is determined whether a historical record is noise.
The above missing values, outliers and noise can usually be deleted directly, or replaced with a suitable numerical value.
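The noise check described above can be sketched as follows. This is only an illustration under assumptions: a single feature (vehicle age) stands in for "the relevant information of the used car", the line is fitted by ordinary least squares, and the safe range is taken to be a fixed band of hypothetical width around the fitted line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_noise(xs, ys, band):
    """Mark points whose price lies more than `band` away from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [abs(y - (slope * x + intercept)) > band for x, y in zip(xs, ys)]

# Hypothetical records: vehicle age in years and transaction price.
ages = [1, 2, 3, 4, 5, 6]
prices = [150_000, 140_000, 130_000, 60_000, 110_000, 100_000]

# The fourth record falls well outside the band and is flagged as noise.
is_noise = flag_noise(ages, prices, band=30_000)
```

Flagged records can then be deleted or replaced, as the text describes.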
S10212: assigning ordinal meaning to the cleaned vehicle model configuration data according to a preset assignment rule, to generate processed vehicle model configuration data;
Vehicle model configuration data usually cannot be interpreted directly by a computer. The method shown in the embodiments of the present application therefore processes the historical data in advance, so that the vehicle model configuration data in the historical data can be interpreted by the computer and used to build the model.
For example:
Halogen headlights, xenon headlights, LED headlights;
LED headlights may be assigned 3;
Xenon headlights may be assigned 2;
Halogen headlights may be assigned 1;
As another example:
For the feature "reversing radar", the original data set describes the feature as present or absent; in feature engineering, present is assigned 1 and absent is assigned 0;
It is worth noting that the embodiments of the present application show the assignment of the cleaned vehicle model configuration data by way of example only. Any method that assigns ordinal meaning to the cleaned vehicle model configuration data falls within the protection scope of the present application; for reasons of space, such methods are not enumerated here.
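The two assignment examples above can be expressed as a lookup table. The sketch below is illustrative only: the dictionary layout and the names `ASSIGNMENT_RULES`, `headlight` and `reversing_radar` are hypothetical, and the source does not say how values without a rule should be handled (here they map to `None`, so that they can be treated as missing values during cleaning).

```python
# Hypothetical preset assignment rules: configuration -> attribute -> ordinal code.
ASSIGNMENT_RULES = {
    "headlight": {"halogen": 1, "xenon": 2, "LED": 3},
    "reversing_radar": {"absent": 0, "present": 1},
}

def assign_ordinal(config_row, rules=ASSIGNMENT_RULES):
    """Map each configuration attribute to its ordinal code.

    Attributes with no matching rule become None; configurations that have
    no rule at all are passed through unchanged.
    """
    return {
        config: rules[config].get(attribute) if config in rules else attribute
        for config, attribute in config_row.items()
    }

encoded = assign_ordinal({"headlight": "xenon", "reversing_radar": "present"})
```

Because the codes are ordered (halogen below xenon below LED), a tree model can split on thresholds such as "headlight code of at least 2", which is what gives the encoding its ordinal meaning.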
S10213: fusing the processed vehicle model configuration data with the cleaned transaction data to generate the preprocessed historical data.
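One plausible reading of the fusion step is a join of the two data sets on a shared vehicle identifier. The source does not state the join key or what happens to unmatched records, so the `vehicle_id` key and the decision to drop transactions without configuration data are assumptions in this sketch.

```python
def fuse(transactions, configurations, key="vehicle_id"):
    """Join each cleaned transaction with the encoded configuration of the same vehicle."""
    config_by_id = {c[key]: c for c in configurations}
    fused = []
    for t in transactions:
        config = config_by_id.get(t[key])
        if config is not None:  # assumption: skip transactions without configuration data
            fused.append({**t, **config})
    return fused

# Hypothetical cleaned transaction data and ordinal-encoded configuration data.
transactions = [
    {"vehicle_id": "A1", "vehicle_age": 3, "mileage": 4.2, "price": 120000},
    {"vehicle_id": "B2", "vehicle_age": 6, "mileage": 9.8, "price": 70000},
]
configurations = [{"vehicle_id": "A1", "headlight": 3, "reversing_radar": 1}]

preprocessed = fuse(transactions, configurations)
```

Each fused row then carries both the transaction fields and the configuration codes, ready to be split into feature columns and the transaction price as the label.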
In the feature engineering stage, the method shown in the embodiments of the present application assigns ordinal meaning to the features of the detailed vehicle model configuration data according to a preset assignment rule, so that the vehicle model configuration can be used for XGBoost modeling. Moreover, once the historical data has gone through the two steps of cleaning and assignment, it can also be used for modeling with other tree-structured models, such as decision tree models, random forest models and gradient boosting decision tree models.
The method shown in the embodiments of the present application cleans and assigns the historical data, which ensures that the historical data used by the method reflects the true condition of the used cars, and in turn that the target XGBoost model built by the method can be used accurately for used car price evaluation.
Embodiment 4:
To ensure that the target XGBoost model built by the method shown in the embodiments of the present application can be used to assess used car prices accurately, this embodiment shows a method for preprocessing the historical data; see Fig. 4:
Embodiment 4 has steps similar to those of Embodiment 2; the only difference is that step S102 in the technical solution shown in Embodiment 2 includes the following steps:
S10221: according to a preset assignment rule, assign ordinal meaning to the vehicle configuration data, generating assigned vehicle configuration data;
S10222: clean the transaction data and the assigned vehicle configuration data, obtaining processed transaction data and processed vehicle configuration data;
S10223: fuse the processed transaction data and the processed vehicle configuration data, generating preprocessed historical data.
In the method shown in this embodiment of the present application, during feature engineering the features of the detailed vehicle configuration data are given ordinal meaning according to the preset assignment rule, so that the vehicle configuration can be used for XGBoost modeling. Moreover, once the historical data has gone through the two steps of cleaning and value assignment, it can also be used to build other tree-structured models, such as decision tree models, random forest models, and gradient boosting decision tree models.
The method shown in this embodiment of the present application cleans the historical data and assigns values to it, which ensures that the historical data used by the method reflects the true condition of the used cars, and in turn that the target XGBoost model built by the method can be used to assess used car prices accurately.
Embodiment 5:
Embodiment 5 has steps similar to those of Embodiment 4; the only difference is that step S10222 in the technical solution shown in Embodiment 4 includes the following steps; see Fig. 5:
S102221: determine the problem data in the transaction data and in the assigned vehicle configuration data, the problem data comprising one or a combination of: missing values, outliers, and noise values;
The determination is made as follows:
Cleaning of the historical data mainly deals with missing values, outliers, and noise in the transaction data and the vehicle configuration data;
A missing value means that some item is absent from the historical transaction data:
For example:
If the attribute corresponding to the lamps of a certain used car was not collected, the lamp configuration of that used car is a missing value.
Outliers: for example, a vehicle generally nears scrapping or replacement after 20 years of service; if a historical record shows a used car with a vehicle age of 50 years, that record is determined to be an outlier.
In practice, any historical data that contradicts the actual condition of the used car can be defined as an outlier.
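The checks for missing values and outliers described above can be sketched as simple rules. The plausibility ranges, field names, and helper functions below are hypothetical; only the 20-year service life comes from the example in the text.

```python
# Hypothetical plausibility ranges; the 20-year bound echoes the example above.
PLAUSIBLE_RANGES = {
    "vehicle_age_years": (0, 20),
    "mileage_km": (0, 1_000_000),
}


def classify_value(feature, value, ranges=PLAUSIBLE_RANGES):
    """Label a single field as 'missing', 'outlier' or 'ok'."""
    if value is None:
        return "missing"
    if feature in ranges:
        low, high = ranges[feature]
        if not low <= value <= high:
            return "outlier"
    return "ok"


def find_problems(record):
    """Return {feature: label} for every field that is not 'ok'."""
    labels = {f: classify_value(f, v) for f, v in record.items()}
    return {f: label for f, label in labels.items() if label != "ok"}


record = {"vehicle_age_years": 50, "mileage_km": 80_000, "headlamp": None}
problems = find_problems(record)
print(problems)
```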
For noise:
A linear model is built from the transaction price and the related information of the used cars. The ordinate of the linear model is the transaction price and the abscissa is the related information of the used car. A safe range is set on the linear model, the safe range being a certain region around the linear model;
Whether a historical record is noise is then determined based on the safe range.
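A minimal sketch of the noise check follows, under several assumptions the text leaves open: the "related information" is reduced to a single numeric feature (vehicle age here), the linear model is an ordinary least-squares line, the safe range is a fixed band of half-width `band` around that line, and a record outside the band counts as noise. The numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_noise(xs, ys, band):
    """Return True for each point lying more than `band` from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [abs(y - (slope * x + intercept)) > band for x, y in zip(xs, ys)]


ages = [1, 2, 3, 4, 5, 6]                  # vehicle age in years
prices = [100, 90, 80, 70, 150, 50]        # transaction price, arbitrary units
noise_flags = flag_noise(ages, prices, band=40)
```

Note that the line in this sketch is fitted on all records, including the noisy one, so the band has to be wide enough to tolerate the distortion this causes.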
S102222: determine the type of the problem data and process the problem data according to its type, obtaining processed transaction data and processed vehicle configuration data.
A missing value can be deleted directly or filled in;
An outlier can be deleted directly or replaced;
Noise can be deleted directly or replaced.
The method shown in this embodiment of the present application cleans the historical data and assigns values to it, removing missing values, outliers, and noise, which ensures that the target XGBoost model built by the method can be used to assess used car prices accurately.
Embodiment 6:
To further ensure that the target XGBoost model built by the method shown in the embodiments of the present application can be used to assess used car prices accurately, this embodiment shows a method for preprocessing outliers; see Fig. 6:
Embodiment 6 has steps similar to those of Embodiment 5; the only difference is that step S102222 in the technical solution shown in Embodiment 5 includes the following steps:
S1022221: if the type is outlier, determine the target feature corresponding to the outlier;
S1022222: based on the historical data, calculate the average value of the target feature;
S1022223: replace the value of the target feature with the average value of the target feature.
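Steps S1022221 to S1022223 can be sketched as below. The text says only that the average is computed from the historical data; this sketch additionally assumes that flagged outliers are excluded from the average. The predicate `is_outlier`, the field names, and the sample records are hypothetical.

```python
def replace_outliers_with_mean(records, feature, is_outlier):
    """Replace outlying values of `feature` by the mean of the other values.

    `is_outlier` is a caller-supplied predicate; input records are not mutated.
    """
    valid = [r[feature] for r in records if not is_outlier(r[feature])]
    if not valid:
        raise ValueError("no valid values to average")
    mean = sum(valid) / len(valid)
    return [
        {**r, feature: mean} if is_outlier(r[feature]) else dict(r)
        for r in records
    ]


history = [
    {"car_id": "A001", "vehicle_age_years": 4},
    {"car_id": "A002", "vehicle_age_years": 50},
    {"car_id": "A003", "vehicle_age_years": 8},
]
cleaned = replace_outliers_with_mean(
    history, "vehicle_age_years", is_outlier=lambda age: age > 20
)
```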
The value of a target feature most probably lies near the average of that feature, so replacing the value of the target feature with the average reflects the true condition of the target feature fairly accurately.
The historical data shown in this embodiment of the present application accurately reflects the real circumstances of the used cars, so the target XGBoost model built from it can be used to assess used car prices accurately.
Embodiment 7:
To ensure that the target XGBoost model built by the method shown in the embodiments of the present application can be used to assess used car prices accurately, this embodiment shows a method for building the target XGBoost model; see Fig. 7:
Embodiment 7 has steps similar to those of any one of Embodiments 2-6; the only difference is that step S103 includes the following steps:
S1031: split the preprocessed historical data, generating training data and test data;
For example, the preprocessed historical data is usually divided into 10 parts, one of which is used as test data while the rest are used as training data.
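The ten-part split can be sketched with the standard library alone. The shuffling step and the fixed seed are assumptions of this sketch; the text states only that the data is divided into 10 parts.

```python
import random


def k_fold_splits(records, k=10, seed=0):
    """Yield (training, test) pairs; each of the k parts is the test set once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled records into k parts of (nearly) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, test


splits = list(k_fold_splits(range(100), k=10))
```

Each record appears in exactly one test part, so rotating the test part over all ten splits evaluates the model on every record once.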
S1032: based on the training data, build an XGBoost model, generating a first XGBoost model;
S1033: based on the first XGBoost model, calculate the predicted price of each test used car in the test data;
S1034: judge whether the difference between the predicted price and the transaction price of the test used car falls within a preset range;
An XGBoost model is built from the training data, generating the first XGBoost model; the first XGBoost model is then used to assess the price of each test used car in the test data, yielding a predicted price for each used car;
Then the mean absolute error of this group of data is calculated, using the following formula:
The evaluation function used in the error analysis is MAPE (mean absolute percentage error). Let P be the true car price and Q the predicted car price, with n cars predicted in total; the mean absolute percentage error is defined as:
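The formula itself is not present in the text at this point. Assuming the conventional definition of MAPE and the symbols introduced above, with P_i and Q_i denoting the true and predicted price of the i-th car (the index i is introduced here for illustration), it can be written as:

```latex
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\lvert P_i - Q_i \rvert}{P_i}
```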
This yields mean absolute error 1. Then a different group of historical transaction data is taken as the test data and the remaining data as training data, and mean absolute error 2 is obtained in the same way. Proceeding in turn, 10 mean absolute errors are finally obtained, and the average of the 10 mean absolute errors is calculated.
Judge whether the average of the mean absolute errors is less than a preset number.
If it is less than the preset number, the difference is considered to fall within the preset range;
if it is greater than the preset number, the difference is considered not to fall within the preset range.
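The averaging and threshold test can be sketched as follows. The error measure is the conventional MAPE expressed as a fraction; the threshold value 0.10 and the sample prices are made up for illustration, and only two folds are shown instead of ten.

```python
def mape(true_prices, predicted_prices):
    """Mean absolute percentage error, as a fraction (0.05 means 5 %)."""
    pairs = list(zip(true_prices, predicted_prices))
    return sum(abs(p - q) / p for p, q in pairs) / len(pairs)


def within_preset_range(fold_results, preset_number):
    """Average the per-fold MAPE values and compare with the threshold."""
    scores = [mape(true, predicted) for true, predicted in fold_results]
    average = sum(scores) / len(scores)
    return average, average < preset_number


# Two folds of (true prices, predicted prices); a real run would have ten.
fold_results = [
    ([100_000, 50_000], [90_000, 55_000]),
    ([80_000, 40_000], [84_000, 38_000]),
]
average_mape, accepted = within_preset_range(fold_results, preset_number=0.10)
```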
If it falls within the preset range, S1035 is executed: determine that the first XGBoost model is the target XGBoost model;
If it does not fall within the preset range, S1036 is executed: adjust the parameters of the first XGBoost model, generating the target XGBoost model.
Tuning mainly targets the following parameters: n_trees (number of trees in the ensemble), eta (step size of each gradient iteration), max_depth (maximum depth of each tree), subsample (row sampling ratio), and colsample_bytree (column sampling ratio).
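The text does not say how the parameters are adjusted. One simple possibility, sketched below, is to walk through a grid of candidate values until the cross-validated error drops below the preset number. The candidate values, the function names, and the stand-in scorer are all hypothetical. As far as we know, the XGBoost library itself takes the number of trees as `num_boost_round` (native API) or `n_estimators` (scikit-learn wrapper) rather than `n_trees`; the name used in the text is kept here.

```python
from itertools import product

# Hypothetical candidate values; the keys are the parameters named above.
SEARCH_SPACE = {
    "n_trees": [200, 500],
    "eta": [0.05, 0.1],
    "max_depth": [4, 6],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}


def candidate_settings(space=SEARCH_SPACE):
    """Yield one dict per combination of candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def tune(evaluate, preset_number, space=SEARCH_SPACE):
    """Return the first setting whose score is below `preset_number`.

    `evaluate` is a caller-supplied function mapping a setting to its
    cross-validated error; None is returned when no candidate qualifies.
    """
    for setting in candidate_settings(space):
        if evaluate(setting) < preset_number:
            return setting
    return None


# Stand-in scorer for demonstration only; a real one would train and validate.
chosen = tune(lambda s: 0.12 if s["max_depth"] == 4 else 0.08, preset_number=0.10)
```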
Embodiment 8:
The method of the technical solution shown in Embodiment 7 further includes the following steps; see Fig. 8:
S105: for the target XGBoost model, count the number of times each feature appears as a child node;
S106: rank the importance of the features based on those counts.
That is, a feature importance ranking can be obtained from the number of times each feature is used as a branch node in the ensemble tree model. The ranking guides front-line staff on which items to focus on when collecting historical data.
The feature importance ranking can also guide subsequent modeling iterations.
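Counting how often each feature is used as a split can be sketched on a toy tree structure. The nested-dict layout below is hypothetical and is not the XGBoost model format; in the XGBoost Python package, `Booster.get_score(importance_type="weight")` reports a count of this kind.

```python
from collections import Counter


# Hypothetical tree layout: a split node is a dict with "feature", "left"
# and "right"; a leaf is a dict holding only "value".
def count_splits(node, counts):
    """Add one to counts[feature] for every split node under `node`."""
    if "feature" in node:
        counts[node["feature"]] += 1
        count_splits(node["left"], counts)
        count_splits(node["right"], counts)


def rank_features(trees):
    """Return (feature, count) pairs, most frequently used feature first."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts.most_common()


leaf = {"value": 0.0}
trees = [
    {"feature": "vehicle_age_years",
     "left": {"feature": "mileage_km", "left": leaf, "right": leaf},
     "right": leaf},
    {"feature": "vehicle_age_years", "left": leaf, "right": leaf},
]
ranking = rank_features(trees)
print(ranking)
```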
Embodiment 9:
A second aspect of the embodiments of the present application shows a used car pricing apparatus based on an XGBoost model; see Fig. 9. The apparatus includes:
an acquiring unit 21, configured to obtain historical data of used cars, the historical data comprising transaction data and vehicle configuration data, the transaction data comprising the transaction price and related information of the used car;
a preprocessing unit 22, configured to preprocess the historical data, generating preprocessed historical data;
a construction unit 23, configured to build a target XGBoost model according to the preprocessed historical data;
an assessment unit 24, configured to assess the price of the used car based on the target XGBoost model.
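How the four units hand data to one another can be sketched as a small class. This is an illustration only: the class `PricingApparatus`, its field names, and the stand-in callables are hypothetical, and the "model" in the usage example is a trivial mean predictor rather than XGBoost.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PricingApparatus:
    """Wires the four units together; each unit is a caller-supplied callable."""
    acquire: Callable[[], Any]                           # acquiring unit 21
    preprocess: Callable[[Any], Any]                     # preprocessing unit 22
    construct: Callable[[Any], Callable[[Any], float]]   # construction unit 23

    def assess(self, car):                               # assessment unit 24
        history = self.acquire()
        model = self.construct(self.preprocess(history))
        return model(car)


apparatus = PricingApparatus(
    acquire=lambda: [{"price": 60_000}, {"price": None}, {"price": 80_000}],
    preprocess=lambda rows: [r for r in rows if r["price"] is not None],
    # Stand-in "model": always predicts the mean historical price.
    construct=lambda rows: (lambda car: sum(r["price"] for r in rows) / len(rows)),
)
estimate = apparatus.assess({"car_id": "A009"})
```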
Embodiment 10:
Referring to Fig. 10, in the technical solution shown in Embodiment 9 the preprocessing unit 22 includes:
a first cleaning unit 2211, configured to clean the transaction data and the vehicle configuration data, obtaining cleaned transaction data and cleaned vehicle configuration data;
a first assignment unit 2212, configured to assign ordinal meaning to the cleaned vehicle configuration data according to a preset assignment rule, generating processed vehicle configuration data;
a first fusion unit 2213, configured to fuse the processed vehicle configuration data and the cleaned transaction data, generating preprocessed historical data.
Embodiment 11:
Referring to Fig. 11, in the technical solution shown in Embodiment 9 the preprocessing unit 22 includes:
a second assignment unit 2221, configured to assign ordinal meaning to the vehicle configuration data according to a preset assignment rule, generating assigned vehicle configuration data;
a second cleaning unit 2222, configured to clean the transaction data and the assigned vehicle configuration data, obtaining processed transaction data and processed vehicle configuration data;
a second fusion unit 2223, configured to fuse the processed transaction data and the processed vehicle configuration data, generating preprocessed historical data.
Embodiment 12:
Referring to Fig. 12, in the technical solution shown in Embodiment 11 the second cleaning unit 2222 includes:
a problem data determination unit 22221, configured to determine the problem data in the transaction data and in the assigned vehicle configuration data, the problem data comprising one or a combination of: missing values, outliers, and noise values;
a processing unit 22222, configured to determine the type of the problem data and process the problem data according to its type, obtaining processed transaction data and processed vehicle configuration data.
Embodiment 13:
Referring to Fig. 13, in the technical solution shown in Embodiment 12 the processing unit 22222 includes:
an outlier determination unit 222221, configured to determine the target feature corresponding to the outlier if the type is outlier;
an average calculation unit 222222, configured to calculate the average value of the target feature based on the historical data;
a replacement unit 222223, configured to replace the value of the target feature with the average value of the target feature.
Embodiment 14:
Referring to Fig. 14, the construction unit 23 in the technical solutions shown in Embodiments 9-13 includes:
a splitting unit 231, configured to split the preprocessed historical data, generating training data and test data;
a first generation unit 232, configured to build an XGBoost model based on the training data, generating a first XGBoost model;
a sub-calculation unit 233, configured to calculate, based on the first XGBoost model, the predicted price of each test used car in the test data;
a judging unit 234, configured to judge whether the difference between the predicted price and the transaction price of the test used car falls within a preset range;
a sub-determination unit 235, configured to determine that the first XGBoost model is the target XGBoost model if the difference falls within the preset range;
an adjustment unit 236, configured to adjust the parameters of the first XGBoost model, generating the target XGBoost model, if the difference does not fall within the preset range.
Embodiment 15:
Referring to Fig. 15, the apparatus shown in Embodiment 14 further includes:
a statistics unit 25, configured to count, for the target XGBoost model, the number of times each feature appears as a child node;
a ranking unit 26, configured to rank the importance of the features based on those counts.
Embodiment 16:
A fourth aspect of the embodiments of the present application shows a server; see Fig. 16. The server includes:
one or more processors 41;
a memory 42, configured to store one or more programs;
when the one or more programs are executed by the one or more processors 41, the one or more processors 41 implement the method shown in the embodiments of the present application.
Other embodiments of the application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this application. The specification and embodiments are to be regarded as illustrative only; the true scope and spirit of the application are indicated by the following claims.
It should be understood that the application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.
It is worth noting that, in a specific implementation, the application also provides a computer storage medium, which can store a program; when executed, the program may include some or all of the steps in each embodiment of the service providing method for user identity or the user registration method provided by the application. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will clearly understand that the techniques in the embodiments of the application can be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the application or in certain parts of the embodiments.
For the same or similar parts, the embodiments in this specification may be referred to one another. In particular, since the embodiments of the service providing apparatus for user identity or the user registration apparatus are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the description in the method embodiments.
The embodiments of the application described above do not limit the scope of protection of the application.

Claims (15)

1. A used car pricing method based on an XGBoost model, characterized in that the method comprises:
obtaining historical data of used cars, the historical data comprising transaction data and vehicle configuration data, the transaction data comprising the transaction price and related information of the used car;
preprocessing the historical data, generating preprocessed historical data;
building a target XGBoost model according to the preprocessed historical data;
assessing the price of the used car based on the target XGBoost model.
2. The method according to claim 1, characterized in that the step of preprocessing the historical data to generate preprocessed historical data comprises:
cleaning the transaction data and the vehicle configuration data, obtaining cleaned transaction data and cleaned vehicle configuration data;
assigning ordinal meaning to the cleaned vehicle configuration data according to a preset assignment rule, generating processed vehicle configuration data;
fusing the processed vehicle configuration data and the cleaned transaction data, generating preprocessed historical data.
3. The method according to claim 1, characterized in that the step of preprocessing the historical data to generate preprocessed historical data comprises:
assigning ordinal meaning to the vehicle configuration data according to a preset assignment rule, generating assigned vehicle configuration data;
cleaning the transaction data and the assigned vehicle configuration data, obtaining processed transaction data and processed vehicle configuration data;
fusing the processed transaction data and the processed vehicle configuration data, generating preprocessed historical data.
4. The method according to claim 3, characterized in that the step of cleaning the transaction data and the assigned vehicle configuration data to obtain processed transaction data and processed vehicle configuration data comprises:
determining the problem data in the transaction data and in the assigned vehicle configuration data, the problem data comprising one or a combination of: missing values, outliers, and noise values;
determining the type of the problem data and processing the problem data according to its type, obtaining processed transaction data and processed vehicle configuration data.
5. The method according to claim 4, characterized in that the step of processing the problem data according to its type comprises:
if the type is outlier, determining the target feature corresponding to the outlier;
calculating the average value of the target feature based on the historical data;
replacing the value of the target feature with the average value of the target feature.
6. The method according to any one of claims 1-5, characterized in that the step of building the target XGBoost model according to the preprocessed historical data comprises:
splitting the preprocessed historical data, generating training data and test data;
building an XGBoost model based on the training data, generating a first XGBoost model;
calculating, based on the first XGBoost model, the predicted price of each test used car in the test data;
judging whether the difference between the predicted price and the transaction price of the test used car falls within a preset range;
if it falls within the preset range, determining that the first XGBoost model is the target XGBoost model;
if it does not fall within the preset range, adjusting the parameters of the first XGBoost model, generating the target XGBoost model.
7. The method according to claim 6, characterized in that the method further comprises:
counting, for the target XGBoost model, the number of times each feature appears as a child node;
ranking the importance of the features based on those counts.
8. A used car pricing apparatus based on an XGBoost model, characterized in that the apparatus comprises:
an acquiring unit, configured to obtain historical data of used cars, the historical data comprising transaction data and vehicle configuration data, the transaction data comprising the transaction price and related information of the used car;
a preprocessing unit, configured to preprocess the historical data, generating preprocessed historical data;
a construction unit, configured to build a target XGBoost model according to the preprocessed historical data;
an assessment unit, configured to assess the price of the used car based on the target XGBoost model.
9. The apparatus according to claim 8, characterized in that the preprocessing unit comprises:
a first cleaning unit, configured to clean the transaction data and the vehicle configuration data, obtaining cleaned transaction data and cleaned vehicle configuration data;
a first assignment unit, configured to assign ordinal meaning to the cleaned vehicle configuration data according to a preset assignment rule, generating processed vehicle configuration data;
a first fusion unit, configured to fuse the processed vehicle configuration data and the cleaned transaction data, generating preprocessed historical data.
10. The apparatus according to claim 8, characterized in that the preprocessing unit comprises:
a second assignment unit, configured to assign ordinal meaning to the vehicle configuration data according to a preset assignment rule, generating assigned vehicle configuration data;
a second cleaning unit, configured to clean the transaction data and the assigned vehicle configuration data, obtaining processed transaction data and processed vehicle configuration data;
a second fusion unit, configured to fuse the processed transaction data and the processed vehicle configuration data, generating preprocessed historical data.
11. The apparatus according to claim 10, characterized in that the second cleaning unit comprises:
a problem data determination unit, configured to determine the problem data in the transaction data and in the assigned vehicle configuration data, the problem data comprising one or a combination of: missing values, outliers, and noise values;
a processing unit, configured to determine the type of the problem data and process the problem data according to its type, obtaining processed transaction data and processed vehicle configuration data.
12. The apparatus according to claim 11, characterized in that the processing unit comprises:
an outlier determination unit, configured to determine the target feature corresponding to the outlier if the type is outlier;
an average calculation unit, configured to calculate the average value of the target feature based on the historical data;
a replacement unit, configured to replace the value of the target feature with the average value of the target feature.
13. The apparatus according to any one of claims 8-12, characterized in that the construction unit comprises:
a splitting unit, configured to split the preprocessed historical data, generating training data and test data;
a first generation unit, configured to build an XGBoost model based on the training data, generating a first XGBoost model;
a sub-calculation unit, configured to calculate, based on the first XGBoost model, the predicted price of each test used car in the test data;
a judging unit, configured to judge whether the difference between the predicted price and the transaction price of the test used car falls within a preset range;
a sub-determination unit, configured to determine that the first XGBoost model is the target XGBoost model if the difference falls within the preset range;
an adjustment unit, configured to adjust the parameters of the first XGBoost model, generating the target XGBoost model, if the difference does not fall within the preset range.
14. The apparatus according to claim 13, characterized in that the apparatus further comprises:
a statistics unit, configured to count, for the target XGBoost model, the number of times each feature appears as a child node;
a ranking unit, configured to rank the importance of the features based on those counts.
15. A used car pricing system based on an XGBoost model, characterized in that the system comprises: an application platform server and a data storage server connected to the application platform server, the data storage server being arranged inside the application platform server or independently, and the application platform server being connected to a terminal through the Internet;
the terminal is used to display the price of the target used car;
the data storage server is used to store related data;
the application platform server is used to implement the method according to any one of claims 1-7.
CN201810224028.1A 2018-03-19 2018-03-19 A kind of used car pricing method based on XGBoost model, apparatus and system Pending CN110288364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810224028.1A CN110288364A (en) 2018-03-19 2018-03-19 A kind of used car pricing method based on XGBoost model, apparatus and system

Publications (1)

Publication Number Publication Date
CN110288364A true CN110288364A (en) 2019-09-27

Family

ID=68000968

Country Status (1)

Country Link
CN (1) CN110288364A (en)

Similar Documents

Publication Publication Date Title
CN106295351B (en) A kind of Risk Identification Method and device
CN111210072B (en) Prediction model training and user resource limit determining method and device
Baharun et al. Auto modelling for machine learning: a comparison implementation between rapid miner and python
CN111797320A (en) Data processing method, device, equipment and storage medium
CN109345050A (en) A kind of quantitative trading prediction method, device and equipment
CN111680382A (en) Grade prediction model training method, grade prediction device and electronic equipment
CN105824806A (en) Quality evaluation method and device for public accounts
CN112836771A (en) Business service point classification method and device, electronic equipment and storage medium
CN115456695A (en) Method, device, system and medium for analyzing shop address selection
Wanke et al. Revisiting CAMELS rating system and the performance of ASEAN banks: a comprehensive MCDM/Z-numbers approach
CN110263136B (en) Method and device for pushing object to user based on reinforcement learning model
Kiraz et al. A fuzzy-logic-based approach to the EFQM model for performance enhancement
CN107256461A (en) A kind of charging facility construction site evaluation method and system
CN116776006B (en) Customer portrait construction method and system for enterprise financing
Romanenkov et al. Information and Technological Support for the Processes of Prognostic Modeling of Regional Labor Markets.
CN112767126A (en) Collateral grading method and device based on big data
CN110288364A (en) A kind of used car pricing method based on XGBoost model, apparatus and system
CN115222081A (en) Academic resource prediction method and device and computer equipment
Groen et al. Towards modelling the effect of evolving violence on forced migration
Wang et al. A contingency approach for time-cost trade-off in construction projects based on machine learning techniques
Wirawan et al. Application of data mining to prediction of timeliness graduation of students (a case study)
CN115600818A (en) Multi-dimensional scoring method and device, electronic equipment and storage medium
CN115759862A (en) Reservation package service assessment method, device, equipment and storage medium
CN114511250A (en) Enterprise external migration risk early warning method and system based on machine learning
CN115796984A (en) Training method of item recommendation model, storage medium and related equipment

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 2020-05-19

Address after: Room 323605, building 5, yard 1, Futong East Street, Chaoyang District, Beijing 100102

Applicant after: Youxuan (Beijing) Information Technology Co., Ltd

Address before: Room 368, Room 302, No. 211 North Fute Road, China (Shanghai) Free Trade Pilot Area, Pudong New Area, Shanghai, 201315

Applicant before: Youestimate (Shanghai) Information Technology Co., Ltd

RJ01 Rejection of invention patent application after publication

Application publication date: 2019-09-27