CN108256757A - Xgboost-based listing transaction estimation method and estimation platform - Google Patents

Xgboost-based listing transaction estimation method and estimation platform

Info

Publication number
CN108256757A
Authority
CN
China
Prior art keywords
houses
source
data
characteristic
conclusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810022667.XA
Other languages
Chinese (zh)
Inventor
于东海
宋鑫
刘�文
王煜杰
蔡白银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianjia Beijing Technology Co Ltd
Original Assignee
Lianjia Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianjia Beijing Technology Co Ltd filed Critical Lianjia Beijing Technology Co Ltd
Priority to CN201810022667.XA priority Critical patent/CN108256757A/en
Publication of CN108256757A publication Critical patent/CN108256757A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide an xgboost-based listing transaction estimation method and estimation platform. The method includes: obtaining listing data and deriving the feature data of each listing from the listing data, where the feature data of one listing on one day forms one sample; adding a label to each sample, the label being the listing's transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of a listing to be predicted into the prediction model and obtaining, from the model's predicted value, that listing's transaction probability within the preset time. By applying xgboost to transaction estimation for houses that are actually on sale, the embodiments can provide reliable transaction prospects both for individual houses and for the housing stock as a whole. The estimation can run automatically and unattended, reducing wasted human resources.

Description

Xgboost-based listing transaction estimation method and estimation platform
Technical field
Embodiments of the present invention relate to the field of machine learning, and in particular to an xgboost-based listing transaction estimation method and estimation platform.
Background
While a house is listed for sale, information about hot listings helps agents track and sell listings more efficiently and promotes successful transactions. A hot listing is one with a relatively high transaction probability and a high level of attention. One way to identify hot listings is for agents to judge from their own understanding of the listing and of market conditions, and to track and sell listings accordingly. This purely offline manual work is inefficient, lacks a global view, is easily swayed by subjective factors, and has a large judgment error. Another way is to assess a house's popularity or ease of sale automatically from online follows, page views, and similar signals alone. The data this approach relies on fluctuate widely, so its judgment error is also large.
There is currently no mature system for fine-grained prediction and grading of housing sale prospects; that is, the short-term transaction prospects of each listing cannot be predicted effectively. A method and a platform that can estimate more accurately the transaction probability of each house over a coming period are therefore urgently needed.
Summary of the invention
To solve the problem that the prior art cannot reliably and intelligently estimate the transaction probability of a house, embodiments of the present invention provide an xgboost-based listing transaction estimation method and estimation platform.
In a first aspect, an embodiment of the present invention provides an xgboost-based listing transaction estimation method. The method includes: obtaining listing data and deriving the feature data of the listing from the listing data, where the feature data of one listing on one day forms one sample; adding a label to each sample, the label being the listing's transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of a listing to be predicted into the prediction model and obtaining, from the predicted value of the prediction model, the transaction probability of that listing within the preset time.
In a second aspect, an embodiment of the present invention provides an xgboost-based listing transaction estimation platform. The platform includes: a sample generation module, configured to obtain listing data and derive the feature data of the listing from the listing data, where the feature data of one listing on one day forms one sample; a label adding module, configured to add a label to each sample, the label being the listing's transaction probability within a preset time; a machine learning module, configured to perform supervised learning on the labeled sample set using xgboost to obtain a prediction model; and a prediction module, configured to input the feature data of a listing to be predicted into the prediction model and obtain, from the predicted value of the prediction model, the transaction probability of that listing within the preset time.
In a third aspect, an embodiment of the present invention provides an electronic device including a memory and a processor that communicate with each other over a bus. The memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the following method: obtaining listing data and deriving the feature data of the listing from the listing data, where the feature data of one listing on one day forms one sample; adding a label to each sample, the label being the listing's transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of a listing to be predicted into the prediction model and obtaining, from the predicted value of the prediction model, the transaction probability of that listing within the preset time.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following method: obtaining listing data and deriving the feature data of the listing from the listing data, where the feature data of one listing on one day forms one sample; adding a label to each sample, the label being the listing's transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of a listing to be predicted into the prediction model and obtaining, from the predicted value of the prediction model, the transaction probability of that listing within the preset time.
By applying xgboost to transaction estimation for houses that are actually on sale, the embodiments of the present invention can provide reliable transaction prospects both for individual houses and for the housing stock as a whole. The estimation can run automatically and unattended, reducing wasted human resources.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the xgboost-based listing transaction estimation method provided by an embodiment of the present invention;
Fig. 2 is a distribution diagram of the basic features of the xgboost-based listing transaction estimation method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the xgboost-based listing transaction estimation platform provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the electronic device provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow chart of the xgboost-based listing transaction estimation method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 101: obtain listing data and derive the feature data of the listing from the listing data, where the feature data of one listing on one day forms one sample;
Listing data are first obtained from a data warehouse via sparksql. The listing data are the raw records related to a listing, such as agent-related information, customer-related information, and interaction information. To obtain training samples for machine learning, statistics are then computed over the listing data to obtain the feature data of the listing, which can be used to generate samples. For example, one feature of a listing may be the number of distinct stores involved in agent showings of the listing; this feature is obtained by counting the raw records of the stores associated with agent showings. The feature data may also be stored on the hard disk of a single machine or on distributed HDFS (the Hadoop Distributed File System). For better prediction, a listing may have multiple features. The feature data of one listing on one day forms one sample.
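As an illustration of Step 101, the following minimal sketch derives a few showing features for one listing on one day and packs them into a sample. The record layout, the field names, and the in-memory list standing in for the data warehouse query are all assumptions made for the example; the patent does not specify them.

```python
from datetime import date, timedelta

# Hypothetical raw showing records: (listing_id, showing_day, store_id, customer_id).
RAW_SHOWINGS = [
    ("H1", date(2018, 1, 2), "S1", "C1"),
    ("H1", date(2018, 1, 5), "S2", "C2"),
    ("H1", date(2018, 1, 9), "S1", "C3"),
    ("H2", date(2018, 1, 9), "S3", "C1"),
]

def showing_features(records, listing_id, sample_day, window_days=14):
    """Count showings, distinct stores and distinct customers for one listing
    in the window of `window_days` days ending on `sample_day` (inclusive)."""
    start = sample_day - timedelta(days=window_days - 1)
    stores, customers, total = set(), set(), 0
    for lid, day, store, customer in records:
        if lid == listing_id and start <= day <= sample_day:
            stores.add(store)
            customers.add(customer)
            total += 1
    return {
        "distinct_stores_14d": len(stores),
        "distinct_customers_14d": len(customers),
        "total_showings_14d": total,
    }

def build_sample(records, listing_id, sample_day):
    """One sample = the feature data of one listing on one day."""
    sample = {"listing_id": listing_id, "sample_day": sample_day}
    sample.update(showing_features(records, listing_id, sample_day))
    return sample

sample = build_sample(RAW_SHOWINGS, "H1", date(2018, 1, 10))
```

In a real pipeline the same aggregation would more likely be expressed as a grouped query over the warehouse tables; the loop here only makes the counting rule explicit.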
Step 102: add a label to each sample, the label being the listing's transaction probability within a preset time;
The preset time is a period following the sample day (the day to which the sample's data belong). If the preset time is the 14 days after the sample day, the label is the listing's transaction probability within those 14 days. This also means that, to obtain such a label, the listing must have been on the market for at least 14 days. The closer the sample date is to the prediction day, the higher the prediction accuracy. The feature data of all listings can form a sample set on the order of millions of samples. A label is added to each sample: when predicting the transaction probability within 14 days, the label is the listing's actual transaction outcome in the 14 days after the sample day, set to 1 if the listing was sold and to 0 if it was not.
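The labeling rule can be sketched as a small function. The function name, its arguments, and the choice to return None when the 14-day horizon has not yet fully elapsed are assumptions made for the example, not details given in the text.

```python
from datetime import date, timedelta

def make_label(sample_day, deal_day, today, horizon_days=14):
    """Return 1 if the listing was sold within `horizon_days` after `sample_day`,
    0 if the horizon has elapsed without a sale, and None if the outcome
    cannot be known yet (assumed convention for unusable samples)."""
    horizon_end = sample_day + timedelta(days=horizon_days)
    if deal_day is not None and sample_day < deal_day <= horizon_end:
        return 1
    if today < horizon_end:
        return None  # horizon not finished, so the sample cannot be labeled
    return 0

sold = make_label(date(2018, 1, 1), date(2018, 1, 10), today=date(2018, 2, 1))
unsold = make_label(date(2018, 1, 1), date(2018, 1, 20), today=date(2018, 2, 1))
```

Here a sale on 10 January falls inside the 14 days after 1 January and yields label 1, while a sale on 20 January falls outside and yields label 0.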
Step 103: perform supervised learning on the labeled sample set using xgboost to obtain a prediction model;
Xgboost is a boosting model built on decision trees. It is similar to GBDT, but because its objective function uses a second-order Taylor expansion and adds a regularization term, the model generalizes better, reduces the risk of overfitting, and is robust to noise.
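For reference, the second-order approximation and the regularization term mentioned above are usually written as follows. The notation follows the standard xgboost formulation and is not taken from the patent text: l is the loss, f_t the tree added in round t, T its number of leaves, w_j its leaf weights, and gamma and lambda the regularization coefficients.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2,

g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}.
```

The first-order term alone corresponds to ordinary gradient boosting; the second-order term and the penalty on the number of leaves and on the leaf weights are the two differences the paragraph above refers to.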
Supervised learning is performed on the labeled sample set using xgboost. The inputs are the samples and their labels: each sample is the feature data, and each label is the transaction probability within the preset time. The output is the predicted transaction probability within the preset time. If the label is the known transaction probability in the 14 days after the sample day, the output is the predicted transaction probability for the same 14 days. Following the xgboost algorithm, the prediction model is obtained once the outputs agree with the label values to within the required accuracy.
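A minimal training sketch with the xgboost Python package might look like the following. The hyperparameter values and the helper names are assumptions chosen for illustration; the patent does not state which parameters were used. The built-in "binary:logistic" objective is used here because the labels are 0 or 1 and the output should lie between 0 and 1.

```python
def default_params():
    """Assumed hyperparameters; tune them for real data."""
    return {
        "objective": "binary:logistic",  # outputs a score in (0, 1)
        "eval_metric": "logloss",
        "max_depth": 6,
        "eta": 0.1,
    }

def train_model(feature_rows, labels, num_rounds=100):
    """Train a booster on rows of numeric features and 0/1 labels."""
    import numpy as np
    import xgboost as xgb  # imported lazily so the helpers above stay importable

    dtrain = xgb.DMatrix(np.asarray(feature_rows, dtype=float),
                         label=np.asarray(labels, dtype=float))
    return xgb.train(default_params(), dtrain, num_boost_round=num_rounds)

def predict_scores(booster, feature_rows):
    """Return the model's predicted values for new feature rows."""
    import numpy as np
    import xgboost as xgb

    return booster.predict(xgb.DMatrix(np.asarray(feature_rows, dtype=float)))
```

With millions of samples, the feature matrix would normally be loaded from the stored feature files rather than built from Python lists.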
Step 104: input the feature data of the listing to be predicted into the prediction model, and obtain from the predicted value of the prediction model the transaction probability of that listing within the preset time.
After the prediction model is obtained, the current-day feature data of the listing to be predicted are input into it. The output of the model, that is, its predicted value, gives the listing's transaction probability within the preset time. The prediction horizon should match the label horizon of the training samples used for machine learning: if the label is the transaction probability in the 14 days after the sample day, the predicted value is likewise the transaction probability in the 14 days after the prediction sample's day. If the predicted value is 0.2, the listing's transaction probability within 14 days can be taken as 20%.
Samples are built for each individual house. Even if a house that was sold in the past is put up for sale again, it is still treated as the same sample source, which preserves continuity in the physical meaning of the samples. The embodiments of the present invention can therefore estimate in real time the transaction of every house on sale and give an accurate transaction probability for the coming period.
So that the prediction model tracks current market patterns more accurately, the sample set can be updated and the model retrained after each day's new features and samples are generated, yielding a new prediction model.
By applying xgboost to transaction estimation for houses that are actually on sale, the embodiments of the present invention can provide reliable transaction prospects both for individual houses and for the housing stock as a whole. The estimation can run automatically and unattended, reducing wasted human resources.
Further, based on the above embodiment, deriving the feature data of the listing from the listing data specifically includes: computing statistics over the listing data to obtain the basic feature data of the listing; computing statistics over the listing data to obtain the statistical feature data of the listing; and obtaining the feature data of the listing from its basic feature data and its statistical feature data.
The basic feature data of a listing are generally counts and yes/no indicators, such as the number of distinct customers shown the listing by agents in the last 14 days, the number of price adjustments in the last 14 days, whether the house is near a subway, and whether it is in a school district. The statistical feature data of a listing describe distributions, such as the maximum and the standard deviation of listing A's price adjustment counts over the last 14 days.
The feature data of the listing are obtained from its basic feature data and its statistical feature data; that is, the feature data include both.
For the same house, the feature data of the same day are merged to form the sample. Some kinds of feature data may be missing, so the necessary features must be filled in. Data may be missing because no raw data corresponding to the feature exist within the statistical window. In that case the feature can be filled according to the actual situation, for example set to 0 or to "no".
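The fill-in step can be sketched as follows. The feature names and their default values are hypothetical; which default is appropriate for each feature is a per-feature decision that the text leaves to the actual situation.

```python
# Hypothetical defaults: counts fall back to 0, yes/no indicators to False ("no").
FEATURE_DEFAULTS = {
    "total_showings_14d": 0,
    "price_adjustments_14d": 0,
    "has_earnest_money_14d": False,
}

def fill_missing(sample, defaults=FEATURE_DEFAULTS):
    """Return a copy of `sample` in which absent or None features get a default."""
    filled = dict(sample)
    for name, default in defaults.items():
        if filled.get(name) is None:
            filled[name] = default
    return filled

raw = {"listing_id": "H1", "total_showings_14d": None, "price_adjustments_14d": 2}
complete = fill_missing(raw)
```

Features that are already present keep their values; only the gaps are filled.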
On the basis of the above embodiment, the feature data that form a sample include both statistics obtained from the raw data and values obtained by a further round of statistics, which ensures that the prediction samples are comprehensive.
Fig. 2 is a distribution diagram of the basic features of the xgboost-based listing transaction estimation method provided by an embodiment of the present invention. Further, based on the above embodiment, obtaining the basic feature data of the listing specifically includes:
Obtaining agent feature data, which include agent showing feature data, agent follow-up feature data, and maintainer feature data;
The agent showing feature data include the number of distinct customers shown the listing in the last 14 days, the number of distinct stores involved in showings in the last 14 days, the number of distinct managers involved in showings in the last 14 days, and the total number of showings in the last 14 days. The agent follow-up feature data include the number of ordinary follow-ups by agents in the last 14 days and the number of voice follow-ups in the last 14 days. The maintainer feature data include the maintainer's maintenance success rate over the last 90 days, the number of houses the maintainer maintained successfully in the last 90 days, and the number of houses the maintainer failed to maintain in the last 90 days.
Obtaining house feature data, which include house physical attribute data, listing physical attribute data, three-certificate verification feature data, and re-entry feature data;
The house physical attribute data include the physical floor, the floor type, the building age, whether the house is near a subway, and whether it is in a school district. The listing physical attribute data include whether the house is the owner's only home, the listing grade, whether the house has been held for two full years, whether it has been held for five full years, whether the deed tax time is registered, whether the property certificate date is registered, whether the original purchase contract date is registered, whether there is a maintainer, how long there has been a maintainer, whether there is an on-site survey, how long there has been an on-site surveyor, whether a key is held, how long there has been a key holder, whether there is a chained order, and how long there has been a chained order. The three-certificate verification feature data include whether a three-certificate verification occurred in the last 14 days, the number of three-certificate verifications in the last 14 days, and the historical number of three-certificate verifications. The re-entry feature data include the number of distinct agents who have re-entered this house, the number of times this house has been re-entered, the number of days between this entry and the previous entry, the number of days between this entry and the previous write-off, the total listed days across all entries, the average listed days per entry, and the average re-entry cycle.
Obtaining customer feature data, which include customer phone feature data, online feature data, and earnest money feature data;
The customer phone feature data include whether there was a 400 call in the last 14 days, the number of 400 calls in the last 14 days, and the historical number of 400 calls (400 calls are calls dialed by customers). The online feature data include whether the listing was followed in the last 14 days, the number of follows in the last 14 days, and the historical number of follows. The earnest money feature data include whether earnest money was paid in the last 14 days, the number of earnest money payments in the last 14 days, and the historical number of earnest money payments.
Obtaining owner feature data, which include owner online feature data and price adjustment feature data;
The owner online feature data include the owner exposure count in the last 14 days, whether there is an owner self-evaluation, and the number of owner messages in the last 14 days. The price adjustment feature data include the number of price adjustments in the last 14 days, the number of price increases in the last 14 days, the number of price decreases in the last 14 days, the number of days since the most recent price adjustment, the number of days in the last 14 on which the price was adjusted, the price 14 days ago, the current price, and the price adjustment ratio.
Obtaining market feature data, which include supply-demand ratio feature data and transaction cycle feature data;
The supply-demand ratio feature data include the number of distinct customers shown listings in the listing's residential community in the last 30 days, the number of listed houses of the same room type in that community in the last 30 days, the number of distinct customers shown listings in the listing's business district in the last 30 days, and the number of listed houses of the same room type in that business district in the last 30 days. The transaction cycle feature data include the transaction cycle of houses of the same room type in the listing's community over the last 90 days, the number of transactions of houses of the same room type in that community over the last 90 days, the transaction cycle of houses of the same room type in the listing's business district over the last 90 days, and the number of transactions of houses of the same room type in that business district over the last 90 days.
Obtaining price feature data, which include community average price feature data and price ranking feature data.
The community average price feature data include, for the listing's community, the fitted daily average price, the median transaction price, the average transaction price, the median listing price, and the average listing price. The price ranking feature data include the listing's total price rank among houses of the same room type in its community, its unit price rank among houses of the same room type in its community, and the number of listed houses of the same room type in its community.
Note that if the number of days between the listing date and the sample day is less than the time window required by a basic feature above, the statistic is computed over the actual number of days.
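The window rule in the preceding note can be expressed as a small helper. Counting the listing day itself as the first day on the market is an assumption made for the example.

```python
from datetime import date

def effective_window_days(listing_day, sample_day, nominal_days=14):
    """Use the nominal window unless the house has been listed for fewer days."""
    days_listed = (sample_day - listing_day).days + 1  # inclusive count (assumed)
    return max(0, min(nominal_days, days_listed))

short = effective_window_days(date(2018, 1, 1), date(2018, 1, 5))    # newly listed
full = effective_window_days(date(2017, 12, 1), date(2018, 1, 5))   # listed long ago
```

A house listed five days before the sample day is counted over five days, while a house listed more than two weeks earlier uses the full 14-day window.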
On the basis of the above embodiment, the embodiment of the present invention defines basic feature data in 6 major categories and 16 subcategories, covering both the features of the individual house and the features of the market. This ensures that the sample features are comprehensive and helps improve the accuracy of machine learning.
Further, based on the above embodiment, computing statistics over the listing data to obtain the statistical feature data of the listing specifically includes: computing, from the listing data, the maximum, minimum, mode, mean, and standard deviation of preset features, thereby obtaining the statistical feature data of the listing.
Features among the basic feature data that have a meaningful distribution are selected as the preset features, and their maximum, minimum, mode, mean, and standard deviation are computed to obtain the statistical feature data of the listing. For example, from the daily number of agent showings over the last 14 days, the maximum, minimum, mode, mean, and standard deviation of the daily showing count over the last 14 days can be computed.
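The five statistics can be computed with the Python standard library, as in the sketch below. The function name and the feature-name prefix are assumptions, and the population standard deviation is used because the text does not say which variant is intended.

```python
import statistics

def distribution_features(daily_values, prefix):
    """Max, min, mode, mean and standard deviation of one preset feature
    observed daily over the statistical window."""
    return {
        prefix + "_max": max(daily_values),
        prefix + "_min": min(daily_values),
        # On Python 3.8+ the first of several equally common values is returned.
        prefix + "_mode": statistics.mode(daily_values),
        prefix + "_mean": statistics.mean(daily_values),
        prefix + "_std": statistics.pstdev(daily_values),  # population std (assumed)
    }

# Hypothetical daily showing counts for a short window.
stats = distribution_features([0, 2, 2, 4], "daily_showings")
```

For the four example values, the maximum is 4, the minimum is 0, the mode and the mean are both 2, and the standard deviation is the square root of 2.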
On the basis of above-described embodiment, the embodiment of the present invention can reflect the statistic quality of distribution characteristics by setting According to, ensure that the comprehensive of sample data feature, be conducive to improve machine learning precision.
Further, based on the above embodiment, obtaining the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model specifically includes: obtaining, with the prediction model, the predicted values of historical samples dated a preset number of days earlier; using the PAV algorithm to derive, from the predicted values and the labels of the historical samples, a mapping table between predicted values and listing transaction probabilities; and obtaining the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model and the mapping table.
The output of the prediction model is a value between 0 and 1, for example 0.2. Treating this value directly as a 20% transaction probability may introduce a large error. To further improve prediction accuracy, the output of the prediction model is corrected using historical samples. From the predicted values of the historical samples and their labels (that is, the actual listing transaction probabilities), a mapping table between the model's predicted values and listing transaction probabilities is obtained. Because building the mapping table requires knowing the actual transaction probability of the historical samples, predicting the transaction probability within 14 days requires historical samples dated at least 14 days before the prediction day. Because prediction accuracy is higher the closer the historical samples are to the prediction day, the historical samples from 14 days before the prediction day may be selected to build the mapping table.
After the historical samples are chosen, they (for example, the feature data from 14 days earlier) are input to the prediction model to obtain their predicted values. Because the label of a historical sample is the listing transaction probability within the 14 days following the sample date, which is already known in practice, the PAV algorithm can derive the mapping table between predicted values and listing transaction probabilities from the predicted values and labels of the historical samples. Looking up the predicted value of the prediction model in the mapping table then yields the transaction probability of the listing to be predicted within the preset time.
On the basis of the above embodiment, this embodiment of the invention corrects the output of the prediction model using the predicted values of historical data, which improves the accuracy and reliability of the prediction.
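The construction of the mapping table can be sketched as follows. This is a minimal sketch, assuming that "pav" refers to the pool-adjacent-violators algorithm used for isotonic regression; the function name `pav_fit` and the table layout (upper bound of a score bin, observed transaction rate in that bin) are illustrative assumptions and are not specified in the text.

```python
from typing import List, Tuple


def pav_fit(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Fit a non-decreasing score -> probability table with pool-adjacent-violators.

    scores: model outputs for historical samples (hypothetical input name).
    labels: known outcomes for the same samples, 1 = deal closed, 0 = no deal.
    Returns (upper score bound of bin, observed deal rate in bin) pairs.
    """
    pairs = sorted(zip(scores, labels))
    # Each block holds [sum of labels, number of samples, largest score in block].
    blocks: List[List[float]] = []
    for score, label in pairs:
        blocks.append([float(label), 1.0, score])
        # Pool adjacent blocks while the earlier block has the higher mean,
        # so that the fitted rates never decrease as the score increases.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


# Made-up historical predictions and outcomes, for illustration only.
table = pav_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
```

In this toy input the samples scored 0.2, 0.3 and 0.4 are pooled into one bin because their outcomes violate monotonicity, so all three raw scores map to the same observed rate.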
Fig. 3 is a schematic structural diagram of the xgboost-based listing transaction estimation platform provided by an embodiment of the invention. As shown in Fig. 3, the estimation platform includes a sample generation module 10, a label adding module 20, a machine learning module 30 and a prediction module 40, wherein:
The sample generation module 10 is specifically configured to obtain listing data and to obtain feature data of the listings from the listing data, the feature data of one listing on one day forming one sample;
The sample generation module 10 first obtains listing data from a data warehouse through Spark SQL; the listing data are the raw records related to the listings. To obtain training samples for machine learning, statistics are then computed over the listing data to obtain the feature data of the listings, which can be used to generate samples. To achieve a better prediction effect, each listing may have multiple feature data items. One sample is formed from the feature data of one listing on one day.
The label adding module 20 is specifically configured to add a label to each sample, the label being the listing transaction probability within a preset time;
The preset time is a period following the sample date. For example, the preset time may be the 14 days following the sample date, in which case the label is the listing transaction probability within the 14 days following the sample date. The label adding module 20 adds a label to each sample. If the transaction probability within 14 days is to be predicted, the label is the actual transaction outcome of the listing within the 14 days following the sample date: if a transaction was concluded, the label is set to 1; if not, the label is set to 0.
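The labeling rule can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `make_label`, its parameters, and the choice to count a deal on day 14 but not on the sample date itself are illustrative and not given in the text.

```python
from datetime import date, timedelta
from typing import Optional


def make_label(sample_day: date, deal_day: Optional[date], horizon_days: int = 14) -> int:
    """Return 1 if a deal closed within horizon_days after sample_day, else 0.

    deal_day is None when no deal has been recorded for the listing.
    The window (sample_day, sample_day + horizon_days] is an assumed convention.
    """
    if deal_day is None:
        return 0
    window_end = sample_day + timedelta(days=horizon_days)
    return int(sample_day < deal_day <= window_end)


# A sample taken on 2018-01-01 whose listing closed on 2018-01-10 gets label 1.
label = make_label(date(2018, 1, 1), date(2018, 1, 10))
```

A label computed this way is only final once `horizon_days` have elapsed since the sample date, which is why the historical samples used for correction must be at least that old.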
The machine learning module 30 is specifically configured to perform supervised learning on the labeled sample set using xgboost to obtain a prediction model;
The machine learning module 30 performs supervised learning on the labeled sample set using xgboost. The inputs are the samples and the labels, where a sample is the feature data and a label is the listing transaction probability within the preset time; the output is the listing transaction probability within the preset time. Through supervised learning with the xgboost algorithm, the prediction model is obtained once the outputs agree with the label values to within a given accuracy requirement.
The prediction module 40 is specifically configured to input the feature data of the listing to be predicted into the prediction model, and to obtain the transaction probability of that listing within the preset time from the predicted value of the prediction model.
After the prediction model is obtained, the prediction module 40 inputs the same-day feature data of the listing to be predicted into the prediction model and obtains the transaction probability of that listing within the preset time from the model's output, that is, its predicted value. The prediction horizon should equal the horizon of the training labels used during machine learning: if the label is the listing transaction probability within the 14 days following the sample date, the predicted value of the prediction model is likewise the transaction probability within the 14 days following the date of the predicted sample. The predicted value of the prediction model is a value between 0 and 1.
So that the prediction model captures current market patterns more accurately, the sample set can be updated and the model retrained after each day's new feature data and samples are generated, yielding a new prediction model.
By applying xgboost to the scenario of estimating transactions for houses actually on sale, the embodiment of the invention can provide reliable transaction outlooks both for individual houses and for the housing stock as a whole; transaction estimation can run automatically and unattended, reducing the waste of human resources.
Further, based on the above embodiment, the sample generation module 10 is further configured to: compute statistics over the listing data to obtain base feature data of the listings; compute statistics over the listing data to obtain statistical feature data of the listings; and obtain the feature data of the listings from the base feature data and the statistical feature data.
The sample generation module 10 computes statistics over the listing data to obtain the base feature data of the listings, which are generally counts, yes/no indicators and the like; it also computes statistics over the listing data to obtain the statistical feature data of the listings, which are data related to distribution characteristics.
The feature data of a listing are obtained from its base feature data and its statistical feature data; that is, the feature data of a listing comprise both its base feature data and its statistical feature data.
For one and the same house, the feature data of the same day are merged to form the sample. Because every category of feature data may contain missing values, the necessary feature data must be completed.
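Merging the same-day feature groups and completing missing values can be sketched as follows. This is a minimal sketch; the function name `build_sample`, the feature names and the default values are hypothetical, and filling with fixed defaults is only one possible completion strategy (the text does not say which one is used).

```python
from typing import Dict, Iterable, Optional


def build_sample(
    feature_groups: Iterable[Dict[str, Optional[float]]],
    defaults: Dict[str, float],
) -> Dict[str, Optional[float]]:
    """Merge one listing's same-day feature groups and fill required gaps.

    feature_groups: per-category feature dicts; None marks a missing value.
    defaults: fill values for the features that must always be present.
    """
    sample: Dict[str, Optional[float]] = {}
    for group in feature_groups:
        sample.update(group)
    # Complete only the features declared as necessary; others stay as they are.
    for name, default in defaults.items():
        if sample.get(name) is None:
            sample[name] = default
    return sample


# Hypothetical feature groups for one listing on one day.
sample = build_sample(
    [{"broker_showings_14d": 3.0}, {"floor": None, "is_unique": 1.0}],
    {"floor": 0.0, "price_adjustments_14d": 0.0},
)
```

Here `floor` was present but missing and `price_adjustments_14d` was absent altogether; both end up with their default, while the other features pass through unchanged.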
On the basis of the above embodiment, the feature data that make up a sample in this embodiment of the invention include both statistics computed directly from the raw data and values obtained by a further round of statistics, which ensures that the prediction samples are comprehensive.
Further, based on the above embodiment, the sample generation module 10 is further configured to:
Obtain broker feature data, the broker feature data including broker showing feature data, broker follow-up feature data and maintainer feature data, such as the number of distinct customers the broker has shown in the last 14 days, the broker's total number of follow-ups in the last 14 days, and the maintainer's maintenance success rate in the last 90 days;
Obtain house feature data, the house feature data including house physical attribute data, listing physical attribute data, three-certificate verification feature data and pirate-recording feature data, such as the physical floor, whether the house is unique, whether a three-certificate verification took place in the last 14 days, and the number of times the house has been pirate-recorded;
Obtain customer feature data, the customer feature data including customer telephone feature data, online feature data and intention-deposit feature data, such as whether there was a 400-number call in the last 14 days, whether the listing was followed in the last 14 days, and whether there was an intention deposit in the last 14 days;
Obtain owner feature data, the owner feature data including owner online feature data and price adjustment feature data, such as the number of owner exposures in the last 14 days and the number of price adjustments in the last 14 days;
Obtain market feature data, the market feature data including supply-demand ratio feature data and transaction cycle feature data, such as the number of distinct customers shown homes in the listing's community in the last 30 days and the transaction cycle of same-layout homes in the listing's community in the last 90 days;
Obtain price feature data, the price feature data including community average price feature data and price ranking feature data, such as the fitted daily average price of the listing's community and the listing's total-price ranking among same-layout homes in its community.
On the basis of the above embodiment, this embodiment of the invention defines base feature data in 6 major categories and 16 groups, covering both the individual characteristics of a house and market characteristics, which makes the sample features comprehensive and helps improve the accuracy of machine learning.
Further, based on the above embodiment, the sample generation module 10 is further configured to: compute the maximum, minimum, mode, mean and standard deviation of preset features from the listing data, thereby obtaining the statistical feature data of the listings.
Some of the features that correspond to the base feature data and exhibit regular distributional behavior are selected as the preset features; the sample generation module 10 computes the maximum, minimum, mode, mean and standard deviation of the preset features, thereby obtaining the statistical feature data of the listings.
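The five statistics can be sketched with the Python standard library as follows. This is a minimal sketch; the function name, the feature-naming scheme and the use of the population standard deviation are assumptions, since the text does not say whether the population or sample form is intended.

```python
import statistics
from typing import Dict, Sequence


def distribution_features(name: str, values: Sequence[float]) -> Dict[str, float]:
    """Summarise one preset feature by max, min, mode, mean and standard deviation.

    name: prefix for the generated feature names (hypothetical naming scheme).
    values: observations of the preset feature, e.g. daily counts in a window.
    """
    return {
        f"{name}_max": max(values),
        f"{name}_min": min(values),
        f"{name}_mode": statistics.mode(values),
        f"{name}_mean": statistics.mean(values),
        # Population standard deviation; an assumed choice.
        f"{name}_std": statistics.pstdev(values),
    }


# Made-up daily showing counts for illustration.
features = distribution_features("daily_showings", [2, 4, 4, 6])
```

For the values 2, 4, 4, 6 this gives a maximum of 6, a minimum of 2, a mode of 4, a mean of 4 and a standard deviation equal to the square root of 2.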
On the basis of the above embodiment, this embodiment of the invention includes statistical feature data that reflect distribution characteristics, which makes the sample features more comprehensive and helps improve the accuracy of machine learning.
Further, based on the above embodiment, the prediction module 40 is further configured to: obtain, with the prediction model, the predicted values of historical samples dated a preset number of days earlier; use the PAV algorithm to derive, from the predicted values and the labels of the historical samples, a mapping table between predicted values and listing transaction probabilities; and obtain the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model and the mapping table.
To further improve prediction accuracy, the prediction module 40 may use historical samples to correct the output of the prediction model. A mapping table between the model's predicted values and listing transaction probabilities is obtained from the predicted values and labels of the historical samples. Because building the mapping table requires knowing the actual transaction probability of the historical samples, predicting the transaction probability within 14 days requires historical samples dated at least 14 days before the prediction day.
The prediction module 40 chooses historical samples and inputs them to the prediction model to obtain their predicted values; using the PAV algorithm, it derives the mapping table between predicted values and listing transaction probabilities from the predicted values and labels of the historical samples; looking up the predicted value of the prediction model in the mapping table then yields the transaction probability of the listing to be predicted within the preset time.
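The final lookup against the mapping table can be sketched as follows. This is a minimal sketch; the table layout (upper bound of a raw score bin, observed transaction rate), the numbers in `mapping_table` and the function name are all hypothetical, and a step-wise lookup is only one way to read the table (interpolating between bins would be another).

```python
from bisect import bisect_left
from typing import List, Tuple


def calibrated_probability(table: List[Tuple[float, float]], raw_score: float) -> float:
    """Return the rate of the first bin whose upper bound covers raw_score.

    table: (upper score bound, observed deal rate) pairs sorted by bound.
    Scores above the last bound fall back to the last bin.
    """
    upper_bounds = [upper for upper, _ in table]
    index = min(bisect_left(upper_bounds, raw_score), len(table) - 1)
    return table[index][1]


# Hypothetical mapping table; the values are made up for illustration.
mapping_table = [(0.10, 0.01), (0.30, 0.05), (0.60, 0.12), (1.00, 0.40)]
probability = calibrated_probability(mapping_table, 0.2)
```

With this made-up table a raw model output of 0.2 is read as a transaction probability of 0.05, which illustrates why the raw output is not used directly as the probability.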
On the basis of the above embodiment, this embodiment of the invention corrects the output of the prediction model using the predicted values of historical data, which improves the accuracy and reliability of the prediction.
The platform provided by the embodiment of the invention is used to carry out the above method; for its specific functions, refer to the method flow above, which is not repeated here.
Fig. 4 is a schematic structural diagram of the electronic device provided by an embodiment of the invention. As shown in Fig. 4, the electronic device 1 includes a processor 401, a memory 402 and a bus 403. The processor 401 and the memory 402 communicate with each other through the bus 403; the processor 401 is configured to call the program instructions in the memory 402 to perform the methods provided by the above method embodiments, for example including: obtaining listing data and obtaining feature data of the listings from the listing data, the feature data of one listing on one day forming one sample; adding a label to each sample, the label being the listing transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of the listing to be predicted into the prediction model and obtaining the transaction probability of that listing within the preset time from the predicted value of the prediction model.
An embodiment of the invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example including: obtaining listing data and obtaining feature data of the listings from the listing data, the feature data of one listing on one day forming one sample; adding a label to each sample, the label being the listing transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of the listing to be predicted into the prediction model and obtaining the transaction probability of that listing within the preset time from the predicted value of the prediction model.
An embodiment of the invention provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example including: obtaining listing data and obtaining feature data of the listings from the listing data, the feature data of one listing on one day forming one sample; adding a label to each sample, the label being the listing transaction probability within a preset time; performing supervised learning on the labeled sample set using xgboost to obtain a prediction model; and inputting the feature data of the listing to be predicted into the prediction model and obtaining the transaction probability of that listing within the preset time from the predicted value of the prediction model.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes ROM, RAM, magnetic disks, optical discs and other media capable of storing program code.
The embodiments of the electronic device and the like described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, a person skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. A listing transaction estimation method based on xgboost, characterized by comprising:
obtaining listing data, and obtaining feature data of the listings from the listing data, the feature data of one listing on one day forming one sample;
adding a label to each sample, the label being the listing transaction probability within a preset time;
performing supervised learning on the labeled sample set using xgboost to obtain a prediction model;
inputting the feature data of the listing to be predicted into the prediction model, and obtaining the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model.
2. The method according to claim 1, characterized in that obtaining the feature data of the listings from the listing data specifically includes:
computing statistics over the listing data to obtain base feature data of the listings;
computing statistics over the listing data to obtain statistical feature data of the listings;
obtaining the feature data of the listings from the base feature data of the listings and the statistical feature data of the listings.
3. The method according to claim 2, characterized in that obtaining the base feature data of the listings specifically includes:
obtaining broker feature data, the broker feature data including broker showing feature data, broker follow-up feature data and maintainer feature data;
obtaining house feature data, the house feature data including house physical attribute data, listing physical attribute data, three-certificate verification feature data and pirate-recording feature data;
obtaining customer feature data, the customer feature data including customer telephone feature data, online feature data and intention-deposit feature data;
obtaining owner feature data, the owner feature data including owner online feature data and price adjustment feature data;
obtaining market feature data, the market feature data including supply-demand ratio feature data and transaction cycle feature data;
obtaining price feature data, the price feature data including community average price feature data and price ranking feature data.
4. The method according to claim 2, characterized in that computing statistics over the listing data to obtain the statistical feature data of the listings specifically includes:
computing the maximum, minimum, mode, mean and standard deviation of preset features from the listing data, thereby obtaining the statistical feature data of the listings.
5. The method according to claim 1, characterized in that obtaining the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model specifically includes:
obtaining, with the prediction model, the predicted values of historical samples dated a preset number of days earlier;
using the PAV algorithm to derive, from the predicted values of the historical samples and the labels of the historical samples, a mapping table between predicted values and listing transaction probabilities;
obtaining the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model and the mapping table.
6. A listing transaction estimation platform based on xgboost, characterized by comprising:
a sample generation module, specifically configured to obtain listing data and to obtain feature data of the listings from the listing data, the feature data of one listing on one day forming one sample;
a label adding module, specifically configured to add a label to each sample, the label being the listing transaction probability within a preset time;
a machine learning module, specifically configured to perform supervised learning on the labeled sample set using xgboost to obtain a prediction model;
a prediction module, specifically configured to input the feature data of the listing to be predicted into the prediction model and to obtain the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model.
7. The estimation platform according to claim 6, characterized in that the sample generation module is further configured to:
compute statistics over the listing data to obtain base feature data of the listings;
compute statistics over the listing data to obtain statistical feature data of the listings;
obtain the feature data of the listings from the base feature data of the listings and the statistical feature data of the listings.
8. The estimation platform according to claim 6, characterized in that the prediction module is further configured to:
obtain, with the prediction model, the predicted values of historical samples dated a preset number of days earlier;
use the PAV algorithm to derive, from the predicted values of the historical samples and the labels of the historical samples, a mapping table between predicted values and listing transaction probabilities;
obtain the transaction probability of the listing to be predicted within the preset time from the predicted value of the prediction model and the mapping table.
9. An electronic device, characterized by comprising a memory and a processor, the processor and the memory communicating with each other through a bus; the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201810022667.XA 2018-01-10 2018-01-10 A kind of source of houses conclusion of the business predictor method based on xgboost and estimate platform Withdrawn CN108256757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022667.XA CN108256757A (en) 2018-01-10 2018-01-10 A kind of source of houses conclusion of the business predictor method based on xgboost and estimate platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810022667.XA CN108256757A (en) 2018-01-10 2018-01-10 A kind of source of houses conclusion of the business predictor method based on xgboost and estimate platform

Publications (1)

Publication Number Publication Date
CN108256757A true CN108256757A (en) 2018-07-06

Family

ID=62724801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022667.XA Withdrawn CN108256757A (en) 2018-01-10 2018-01-10 A kind of source of houses conclusion of the business predictor method based on xgboost and estimate platform

Country Status (1)

Country Link
CN (1) CN108256757A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876487A (en) * 2018-08-29 2018-11-23 盈盈(杭州)网络技术有限公司 A kind of industrial plot estimation method based on big data and intelligent decision mechanism
CN109272165A (en) * 2018-09-30 2019-01-25 江苏满运软件科技有限公司 Register probability predictor method, device, storage medium and electronic equipment
CN109272165B (en) * 2018-09-30 2021-04-20 满帮信息咨询有限公司 Registration probability estimation method and device, storage medium and electronic equipment
CN111126849A (en) * 2019-12-25 2020-05-08 贝壳技术有限公司 Method, device and equipment for assisting target logistics transfer by computer
CN111091426A (en) * 2019-12-31 2020-05-01 青梧桐有限责任公司 House resource pricing method and system
CN111292140A (en) * 2020-03-19 2020-06-16 重庆锐云科技有限公司 Online customer intelligent distribution method
WO2022032332A1 (en) * 2020-08-12 2022-02-17 Domain Holdings Australia Limited Property lead finder systems and methods of its use
GB2612278A (en) * 2020-08-12 2023-04-26 Domain Holdings Australia Ltd Property lead finder systems and methods of its use
CN112597181A (en) * 2020-11-25 2021-04-02 贝壳技术有限公司 Link list searching method and device
CN112597181B (en) * 2020-11-25 2022-11-04 贝壳技术有限公司 Link list searching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180706