CN111126714A - Long-rental apartment house renting scene-based refund prediction system and method - Google Patents

Long-rental apartment house renting scene-based refund prediction system and method

Info

Publication number
CN111126714A
CN111126714A CN201911412712.3A CN201911412712A CN111126714A CN 111126714 A CN111126714 A CN 111126714A CN 201911412712 A CN201911412712 A CN 201911412712A CN 111126714 A CN111126714 A CN 111126714A
Authority
CN
China
Prior art keywords
lease
quitting
data
user
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911412712.3A
Other languages
Chinese (zh)
Inventor
李志武
李昭
陈浩
高靖
崔岩
卢述奇
陈呈
张宵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingwutong Co ltd
Original Assignee
Qingwutong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingwutong Co ltd
Priority to CN201911412712.3A
Publication of CN111126714A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0645 Rental transactions; Leasing transactions
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate

Abstract

The application discloses a lease-quitting prediction system and method based on the long-rental apartment renting scenario, relating to the technical field of data statistics. In one specific implementation, the system comprises a lease-quitting sample acquisition module, a feature module and a model training module. The lease-quitting sample acquisition module is coupled with the feature module and is used for acquiring users' lease-quitting sample data, constructing a lease-quitting sample set from it, and sending the sample set to the feature module. The feature module is coupled with the lease-quitting sample acquisition module and the model training module respectively, and is used for receiving the lease-quitting sample set, acquiring feature data, preprocessing the feature data to obtain a lease-quitting data set, and sending the data set to the model training module. The model training module is coupled with the feature module and is used for receiving the lease-quitting data set and training on it with the XGBoost model to obtain a lease-quitting prediction model for prediction. By constructing the lease-quitting prediction model, this implementation can anticipate a user's lease-quitting behavior in time and reduce the economic loss of long-rental apartments.

Description

Long-rental apartment house renting scene-based refund prediction system and method
Technical Field
The application relates to the technical field of data statistics, and in particular to a lease-quitting prediction system and method based on the long-rental apartment renting scenario.
Background
With socio-economic development, the floating population moving from less-developed areas to developed areas keeps growing. Housing prices in developed areas are high, so many newcomers rent in order to solve their short-term housing needs. In recent years long-rental apartments have gradually become a suitable place to live. However, when a user quits a lease, the operator cannot learn of it in time or anticipate it in order to take remedial measures, which harms the revenue of the long-rental apartment. A model for predicting lease quitting in the long-rental apartment renting scenario is therefore needed.
In the process of implementing the invention, the inventors found the following problems in applying existing models to lease-quitting prediction for long-rental apartment users:
1. the numbers of lease-quitting samples (positive samples) and non-quitting samples (negative samples) differ greatly, so the samples are imbalanced;
2. long-rental apartment users sign long contracts, and their behaviors are infrequent and sparse;
3. because product forms are updated iteratively, the feature data of older samples are inaccurate.
Disclosure of Invention
In view of this, the application discloses a lease-quitting prediction system and method based on the long-rental apartment renting scenario, which can anticipate a user's lease-quitting behavior in time and reduce the economic loss of long-rental apartments.
To achieve the above object, according to one aspect of the embodiments of the present invention, a lease-quitting prediction system based on the long-rental apartment renting scenario is provided, comprising: a lease-quitting sample acquisition module, a feature module and a model training module;
the lease-quitting sample acquisition module is coupled with the feature module and is used for acquiring users' lease-quitting sample data, constructing a lease-quitting sample set from that data, and sending the sample set to the feature module; the lease-quitting sample set comprises positive lease-quitting sample data and negative lease-quitting sample data; the positive sample data are the data of users who have quit their leases, and the negative sample data are the data of those users from before they quit;
the feature module is coupled with the lease-quitting sample acquisition module and the model training module respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module, acquiring feature data according to the sample set, preprocessing the feature data to obtain a lease-quitting data set, and sending the data set to the model training module;
and the model training module is coupled with the feature module and is used for receiving the lease-quitting data set sent by the feature module and training on it with the XGBoost model to obtain a lease-quitting prediction model for prediction.
Preferably, the lease-quitting sample set is constructed from the users' lease-quitting sample data through the K-Means algorithm.
Preferably, the feature module comprises a user feature unit, a cross feature unit and a house source feature unit, and the feature data comprise user feature data, cross feature data and house source feature data;
the user feature unit is coupled with the lease-quitting sample acquisition module and the model training module respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module, acquiring user feature data according to the sample set, and sending the user feature data to the model training module;
the cross feature unit is coupled with the lease-quitting sample acquisition module and the model training module respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module, acquiring cross feature data according to the sample set, and sending the cross feature data to the model training module;
and the house source feature unit is coupled with the lease-quitting sample acquisition module and the model training module respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module, acquiring house source feature data according to the sample set, and sending the house source feature data to the model training module.
Preferably, the user feature data comprise basic user information and user behavior features, wherein the basic user information comprises occupation, age, gender, education level, nationality and user channel, and the user behavior features comprise lease duration, lease-quitting behavior, complaint behavior and web browsing behavior;
the cross feature data comprise user return-visit information and user preference information, wherein the user return-visit information comprises an overall house source score and a service quality score, and the user preference information comprises commuting distance, price requirements and requirements on the surrounding business district;
the house source feature data comprise business district feature information, residential community feature information and house state feature information, wherein the business district feature information comprises business district ratings for different cities, the residential community feature information comprises the basic information, grade features, price features, traffic features and commuting time of the community, and the house state feature information comprises the basic attributes, dynamic attributes and suite attributes of the house source.
Preferably, the feature data are preprocessed using the Spark framework and the Hive data warehouse to obtain the lease-quitting data set; the preprocessing includes selection, filtering, deduplication, sampling, transformation, data replacement, weighting, attribute generation and data padding.
According to another aspect of the embodiments of the present invention, a lease-quitting prediction method based on the long-rental apartment renting scenario is provided, comprising the steps of:
acquiring users' lease-quitting sample data, and constructing a lease-quitting sample set from that data;
acquiring feature data according to the lease-quitting sample set, and preprocessing the feature data to obtain a lease-quitting data set;
and training on the lease-quitting data set with the XGBoost model to obtain a lease-quitting prediction model for prediction.
Preferably, the lease-quitting sample set is constructed from the users' lease-quitting sample data through the K-Means algorithm;
the lease-quitting sample set comprises positive lease-quitting sample data and negative lease-quitting sample data;
the positive sample data are the data of users who have quit their leases, and the negative sample data are the data of those users from before they quit.
Preferably, the feature data comprise user feature data, cross feature data and house source feature data.
Preferably, the user feature data comprise basic user information and user behavior features, wherein the basic user information comprises occupation, age, gender, education level, nationality and user channel, and the user behavior features comprise lease duration, lease-quitting behavior, complaint behavior and web browsing behavior;
the cross feature data comprise user return-visit information and user preference information, wherein the user return-visit information comprises an overall house source score and a service quality score, and the user preference information comprises commuting distance, price requirements and requirements on the surrounding business district;
the house source feature data comprise business district feature information, residential community feature information and house state feature information, wherein the business district feature information comprises business district ratings for different cities, the residential community feature information comprises the basic information, grade features, price features, traffic features and commuting time of the community, and the house state feature information comprises the basic attributes, dynamic attributes and suite attributes of the house source.
Preferably, the feature data are preprocessed using the Spark framework and the Hive data warehouse to obtain the lease-quitting data set; the preprocessing includes selection, filtering, deduplication, sampling, transformation, data replacement, weighting, attribute generation and data padding.
Compared with the prior art, the lease-quitting prediction system and method based on the long-rental apartment renting scenario provided by the invention have the following beneficial effects:
1. The system and method train a lease-quitting prediction model that predicts the probability that a long-term tenant of a long-rental apartment quits a lease midway. Based on the predicted probability, different promotional offers can be made to users with different lease-quitting probabilities so as to retain them as far as possible, reduce the time a house stands vacant, and reduce the losses to both the enterprise and the user caused by quitting midway. In addition, the service of the long-rental apartment can be optimized according to the predicted lease-quitting probability and the reasons for quitting, which avoids lease quitting caused by human factors and improves the user's renting experience and the enterprise's competitiveness; based on the prediction results, the concerns of some users can be addressed, and different solutions can be offered for different reasons for quitting, so that user needs are met.
2. The system and method construct the lease-quitting sample set through the K-Means algorithm, which addresses the large difference in number between lease-quitting samples (positive samples) and non-quitting samples (negative samples) and the resulting sample imbalance, and also addresses the fact that long-rental apartment users sign long contracts and that their behaviors are infrequent and sparse.
3. The system and method preprocess the feature data using the Spark framework and the Hive data warehouse to obtain the lease-quitting data set. The resulting data set contains only valid values, which addresses the inaccuracy of older sample feature data caused by iterative updates of the product form.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic structural diagram of an embodiment of the lease-quitting prediction system based on the long-rental apartment renting scenario according to the present invention;
fig. 2 is a flowchart of an embodiment of the lease-quitting prediction method based on the long-rental apartment renting scenario according to the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Example 1:
fig. 1 is a schematic structural diagram of an embodiment of the lease-quitting prediction system based on the long-rental apartment renting scenario according to the present invention. As shown in fig. 1, an embodiment of the present invention provides a lease-quitting prediction system 100 based on the long-rental apartment renting scenario, comprising: a lease-quitting sample acquisition module 1, a feature module 2 and a model training module 3;
the lease-quitting sample acquisition module 1 is coupled with the feature module 2 and is used for acquiring users' lease-quitting sample data, constructing a lease-quitting sample set from that data, and sending the sample set to the feature module 2; the lease-quitting sample set comprises positive lease-quitting sample data and negative lease-quitting sample data; the positive sample data are the data of users who have quit their leases, and the negative sample data are the data of those users from before they quit.
It is to be understood that a positive sample is defined as a user who has quit a lease and a negative sample as a user who has not. The historical data all come from users who have quit their leases and serve as the positive lease-quitting sample data; the negative lease-quitting sample data are the behavior data of those same users in other periods. Because of the business scenario, the overall sample size is small, so additional new samples need to be generated by interpolating from a small number of samples. The method used is the K-Means method: the lease-quitting sample set is constructed from the users' lease-quitting sample data through the K-Means algorithm. First, the Euclidean distance between samples in n-dimensional space is calculated; for samples with coordinates $(x_i, y_i)$ and $(x_j, y_j)$, the Euclidean distance $d$ is calculated as follows:
$$d = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
The k nearest samples of the same class (all positive or all negative) are then found, samples are randomly selected from these k sample points, and new sample points are generated as follows:
$$x_{new} = x_i + (\hat{x}_i - x_i) \times \delta$$
where $x_{new}$ is the newly generated lease-quitting sample data, $\hat{x}_i$ corresponds to a selected one of the k nearest neighbors, and $\delta \in [0,1]$ is a random number; the newly generated lease-quitting sample data form the lease-quitting sample set.
For example, suppose the positive samples have coordinates (1,1), (2,2), (2,1), (3,3) and (5,5). The sample point (1,1) is randomly selected to generate new samples, and its 2 nearest samples are computed. From the Euclidean distance formula, the points nearest to (1,1) are (2,1) and (2,2). Here $x_i$ in the formula is (1,1), and the corresponding $\hat{x}_i$ values are ((2+1)/2, (1+1)/2) = (1.5, 1) and ((2+1)/2, (2+1)/2) = (1.5, 1.5), respectively. When δ = 0.5, the generated $x_{new}$ values are (1 + (1.5−1)×0.5, 1 + (1−1)×0.5) = (1.25, 1) and (1 + (1.5−1)×0.5, 1 + (1.5−1)×0.5) = (1.25, 1.25), respectively. The newly generated lease-quitting sample data form the lease-quitting sample set.
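The worked example above can be reproduced with a short Python sketch. It follows only the distance and interpolation steps as written in the text (the source names the overall step K-Means); the function names `euclidean`, `k_nearest`, `interpolate` and `generate_samples` are illustrative and do not come from the application, and the midpoint form of the neighbor-derived point is taken from the worked example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def k_nearest(point, samples, k):
    """Return the k same-class samples closest to `point`, excluding the point itself."""
    others = [s for s in samples if s != point]
    return sorted(others, key=lambda s: euclidean(point, s))[:k]

def interpolate(point, neighbor, delta):
    """x_new = x_i + (x_hat - x_i) * delta, with x_hat the midpoint of x_i and the neighbor
    (the midpoint is how the worked example computes x_hat)."""
    x_hat = tuple((p + n) / 2 for p, n in zip(point, neighbor))
    return tuple(p + (h - p) * delta for p, h in zip(point, x_hat))

def generate_samples(point, samples, k, delta=None, rng=None):
    """Generate one new sample per nearest neighbor; delta is random in [0, 1) unless fixed."""
    rng = rng or random.Random()
    new_samples = []
    for neighbor in k_nearest(point, samples, k):
        d = rng.random() if delta is None else delta
        new_samples.append(interpolate(point, neighbor, d))
    return new_samples

# Positive samples from the worked example, with delta fixed at 0.5.
positives = [(1, 1), (2, 2), (2, 1), (3, 3), (5, 5)]
new_points = generate_samples((1, 1), positives, k=2, delta=0.5)
print(new_points)  # [(1.25, 1.0), (1.25, 1.25)]
```

Fixing `delta` makes the result reproducible; leaving it as `None` draws a fresh random number for every generated sample, as the text describes.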
The lease-quitting sample acquisition module 1 constructs the lease-quitting sample set through the K-Means algorithm, which addresses the large difference in number between lease-quitting samples (positive samples) and non-quitting samples (negative samples) and the resulting sample imbalance, and also addresses the fact that long-rental apartment users sign long contracts and that their behaviors are infrequent and sparse.
The feature module 2 is coupled with the lease-quitting sample acquisition module 1 and the model training module 3 respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module 1, acquiring feature data according to the sample set, preprocessing the feature data to obtain a lease-quitting data set, and sending the data set to the model training module 3; the feature module 2 can be used for determining the factors behind a user's lease quitting.
Further, the feature data are preprocessed using the Spark framework and the Hive data warehouse to obtain the lease-quitting data set; the preprocessing includes selection, filtering, deduplication, sampling, transformation, data replacement, weighting, attribute generation and data padding, and its main purposes include checking the data for consistency, invalid values and missing values.
It can be understood that, because the feature module 2 preprocesses the feature data using the Spark framework and the Hive data warehouse, the resulting lease-quitting data set contains only valid values, which addresses the inaccuracy of older sample feature data caused by iterative updates of the product form.
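As a rough illustration of three of the listed preprocessing operations (filtering, deduplication and data padding), the following sketch uses plain Python over a list of records. It is a stand-in only: the source performs preprocessing with Spark and Hive, and the field names (`user_id`, `age`, `complaints`), the default values and the function `preprocess` are assumptions made for this example.

```python
def preprocess(records, required=("user_id",), defaults=None):
    """Filter invalid records, drop duplicates, and pad missing values.

    `required` and `defaults` are hypothetical parameters for this sketch.
    """
    defaults = defaults or {}
    seen = set()
    cleaned = []
    for rec in records:
        # Filtering: drop records that lack a required field.
        if any(rec.get(field) is None for field in required):
            continue
        # Deduplication: keep only the first record for each key.
        key = tuple(rec[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        # Data padding: fill missing optional fields with default values.
        filled = dict(rec)
        for field, value in defaults.items():
            if filled.get(field) is None:
                filled[field] = value
        cleaned.append(filled)
    return cleaned

raw = [
    {"user_id": 1, "age": 27, "complaints": 2},
    {"user_id": 1, "age": 27, "complaints": 2},       # duplicate record
    {"user_id": None, "age": 31, "complaints": 0},    # missing required field
    {"user_id": 2, "age": None, "complaints": None},  # missing optional values
]
clean = preprocess(raw, defaults={"age": -1, "complaints": 0})
print(clean)
```

After this pass every remaining record has a value in every field, which is the property the text describes as the data set containing only valid values.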
The feature module 2 comprises a user feature unit 21, a cross feature unit 22 and a house source feature unit 23, and the feature data comprise user feature data, cross feature data and house source feature data;
the user feature unit 21 is coupled with the lease-quitting sample acquisition module 1 and the model training module 3 respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module 1, acquiring user feature data according to the sample set, and sending the user feature data to the model training module 3; the user feature data comprise basic user information and user behavior features, wherein
the basic user information comprises features such as occupation, age, gender, education level, ethnicity and user channel,
the user behavior features comprise lease duration, lease-quitting behavior, complaint behavior, web browsing behavior and APP browsing information;
the cross feature unit 22 is coupled with the lease-quitting sample acquisition module 1 and the model training module 3 respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module 1, acquiring cross feature data according to the sample set, and sending the cross feature data to the model training module 3; the cross feature data comprise user return-visit information and user preference information, wherein
the user return-visit information comprises an overall house source score and a service quality score,
the user preference information comprises commuting distance, price requirements and requirements on the surrounding business district;
and the house source feature unit 23 is coupled with the lease-quitting sample acquisition module 1 and the model training module 3 respectively, and is used for receiving the lease-quitting sample set sent by the lease-quitting sample acquisition module 1, acquiring house source feature data according to the sample set, and sending the house source feature data to the model training module 3. The house source feature data comprise business district feature information, residential community feature information and house state feature information, wherein
the business district feature information comprises business district ratings for different cities, and the business district features mainly comprise lease-quitting probability information for different cities, for different districts of the same city, and for the business district itself;
the residential community feature information comprises the basic information, grade features, price features, traffic features and commuting time of the community;
the house state feature information comprises the basic attributes, dynamic attributes and suite attributes of the house source, wherein the basic attributes comprise basic information such as the infrastructure of the house source (whether there is an elevator, the floor number) and the orientation of the house; the dynamic attributes comprise rent level, contract duration, lease type, vacancy rate, and off-peak and peak seasons; the suite attributes comprise the number of rooms in the suite, the suite area and complaint information.
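One way to picture how the three feature groups might be combined into a single model input is sketched below. The dataclasses and field names are hypothetical and cover only a few of the listed features; the source does not specify any data layout.

```python
from dataclasses import dataclass, asdict

@dataclass
class UserFeatures:
    age: int
    lease_months: int      # lease duration
    complaint_count: int   # complaint behavior

@dataclass
class CrossFeatures:
    house_score: float     # overall house source score from return visits
    service_score: float   # service quality score from return visits
    commute_km: float      # preferred commuting distance

@dataclass
class HouseFeatures:
    has_elevator: bool     # basic attribute
    floor: int             # basic attribute
    monthly_rent: float    # dynamic attribute (rent level)

def to_feature_row(user, cross, house):
    """Flatten the three feature groups into one numeric row with prefixed names."""
    row = {}
    for prefix, part in (("user", user), ("cross", cross), ("house", house)):
        for name, value in asdict(part).items():
            row[f"{prefix}_{name}"] = float(value)
    return row

row = to_feature_row(
    UserFeatures(age=27, lease_months=12, complaint_count=3),
    CrossFeatures(house_score=4.0, service_score=3.5, commute_km=8.0),
    HouseFeatures(has_elevator=True, floor=6, monthly_rent=3200.0),
)
print(row)
```

Prefixing each name with its group keeps the origin of every column visible when the row is later inspected alongside model output.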
The model training module 3 is coupled with the feature module 2 and is used for receiving the lease-quitting data set sent by the feature module 2 and training on it with the XGBoost model to obtain a lease-quitting prediction model for prediction.
This module mainly uses the XGBoost model, which is a boosting algorithm. XGBoost is a boosted-tree model that combines many tree models into a strong classifier; the tree model used is the CART regression tree. The user, house source and cross features obtained through the feature engineering of the feature module 2 are input into the model for training, yielding the final lease-quitting prediction model.
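The source does not spell out the training objective or any hyperparameters. As background only, the standard XGBoost formulation (taken from the general description of the algorithm, not from this application) can be written as follows, where $x_i$ is the feature vector of sample $i$, $f_k$ is the $k$-th CART tree, $T$ is the number of leaves of a tree, $w$ is its vector of leaf weights, and $\gamma$, $\lambda$ are regularization coefficients. The logistic link in the last line is an assumption that fits a two-class (quit or not quit) setting.

```latex
% Additive ensemble of K regression trees
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}

% Regularized objective minimized during training
\mathrm{obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}

% Predicted lease-quitting probability under a logistic loss
p_i = \frac{1}{1 + e^{-\hat{y}_i}}
```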
The data of a user who is currently renting are fed into the lease-quitting prediction model to obtain a prediction result. Based on the predicted lease-quitting probability, different promotional offers can be made to users with different probabilities so as to retain them as far as possible, reduce the time a house stands vacant, and reduce the losses to both the enterprise and the user caused by quitting midway. In addition, the service of the long-rental apartment can be optimized according to the predicted lease-quitting probability and the reasons for quitting, which avoids lease quitting caused by human factors and improves the user's renting experience and the enterprise's competitiveness; based on the prediction results, the concerns of some users can be addressed, and different solutions can be offered for different reasons for quitting, so that user needs are met.
For example, when a user complains frequently, the complaint feature is a strong one, and the model identifies that user's lease-quitting probability as high. From the type and timing of the user's complaints, the model can predict the user's reason for quitting; after the model makes this prediction, customer service follows up with the user and proposes a corresponding solution to reassure the user and avoid losing them. As another example, a user frequently clicking the lease-quitting option and browsing other house sources is also a strong feature. The model can identify the user's lease-quitting behavior and, by comparing the locations of the house sources the user browsed with the location of the house source where the user currently lives, judge that the user may need to move for other reasons such as a job change. With the user's intention predicted by the model, customer service follows up with the user and proposes a corresponding solution, such as a discounted lease change, to retain the user.
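The idea of treating users differently according to their predicted lease-quitting probability can be sketched as a simple tiering function. The thresholds (0.7 and 0.4), the action labels and the function name `retention_action` are hypothetical; the source does not give concrete cut-offs or a fixed list of actions.

```python
def retention_action(quit_probability, high=0.7, medium=0.4):
    """Map a predicted lease-quitting probability to a follow-up action.

    The thresholds and action labels are illustrative assumptions.
    """
    if not 0.0 <= quit_probability <= 1.0:
        raise ValueError("quit_probability must lie in [0, 1]")
    if quit_probability >= high:
        return "customer-service follow-up"
    if quit_probability >= medium:
        return "lease-change discount offer"
    return "no action"

# Hypothetical predicted probabilities for three users.
predictions = {"user_a": 0.85, "user_b": 0.50, "user_c": 0.10}
actions = {user: retention_action(p) for user, p in predictions.items()}
print(actions)
```

In practice the cut-offs would be chosen from the cost of an intervention versus the cost of a vacant house, which the source leaves to the operator.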
Example 2:
fig. 2 is a flowchart of an embodiment of the lease-quitting prediction method based on the long-rental apartment renting scenario according to the present invention. As shown in fig. 2, an embodiment of the present invention provides a lease-quitting prediction method based on the long-rental apartment renting scenario, comprising the steps of:
step S101, collecting users' lease-quitting sample data, and constructing a lease-quitting sample set from that data;
in step S101, the lease-quitting sample set is constructed from the users' lease-quitting sample data through the K-Means algorithm. First, the Euclidean distance between samples in n-dimensional space is calculated; for samples with coordinates $(x_i, y_i)$ and $(x_j, y_j)$, the Euclidean distance $d$ is calculated as follows:
$$d = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
The k nearest samples of the same class (all positive or all negative) are then found, samples are randomly selected from these k sample points, and new sample points are generated as follows:
$$x_{new} = x_i + (\hat{x}_i - x_i) \times \delta$$
where $x_{new}$ is the newly generated lease-quitting sample data, $\hat{x}_i$ corresponds to a selected one of the k nearest neighbors, and $\delta \in [0,1]$ is a random number; the newly generated lease-quitting sample data form the lease-quitting sample set.
The lease-quitting sample set comprises positive lease-quitting sample data and negative lease-quitting sample data; the positive sample data are the data of users who have quit their leases, and the negative sample data are the data of those users from before they quit.
For example, user A signed a contract running from January 2018 to January 2019 but actually quit the lease on August 1, 2018. The behavior of user A from July 1, 2018 to August 1, 2018 is taken as a positive sample, and the behavior of user A from May 1, 2018 to June 1, 2018 is taken as a negative sample. The ratio of positive to negative training samples finally constructed by the system is about 47,000 : 158,000, that is, about 1 : 3.3.
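The windowing in the user A example can be sketched as below. The helper names `shift_months` and `label_windows` are hypothetical, and the offsets (the month before quitting as the positive window, the month ending two months before quitting as the negative window) are read off the single example above rather than stated as a general rule in the application.

```python
from datetime import date


def shift_months(d, months):
    """Shift a date by whole months, keeping the day of month.

    Assumes the day exists in the target month (always true for day 1).
    """
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)


def label_windows(quit_date):
    """Return the (start, end) behavior windows used as positive and negative samples."""
    positive = (shift_months(quit_date, -1), quit_date)
    negative = (shift_months(quit_date, -3), shift_months(quit_date, -2))
    return {"positive": positive, "negative": negative}
```

Calling `label_windows(date(2018, 8, 1))` reproduces the two windows given for user A.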
It can be understood that constructing the lease quitting sample set through the K-Means algorithm alleviates the large difference in proportion between lease quitting samples (positive samples) and non-quitting samples (negative samples), that is, sample imbalance, and also alleviates the problems that long-lease apartment users sign long contracts, are not very active, and show sparse behavior.
And step S102, obtaining characteristic data according to the lease quitting sample set, and preprocessing the characteristic data to obtain a lease quitting data set.
In step S102, the feature data is preprocessed by using the Spark framework and a Hive data warehouse to obtain the lease quitting data set. The preprocessing includes selection, filtering, deduplication, sampling, transformation, data replacement, weighting, attribute generation, and data padding; its main functions include checking the consistency of the data and handling invalid values, missing values and the like.
It can be understood that, because the feature data is preprocessed by using the Spark framework and the Hive data warehouse, the resulting lease quitting data set contains only valid values. This alleviates problems such as inaccurate feature data for older samples caused by updates and iterations of the product.
In step S102, the feature data includes user feature data, cross feature data, and house source feature data.
The user feature data comprises user basic information and user characteristic behaviors.
The user basic information comprises occupation, age, gender, education level, nationality and user channel. Occupation has an influence: for example, if a user's occupation is student, the user can be judged to be less stable, and the probability of quitting the lease midway is high. Age is also a relatively important feature: the younger the user, the higher the enthusiasm for work and life and the more frequent the changes of employer, so the lease quitting probability is generally higher.
The user characteristic behaviors comprise lease duration, lease quitting behavior, complaint behavior and web page browsing behavior; the complaint behavior further includes the total number of complaints, the processing duration and the like. A house renting app is used in a strong-demand, low-frequency scenario: users are unlikely to use it as often as e-commerce or news apps, but the demand behind each use is strong. If a user's behavior becomes frequent and varied within a certain period of time, it can be inferred that the user has a demand, possibly renewing, quitting or changing the lease, or making a complaint.
The cross feature data comprises user return visit information and user preference information.
The user return visit information comprises the comprehensive score of the house source and the service quality score; the factors of the service quality score include cleaning, maintenance, house source quality and the responsible housekeeper.
The user preference information comprises commuting distance, price requirement and surrounding business district requirement. If the commuting time and distance between the user's workplace and the rented house are too long, the lease quitting probability increases; the larger the difference between the user's preferred price and the current rent, the higher the lease quitting probability; and if the business district where the user lives is far from the user's preferred business district, the lease quitting probability is high.
The house source feature data comprises business district feature information, residential community feature information and house state feature information.
The business district feature information comprises business district ratings based on different cities. The probability of tenants quitting midway differs between cities, between districts and business districts of the same city, and between business districts with different ratings.
The residential community feature information comprises basic information of the community, rating features of the community and the price of the community; the basic information includes traffic information, such as the distance and travel time between the community and the nearest subway station and bus station, and the average room price of the community.
The house state feature information comprises the basic attributes, the dynamic attributes and the suite attributes of the house source. The basic attributes comprise the floor on which the room is located, the orientation, whether there is an elevator, and the total number of floors; the dynamic attributes comprise the contract duration of the house source, the rental type, the vacancy rate, the rent, and the off-peak or peak season; the suite attributes comprise the number of rooms in the suite, the area of the suite, the number of complaints made about the suite, and the like.
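One way to picture how the three feature groups above are combined into a single training row is sketched below. The class names, the handful of fields chosen, and the derived `cross_rent_gap` column are illustrative assumptions; the application lists many more features and does not specify a data layout.

```python
from dataclasses import dataclass, asdict


@dataclass
class UserFeatures:
    age: int
    occupation: str
    complaint_count: int
    lease_months: int


@dataclass
class CrossFeatures:
    commute_km: float
    preferred_rent: float
    service_score: float


@dataclass
class HouseFeatures:
    rent: float
    floor: int
    has_elevator: bool
    metro_distance_km: float


def build_row(user, cross, house):
    """Flatten the three feature groups into one prefixed dictionary."""
    row = {}
    for prefix, group in (("user", user), ("cross", cross), ("house", house)):
        for name, value in asdict(group).items():
            row[f"{prefix}_{name}"] = value
    # Derived feature: gap between the current rent and the user's preferred price
    row["cross_rent_gap"] = house.rent - cross.preferred_rent
    return row
```

Categorical values such as occupation would still need to be encoded numerically before being passed to a tree model.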
A big data processing platform such as Spark and a big data storage technology such as Hive are used to preprocess the user feature data, the house source feature data and the cross feature data in a unified way, yielding a lease quitting data set that can be used for model training. The preprocessing mainly checks the consistency of the data and handles invalid values, missing values and the like. The main basic operations include selection, filtering, deduplication, sampling, transformation (normalization, scaling), data replacement (clipping, segmentation, merging), weighting (attribute weighting, automatic optimization), attribute generation, data padding, etc.
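Three of the operations listed above (deduplication, data padding and normalization) can be illustrated with a small sketch. It uses plain Python on a list of dictionaries purely for readability; in the described system these steps would run on Spark over Hive tables. The function name `preprocess`, the use of the median as the fill value and the min-max scaling are assumptions made for the example.

```python
from statistics import median


def preprocess(rows, key, numeric_fields):
    """Deduplicate rows by `key`, fill missing numeric values, and min-max scale them.

    Assumes every numeric field has at least one non-missing value.
    """
    # 1. Deduplication: keep the first row seen for each key value
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(dict(row))
    for field in numeric_fields:
        present = [r[field] for r in unique if r.get(field) is not None]
        fill = median(present)           # 2. Data padding value
        lo, hi = min(present), max(present)
        span = (hi - lo) or 1.0          # avoid division by zero for constant columns
        for r in unique:
            value = r.get(field)
            if value is None:
                value = fill
            r[field] = (value - lo) / span  # 3. Normalization to [0, 1]
    return unique
```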
Step S103, training on the lease quitting data set with the XGBoost model to obtain a lease quitting prediction model used for prediction.
The XGBoost model is a supervised model and a boosting algorithm. The idea of boosting is to combine many weak classifiers into one strong classifier. XGBoost is a boosted tree model: it combines many tree models, each a CART regression tree, into a strong classifier. The algorithm keeps adding trees, growing each tree by repeatedly splitting on features; each added tree learns a new function that fits the residual of the previous prediction. Once training has produced k trees, the score of a sample is predicted as follows: according to its features, the sample falls into one leaf node in each tree, each leaf node corresponds to a score, and the predicted value of the sample is the sum of the scores from all the trees.
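The residual-fitting and score-summing idea in the paragraph above can be shown with a toy example. This is a deliberately simplified sketch: it uses one-split trees (stumps) on a single feature with squared-error loss, and it omits the regularization and second-order terms that distinguish XGBoost. The names `fit_stump` and `boost` and the `learning_rate` parameter are assumptions of the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature (needs >= 2 distinct xs)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    _, threshold, lv, rv = best
    # Each leaf holds one score; a sample falls into exactly one leaf
    return lambda x: lv if x <= threshold else rv


def boost(xs, ys, n_trees=10, learning_rate=0.5):
    """Add trees one at a time, each fitted to the residual of the current prediction."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    # The final prediction is the sum of the (scaled) scores from all trees
    return lambda x: sum(learning_rate * t(x) for t in trees)
```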
The lease quitting data set obtained through feature engineering is input into the model for training to obtain the final lease quitting prediction model.
The algorithm used by XGBoost takes the instance set of a node and the sample features as input. Base learners are generated iteratively from the first derivative g_i and the second derivative h_i of the loss function and added to the model. To choose a split, the instances are scanned from left to right in sorted order, the gradient sums G_L and G_R of every candidate split are enumerated, and the score of each split scheme is calculated by the following procedure, which outputs the split with the highest score.
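Input: I, instance set of current node
Input: d, feature dimension
gain ← 0
G ← Σ_{i∈I} g_i, H ← Σ_{i∈I} h_i
for k = 1 to m do
    G_L ← 0, H_L ← 0
    for j in sorted(I, by x_jk) do
        G_L ← G_L + g_j, H_L ← H_L + h_j
        G_R ← G − G_L, H_R ← H − H_L
        score ← max(score, G_L²/(H_L + λ) + G_R²/(H_R + λ) − G²/(H + λ))
    end
end
Output: Split with max score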
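The split search above can be sketched in Python as follows. This is an illustration under assumptions, not the library's code: the function name `best_split`, the default `lam=1.0`, the midpoint threshold, and the rule of skipping splits between equal feature values are choices made for the example, and the score omits the constant factor and complexity penalty used in the full XGBoost gain.

```python
def best_split(X, g, h, lam=1.0):
    """Exact greedy split search over all features.

    X: list of feature rows; g, h: per-instance first- and second-order gradients.
    Returns (gain, feature_index, threshold), or (0.0, None, None) if no split helps.
    """
    G, H = sum(g), sum(h)
    parent = G * G / (H + lam)
    best = (0.0, None, None)
    for k in range(len(X[0])):
        GL = HL = 0.0
        # Scan instances from left to right in order of feature k
        order = sorted(range(len(X)), key=lambda j: X[j][k])
        for pos, j in enumerate(order[:-1]):
            GL += g[j]
            HL += h[j]
            # Do not split between instances that share the same feature value
            if X[order[pos + 1]][k] == X[j][k]:
                continue
            GR, HR = G - GL, H - HL
            score = GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent
            if score > best[0]:
                threshold = (X[j][k] + X[order[pos + 1]][k]) / 2
                best = (score, k, threshold)
    return best
```

With squared-error loss, g_i is the prediction minus the label and h_i is 1, so the search reduces to finding the split that best separates positive from negative residuals.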
It can be understood that XGBoost (eXtreme Gradient Boosting) is an ensemble learning framework that iterates over a set of weak classifiers to reach an accurate classification result. XGBoost automatically uses multithreaded parallel computation on the CPU and adds a regularization term to the objective, which greatly improves running efficiency and the generalization ability of the model, and it has clear advantages in distributed applications. Compared with the traditional GBDT (Gradient Boosting Decision Tree), the XGBoost algorithm uses second-order Taylor information and a parallel, multi-core implementation, so training converges faster and accuracy is higher. XGBoost is currently widely used for store sales prediction, event classification, customer behavior prediction and the like.
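For reference, the second-order Taylor information and the regularization term mentioned above are usually written as below. This is the commonly published form of the XGBoost objective, not a formula taken from this application; the symbols l (loss function), ŷ (prediction), f_t (the tree added at round t), T (number of leaves), w (leaf weights), and γ and λ (regularization coefficients) are introduced here for illustration.

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}

g_i = \partial_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big), \qquad
h_i = \partial^{2}_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big)

w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i
```

Here I_j is the set of instances falling into leaf j, and w_j^* is the leaf score that minimizes the approximated objective for a fixed tree structure.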
With the system and method for lease quitting prediction based on the long-lease apartment house renting scenario provided by the invention, a lease quitting prediction model is obtained through training and used to predict the probability that a long-lease tenant quits the lease midway. Different promotional activities can then be run for users with different lease quitting probabilities so as to retain users as far as possible, reduce the vacancy time of houses, and reduce the losses to both the enterprise and the user caused by quitting midway. Meanwhile, the services of the long-lease apartment can be optimized according to the predicted lease quitting probability and reason, avoiding lease quitting caused by human factors and improving the user's renting experience and the enterprise's competitiveness. According to the prediction result, the concerns of some users can be resolved, and different solutions can be offered for different lease quitting reasons to meet user needs.
According to the embodiments, the application has the following beneficial effects:
1. With the system and method for lease quitting prediction based on the long-lease apartment house renting scenario provided by the invention, a lease quitting prediction model is obtained through training and used to predict the probability that a long-lease tenant quits the lease midway. Different promotional activities can then be run for users with different lease quitting probabilities so as to retain users as far as possible, reduce the vacancy time of houses, and reduce the losses to both the enterprise and the user caused by quitting midway. Meanwhile, the services of the long-lease apartment can be optimized according to the predicted lease quitting probability and reason, avoiding lease quitting caused by human factors and improving the user's renting experience and the enterprise's competitiveness. According to the prediction result, the concerns of some users can be resolved, and different solutions can be offered for different lease quitting reasons to meet user needs.
2. With the system and method provided by the invention, the lease quitting sample set is constructed through the K-Means algorithm, which alleviates the large difference in proportion between lease quitting samples (positive samples) and non-quitting samples (negative samples), that is, sample imbalance, and also alleviates the problems that long-lease apartment users sign long contracts, are not very active, and show sparse behavior.
3. With the system and method provided by the invention, the feature data is preprocessed by using the Spark framework and the Hive data warehouse to obtain the lease quitting data set, which contains only valid values. This alleviates problems such as inaccurate feature data for older samples caused by updates and iterations of the product.
While the present invention has been described in detail with reference to the drawings and examples, it is to be understood that the foregoing examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. A lease quitting prediction system based on a long-lease apartment house renting scenario, characterized by comprising a lease quitting sample acquisition module, a feature module and a model training module;
the lease quitting sample acquisition module is coupled with the feature module and is used for acquiring user lease quitting sample data, constructing a lease quitting sample set from the user lease quitting sample data, and sending the lease quitting sample set to the feature module; the lease quitting sample set comprises positive lease quitting sample data and negative lease quitting sample data, the positive lease quitting sample data being the data of users who have quit their lease, and the negative lease quitting sample data being the data of those users from before they quit;
the feature module is coupled with the lease quitting sample acquisition module and the model training module respectively, and is used for receiving the lease quitting sample set sent by the lease quitting sample acquisition module, acquiring feature data according to the lease quitting sample set, preprocessing the feature data to obtain a lease quitting data set, and sending the lease quitting data set to the model training module;
the model training module is coupled with the feature module and is used for receiving the lease quitting data set sent by the feature module and training on the lease quitting data set with the XGBoost model to obtain a lease quitting prediction model used for prediction.
2. The system according to claim 1, wherein the lease quitting sample set is constructed from the user lease quitting sample data through a K-Means algorithm.
3. The lease quitting prediction system based on the long-lease apartment house renting scenario, characterized in that the feature module comprises a user feature unit, a cross feature unit and a house source feature unit, and the feature data comprises user feature data, cross feature data and house source feature data;
the user feature unit is coupled with the lease quitting sample acquisition module and the model training module respectively, and is used for receiving the lease quitting sample set sent by the lease quitting sample acquisition module, acquiring the user feature data according to the lease quitting sample set, and sending the user feature data to the model training module;
the cross feature unit is coupled with the lease quitting sample acquisition module and the model training module respectively, and is used for receiving the lease quitting sample set sent by the lease quitting sample acquisition module, acquiring the cross feature data according to the lease quitting sample set, and sending the cross feature data to the model training module;
the house source feature unit is coupled with the lease quitting sample acquisition module and the model training module respectively, and is used for receiving the lease quitting sample set sent by the lease quitting sample acquisition module, acquiring the house source feature data according to the lease quitting sample set, and sending the house source feature data to the model training module.
4. The lease quitting prediction system based on the long-lease apartment house renting scenario according to claim 3, wherein the user feature data comprises user basic information and user characteristic behaviors, the user basic information comprising occupation, age, gender, education level, nationality and user channel, and the user characteristic behaviors comprising lease duration, lease quitting behavior, complaint behavior and web page browsing behavior;
the cross feature data comprises user return visit information and user preference information, the user return visit information comprising the comprehensive score of the house source and the service quality score, and the user preference information comprising commuting distance, price requirement and surrounding business district requirement;
the house source feature data comprises business district feature information, residential community feature information and house state feature information, the business district feature information comprising business district ratings based on different cities, the residential community feature information comprising the basic information, rating features, price features, traffic features and commuting time of the community, and the house state feature information comprising the basic attributes, the dynamic attributes and the suite attributes of the house source.
5. The lease quitting prediction system based on the long-lease apartment house renting scenario according to claim 1, wherein the lease quitting data set is obtained by preprocessing the feature data by using the Spark framework and a Hive data warehouse;
wherein the preprocessing comprises selection, filtering, deduplication, sampling, transformation, data replacement, weighting, attribute generation, and data padding.
6. A lease quitting prediction method based on a long-lease apartment house renting scenario, characterized by comprising the following steps:
acquiring user lease quitting sample data, and constructing a lease quitting sample set from the user lease quitting sample data; the lease quitting sample set comprises positive lease quitting sample data and negative lease quitting sample data, the positive lease quitting sample data being the data of users who have quit their lease, and the negative lease quitting sample data being the data of those users from before they quit;
acquiring feature data according to the lease quitting sample set, and preprocessing the feature data to obtain a lease quitting data set;
and training on the lease quitting data set with the XGBoost model to obtain a lease quitting prediction model used for prediction.
7. The lease quitting prediction method based on the long-lease apartment house renting scenario according to claim 1, wherein the lease quitting sample set is constructed from the user lease quitting sample data through a K-Means algorithm.
8. The lease quitting prediction method based on the long-lease apartment house renting scenario according to claim 1, wherein the feature data comprises user feature data, cross feature data and house source feature data.
9. The lease quitting prediction method based on the long-lease apartment house renting scenario according to claim 8, wherein the user feature data comprises user basic information and user characteristic behaviors, the user basic information comprising occupation, age, gender, education level, nationality and user channel, and the user characteristic behaviors comprising lease duration, lease quitting behavior, complaint behavior and web page browsing behavior;
the cross feature data comprises user return visit information and user preference information, the user return visit information comprising the comprehensive score of the house source and the service quality score, and the user preference information comprising commuting distance, price requirement and surrounding business district requirement;
the house source feature data comprises business district feature information, residential community feature information and house state feature information, the business district feature information comprising business district ratings based on different cities, the residential community feature information comprising the basic information, rating features, price features, traffic features and commuting time of the community, and the house state feature information comprising the basic attributes, the dynamic attributes and the suite attributes of the house source.
10. The lease quitting prediction method based on the long-lease apartment house renting scenario according to claim 1, wherein the lease quitting data set is obtained by preprocessing the feature data by using the Spark framework and a Hive data warehouse;
wherein the preprocessing comprises selection, filtering, deduplication, sampling, transformation, data replacement, weighting, attribute generation, and data padding.
CN201911412712.3A 2019-12-31 2019-12-31 Long-rental apartment house renting scene-based refund prediction system and method Withdrawn CN111126714A (en)

Publications (1)

Publication Number Publication Date
CN111126714A (en) 2020-05-08

Family

ID=70506454

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140257924A1 (en) * 2013-03-08 2014-09-11 Corelogic Solutions, Llc Automated rental amount modeling and prediction
CN107578277A (en) * 2017-08-24 2018-01-12 国网浙江省电力公司电力科学研究院 Rental housing client's localization method for power marketing
CN108734327A (en) * 2017-04-20 2018-11-02 腾讯科技(深圳)有限公司 A kind of data processing method, device and server
CN109389247A (en) * 2018-09-27 2019-02-26 智庭(北京)智能科技有限公司 A kind of region house rent prediction technique based on big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈兴达: "长租公寓租户退租原因分类模型的构建" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365031A (en) * 2020-10-22 2021-02-12 汪永强 House renting system and method based on block chain
CN115186906A (en) * 2022-07-15 2022-10-14 中远海运科技股份有限公司 Intelligent prediction method and platform for returning containers for container operational leasing
CN115186906B (en) * 2022-07-15 2024-01-19 中远海运科技股份有限公司 Intelligent prediction method and platform for container business lease returns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200508