CN110956296A - User loss probability prediction method and device - Google Patents


Info

Publication number
CN110956296A
Authority
CN
China
Prior art keywords
order
user
historical
orders
time
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811125784.5A
Other languages
Chinese (zh)
Inventor
孙华美 (Sun Huamei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811125784.5A
Publication of CN110956296A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Abstract

The application provides a user loss (churn) probability prediction method and device. The method comprises: acquiring historical order feature information of sample users in different area ranges during a first historical time period, together with churn result information of those sample users during a second historical time period; training, for each area range, a corresponding user churn probability prediction model based on the acquired historical order feature information and churn result information; and predicting the churn probability of a user to be predicted in a future third time period using the prediction model that corresponds to that user's area range. Because the embodiments account for differences between users in different area ranges and train a separate model for each area range, churn probability predictions for users in different area ranges are more accurate.

Description

User loss probability prediction method and device
Technical Field
The application relates to the technical field of data analysis and, in particular, to a user churn probability prediction method and device.
Background
Active users are users who visit regularly and bring value; churned users are users who once visited but gradually drift away for various reasons. Monthly active users (MAU) is a user-count statistic and an important indicator for measuring user stickiness and evaluating business performance. To maintain user stickiness, online ride-hailing platforms currently predict each user's churn probability over a future period and, based on that probability, run retention campaigns before the user churns.
Current methods for predicting the churn probability of online ride-hailing users generally rely only on how a user has booked rides through the platform within a certain recent time period, and their accuracy is low.
Disclosure of Invention
In view of this, an object of the present invention is to provide a user churn probability prediction method and apparatus that predict a user's churn probability more accurately.
In a first aspect, an embodiment of the present application provides a user churn probability prediction method, including:
acquiring historical order feature information of sample users in different area ranges during a first historical time period, and churn result information of the sample users during a second historical time period;
training, based on the acquired historical order feature information and churn result information, a user churn probability prediction model for each area range; and
predicting, based on the user churn probability prediction model corresponding to the area range of a user to be predicted, the churn probability of that user in a future third time period.
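The three steps depend on splitting each sample user's history into a feature window (the first historical time period) and a label window (the second historical time period). The sketch below assumes a user counts as churned when they place no order in the second period; the text does not define the churn result this way, and the function name and input format are hypothetical.

```python
from datetime import datetime

def build_training_rows(orders, first_period, second_period):
    """Split each user's orders into a feature window and a label window.

    orders: dict user_id -> list of order datetimes (hypothetical format).
    first_period, second_period: (start, end) tuples; end is exclusive.
    Returns user_id -> (sorted orders in first period, churn label).
    ASSUMPTION: label 1 (churned) means no order in the second period.
    """
    rows = {}
    for user_id, times in orders.items():
        in_first = [t for t in times if first_period[0] <= t < first_period[1]]
        in_second = [t for t in times if second_period[0] <= t < second_period[1]]
        if not in_first:
            continue  # no history in the first period to build features from
        rows[user_id] = (sorted(in_first), 0 if in_second else 1)
    return rows
```

Users with no orders in the first period are skipped here because no historical order features can be computed for them.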
With reference to the first aspect, an embodiment of the present application provides a first possible implementation of the first aspect, in which training the user churn probability prediction model corresponding to each area range based on the acquired historical order feature information and churn result information specifically includes:
determining the prediction model on which user churn probability prediction is based; and
taking the historical order feature information as the values of the explanatory variables of the prediction model and the churn result information as the value of its explained (dependent) variable, and training a user churn probability prediction model for each area range.
With reference to the first possible implementation of the first aspect, an embodiment of the present application provides a second possible implementation of the first aspect, in which the prediction model is any one of a logistic regression model, an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, an autoregressive integrated moving average (ARIMA) model, a generalized autoregressive conditional heteroskedasticity (GARCH) model, a deep learning model, and a decision tree model.
With reference to the first possible implementation of the first aspect, an embodiment of the present application provides a third possible implementation of the first aspect, in which the user churn probability prediction model to be trained comprises a weight coefficient to be trained for each explanatory variable and an additional (intercept) coefficient.
Using the historical order feature information as the values of the explanatory variables and the churn result information as the value of the explained variable to train the user churn probability prediction model for each area range specifically comprises:
taking the historical order feature information of a plurality of sample users as the values of the explanatory variables and the churn result information of each sample user as the value of the explained variable, and calculating the weight coefficient of each explanatory variable and the additional coefficient of the model, to obtain the trained user churn probability prediction model.
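For the logistic regression option, calculating one weight per explanatory variable plus an additional coefficient amounts to fitting weights and an intercept. The sketch below does this per area range with plain batch gradient descent on log loss; the learning rate, epoch count, and function names are illustrative assumptions, and a production system would more likely use a library solver.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit one weight per explanatory variable plus an intercept
    (the 'additional coefficient') by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample user (yi = 1 means churned).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_per_region(samples):
    """samples: region -> (X, y). Returns region -> (weights, intercept),
    i.e. one separately trained model per area range."""
    return {region: train_logistic(X, y) for region, (X, y) in samples.items()}
```

Training each area range independently is what lets the coefficients differ between regions, which is the source of the accuracy gain the text claims.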
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation of the first aspect, in which training the user churn probability prediction model corresponding to each area range based on the acquired historical order feature information and churn result information comprises:
determining the historical order count of each of a plurality of area ranges;
classifying the plurality of area ranges according to their historical order counts; and
training the user churn probability prediction model corresponding to each area range in each class, based on the historical order feature information and churn result information of the area ranges included in that class.
With reference to the fourth possible implementation of the first aspect, an embodiment of the present application provides a fifth possible implementation of the first aspect, in which classifying the plurality of area ranges according to their historical order counts specifically comprises:
assigning the area ranges whose historical order counts fall into the same preset count interval to the class corresponding to that interval.
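Assigning area ranges to preset count intervals is a bucketing step. A minimal sketch using the standard `bisect` module follows; the interval boundaries and region names are hypothetical, since the text leaves the preset intervals unspecified.

```python
import bisect

def classify_regions(order_counts, boundaries):
    """Group area ranges whose historical order counts fall in the same interval.

    order_counts: region -> historical order count.
    boundaries: sorted interval edges (hypothetical), e.g. [1000, 10000]
    defines the intervals (-inf, 1000), [1000, 10000), [10000, +inf).
    Returns interval index -> list of regions in that class.
    """
    groups = {}
    for region, count in sorted(order_counts.items()):
        idx = bisect.bisect_right(boundaries, count)  # which interval count falls in
        groups.setdefault(idx, []).append(region)
    return groups
```

For example, with boundaries `[1000, 10000]`, a region with 5000 orders and a region with exactly 1000 orders end up in the same class.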
With reference to the first aspect, an embodiment of the present application provides a sixth possible implementation of the first aspect, in which predicting, based on the trained user churn probability prediction model, the churn probability of the user to be predicted in the future third time period comprises:
determining the area range to which the user to be predicted belongs; and
predicting the churn probability of that user in the third time period based on the user churn probability prediction model corresponding to the determined area range and the user's historical order feature information in a fourth historical time period.
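Prediction is a lookup followed by model evaluation. This sketch assumes logistic regression models stored as `(weights, intercept)` pairs keyed by area range, and a hypothetical `region_of` mapping from user to area range; the feature vector is assumed to be computed from the fourth historical time period.

```python
import math

def predict_churn(models, region_of, user_id, features):
    """models: region -> (weights, intercept); region_of: user_id -> region.
    features: the user's historical order features (fourth historical period)."""
    weights, intercept = models[region_of[user_id]]  # model of the user's area range
    z = sum(w * x for w, x in zip(weights, features)) + intercept
    # Numerically stable logistic function gives the churn probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```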
With reference to the first aspect, an embodiment of the present application provides a seventh possible implementation of the first aspect, in which the historical order feature information comprises at least one of regional feature information, order behavior features, price factor features, experience factor features, and basic attribute features;
the order behavior features include at least one of: car-booking frequency, churn multiple, periodic variation in the number of bookings, cumulative order count, bubbling-to-order conversion rate, bubbling count, total weekday order count, total weekend order count, total peak-hour booking order count, and total off-peak booking order count;
the price factor features include at least one of: average amount payable, coefficient of variation of actual payments, average subsidy rate, proportion of subsidized orders, proportion of dynamically priced orders, conversion rate between dynamic-pricing quotes and dynamic-pricing orders, and average dynamic price adjustment;
the experience factor features include at least one of: the number of orders whose driver score is below a first preset threshold, the number of orders whose user score is below a second preset threshold, the actual average waiting time, the average star rating of the user's evaluations, the average star rating of the driver's evaluations, the number of complaints by the user, and the number of complaints by the driver;
the basic attribute features include the user's registration time;
the regional feature information includes at least one of climate information and traffic condition information.
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides an eighth possible implementation of the first aspect, in which, where the historical order feature information includes order behavior features that include the car-booking frequency, the car-booking frequency is obtained as follows:
for each sample user, obtaining the order time of each of the sample user's historical orders in the first historical time period, where the order time is the order acceptance time or the order completion time;
calculating the average interval between adjacent historical orders based on these order times; and
determining the car-booking frequency based on the average interval.
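The car-booking frequency steps can be sketched as the mean gap between consecutive orders. Because the churn multiple below divides a duration by this frequency, the sketch treats the "frequency" as the average interval itself (a time span); that reading, the function name, and returning `None` for fewer than two orders are assumptions.

```python
from datetime import datetime, timedelta

def booking_interval(order_times):
    """Average gap between adjacent orders (the 'car-booking frequency').
    Returns None when fewer than two orders exist (assumed handling)."""
    times = sorted(order_times)
    if len(times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps, timedelta()) / len(gaps)
```

Orders on July 1, 3 and 7 give gaps of 2 and 4 days, so the average interval is 3 days.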
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides a ninth possible implementation of the first aspect, in which, where the historical order feature information includes order behavior features that include the churn multiple, the churn multiple is obtained as follows:
for each sample user, obtaining the latest order time among the sample user's historical orders in the first historical time period, where the latest order time is the latest order acceptance time or the latest order completion time;
calculating the silent duration between the latest order time and the start of the second historical time period; and
taking the ratio of the silent duration to the car-booking frequency as the churn multiple.
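The churn multiple measures how many typical booking intervals have elapsed since the user's last order. A minimal sketch, again assuming the car-booking frequency is the average interval between adjacent orders and using a hypothetical function name:

```python
from datetime import datetime, timedelta

def churn_multiple(order_times, second_period_start):
    """Silent duration before the second period divided by the average
    interval between adjacent orders. None if fewer than two orders."""
    times = sorted(order_times)
    if len(times) < 2:
        return None
    # Mean of adjacent gaps equals (last - first) / (number of gaps).
    avg_gap = (times[-1] - times[0]) / (len(times) - 1)
    silence = second_period_start - times[-1]
    return silence / avg_gap  # timedelta / timedelta -> float
```

With orders on July 1, 3 and 7 and a second period starting August 1, the silence is 25 days against a 3-day average interval, giving a multiple of about 8.33.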
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides a tenth possible implementation of the first aspect, in which, where the historical order feature information includes order behavior features that include the bubbling count, the bubbling count is obtained as follows:
each time a destination sent by a client is received, treating the client's sending of the destination as a bubbling behavior and recording or updating the bubbling count.
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides an eleventh possible implementation of the first aspect, in which, where the historical order feature information includes order behavior features that include the bubbling-to-order conversion rate, the conversion rate is obtained as follows:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling behavior and recording the bubbling count;
when an order sent by a client is received, treating the client's sending of the order as an order placement and recording the order placement count; and
taking the ratio of the bubbling count to the order placement count as the bubbling-to-order conversion rate.
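The bubbling and order-placement counters can be kept per user in event handlers. This sketch uses hypothetical handler names and follows the text's definition literally (bubbling count divided by order placement count); the reciprocal, orders per bubble, is a common alternative way to express a funnel conversion rate.

```python
from collections import Counter

class FunnelCounter:
    """Per-user counts of bubbling events and order placements (hypothetical API)."""

    def __init__(self):
        self.bubbles = Counter()
        self.orders = Counter()

    def on_destination(self, user_id):
        # Client sent a destination: counted as one bubbling behavior.
        self.bubbles[user_id] += 1

    def on_order(self, user_id):
        # Client sent an order: counted as one order placement.
        self.orders[user_id] += 1

    def conversion_rate(self, user_id):
        # As defined in the text: bubbling count / order placement count.
        n_orders = self.orders[user_id]
        if n_orders == 0:
            return None
        return self.bubbles[user_id] / n_orders
```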
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides a twelfth possible implementation of the first aspect, in which, where the historical order feature information includes order behavior features that include the total peak-hour booking order count, that count is obtained as follows:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order whose send time falls within a preset peak time period as a peak-hour booking order, and counting these orders to obtain the total peak-hour booking order count;
where the historical order feature information includes order behavior features that include the total off-peak booking order count, that count is obtained as follows:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order whose send time falls within a preset off-peak time period as an off-peak booking order, and counting these orders to obtain the total off-peak booking order count.
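Counting peak-hour and off-peak orders reduces to testing each recorded send time against preset time-of-day windows. The windows below are hypothetical placeholders, as the text does not specify the preset peak or off-peak periods.

```python
from datetime import datetime, time

# Hypothetical preset windows; the source does not give concrete values.
PEAK_WINDOWS = [(time(7, 0), time(9, 30)), (time(17, 0), time(19, 30))]
OFF_PEAK_WINDOWS = [(time(10, 0), time(16, 0))]

def in_windows(ts, windows):
    """True if the time of day of ts falls in any [start, end) window."""
    t = ts.time()
    return any(start <= t < end for start, end in windows)

def peak_offpeak_totals(send_times, peak=PEAK_WINDOWS, off_peak=OFF_PEAK_WINDOWS):
    """Return (total peak-hour orders, total off-peak orders) from send times."""
    peak_total = sum(1 for ts in send_times if in_windows(ts, peak))
    off_total = sum(1 for ts in send_times if in_windows(ts, off_peak))
    return peak_total, off_total
```

Orders sent at 08:00, 18:00, 12:00 and 22:00 on the same day yield two peak-hour orders and one off-peak order; the 22:00 order falls in neither window.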
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides a thirteenth possible implementation of the first aspect, in which, where the historical order feature information includes price factor features that include the coefficient of variation of actual payments, that coefficient is obtained as follows:
for each sample user, obtaining the order information of all completed historical orders of the sample user in the first historical time period, the order information including the actual payment amount;
calculating the standard deviation and the mean of the actual payment amounts of these historical orders; and
taking the ratio of the standard deviation to the mean as the coefficient of variation of actual payments.
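The coefficient of variation is the standard deviation divided by the mean. The text does not say whether the population or sample standard deviation is meant; this sketch assumes the population form (`statistics.pstdev`) and returns `None` for degenerate inputs, both of which are assumptions.

```python
import statistics

def payment_cv(paid_amounts):
    """Coefficient of variation of actual payments: std / mean.
    Uses the population standard deviation (assumed)."""
    if len(paid_amounts) < 2:
        return None
    mean = statistics.mean(paid_amounts)
    if mean == 0:
        return None
    return statistics.pstdev(paid_amounts) / mean
```

Payments of 10, 20 and 30 have mean 20 and population standard deviation of about 8.16, giving a coefficient of variation of about 0.41.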
With reference to the seventh possible implementation of the first aspect, an embodiment of the present application provides a fourteenth possible implementation of the first aspect, in which, where the historical order feature information includes price factor features that include the average subsidy rate, the average subsidy rate is obtained as follows:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical time period, the order information including the amount payable and the amount actually paid;
calculating the average amount payable over all historical orders, and the average actual payment amount over all historical orders; and
taking the ratio of the difference between the average amount payable and the average actual payment amount to the average amount payable as the average subsidy rate.
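The average subsidy rate can be written as (average payable − average paid) / average payable. A minimal sketch, assuming each order is given as a hypothetical `(amount_payable, amount_paid)` pair:

```python
def average_subsidy_rate(orders):
    """orders: list of (amount_payable, amount_paid) pairs (hypothetical format).
    Returns (avg payable - avg paid) / avg payable, or None if undefined."""
    if not orders:
        return None
    avg_payable = sum(p for p, _ in orders) / len(orders)
    avg_paid = sum(a for _, a in orders) / len(orders)
    if avg_payable == 0:
        return None
    return (avg_payable - avg_paid) / avg_payable
```

Two orders with payable amounts 30 and 20 and paid amounts 24 and 16 average 25 payable against 20 paid, i.e. a subsidy rate of 0.2.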
In a second aspect, an embodiment of the present application further provides a user churn probability prediction apparatus, including:
an acquisition module, configured to acquire historical order feature information of sample users in different area ranges during a first historical time period and churn result information of the sample users during a second historical time period;
a training module, configured to train, based on the acquired historical order feature information and churn result information, a user churn probability prediction model for each area range; and
a prediction module, configured to predict, based on the user churn probability prediction model corresponding to the area range of a user to be predicted, the churn probability of that user in a future third time period.
In combination with the second aspect, an embodiment of the present application provides a first possible implementation of the second aspect, in which the training module is specifically configured to train the user churn probability prediction model corresponding to each area range, based on the acquired historical order feature information and churn result information, through the following steps:
determining the prediction model on which user churn probability prediction is based; and
taking the historical order feature information as the values of the explanatory variables of the prediction model and the churn result information as the value of its explained (dependent) variable, and training a user churn probability prediction model for each area range.
In combination with the first possible implementation of the second aspect, an embodiment of the present application provides a second possible implementation of the second aspect, in which the prediction model is any one of a logistic regression model, an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, an autoregressive integrated moving average (ARIMA) model, a generalized autoregressive conditional heteroskedasticity (GARCH) model, a deep learning model, and a decision tree model.
In combination with the first possible implementation of the second aspect, an embodiment of the present application provides a third possible implementation of the second aspect, in which the user churn probability prediction model to be trained comprises a weight coefficient to be trained for each explanatory variable and an additional (intercept) coefficient.
The training module is specifically configured to train the user churn probability prediction model for each area range, using the historical order feature information as the values of the explanatory variables and the churn result information as the value of the explained variable, through the following step:
taking the historical order feature information of a plurality of sample users as the values of the explanatory variables and the churn result information of each sample user as the value of the explained variable, and calculating the weight coefficient of each explanatory variable and the additional coefficient of the model, to obtain the trained user churn probability prediction model.
In combination with the second aspect, an embodiment of the present application provides a fourth possible implementation of the second aspect, in which the training module is specifically configured to train the user churn probability prediction model corresponding to each area range, based on the acquired historical order feature information and churn result information, through the following steps:
determining the historical order count of each of a plurality of area ranges;
classifying the plurality of area ranges according to their historical order counts; and
training the user churn probability prediction model corresponding to each area range in each class, based on the historical order feature information and churn result information of the area ranges included in that class.
In combination with the fourth possible implementation of the second aspect, an embodiment of the present application provides a fifth possible implementation of the second aspect, in which the training module is specifically configured to classify the plurality of area ranges according to their historical order counts by:
assigning the area ranges whose historical order counts fall into the same preset count interval to the class corresponding to that interval.
In combination with the second aspect, an embodiment of the present application provides a sixth possible implementation of the second aspect, in which the prediction module is specifically configured to determine the area range to which the user to be predicted belongs;
and to predict the churn probability of that user in the third time period based on the user churn probability prediction model corresponding to the determined area range and the user's historical order feature information in a fourth historical time period.
In combination with the second aspect, an embodiment of the present application provides a seventh possible implementation of the second aspect, in which the historical order feature information comprises at least one of regional feature information, order behavior features, price factor features, experience factor features, and basic attribute features;
the order behavior features include at least one of: car-booking frequency, churn multiple, periodic variation in the number of bookings, cumulative order count, bubbling-to-order conversion rate, bubbling count, total weekday order count, total weekend order count, total peak-hour booking order count, and total off-peak booking order count;
the price factor features include at least one of: average amount payable, coefficient of variation of actual payments, average subsidy rate, proportion of subsidized orders, proportion of dynamically priced orders, conversion rate between dynamic-pricing quotes and dynamic-pricing orders, and average dynamic price adjustment;
the experience factor features include at least one of: the number of orders whose driver score is below a first preset threshold, the number of orders whose user score is below a second preset threshold, the actual average waiting time, the average star rating of the user's evaluations, the average star rating of the driver's evaluations, the number of complaints by the user, and the number of complaints by the driver;
the basic attribute features include the user's registration time;
the regional feature information includes at least one of climate information and traffic condition information.
In combination with the seventh possible implementation of the second aspect, an embodiment of the present application provides an eighth possible implementation of the second aspect, in which, where the historical order feature information includes order behavior features that include the car-booking frequency, the acquisition module is specifically configured to obtain the car-booking frequency through the following steps:
for each sample user, obtaining the order time of each of the sample user's historical orders in the first historical time period, where the order time is the order acceptance time or the order completion time;
calculating the average interval between adjacent historical orders based on these order times; and
determining the car-booking frequency based on the average interval.
In combination with the seventh possible implementation of the second aspect, an embodiment of the present application provides a ninth possible implementation of the second aspect, in which, where the historical order feature information includes order behavior features that include the churn multiple, the acquisition module is specifically configured to obtain the churn multiple through the following steps:
for each sample user, obtaining the latest order time among the sample user's historical orders in the first historical time period, where the latest order time is the latest order acceptance time or the latest order completion time;
calculating the silent duration between the latest order time and the start of the second historical time period; and
taking the ratio of the silent duration to the car-booking frequency as the churn multiple.
In combination with the seventh possible implementation of the second aspect, an embodiment of the present application provides a tenth possible implementation of the second aspect, in which, where the historical order feature information includes order behavior features that include the bubbling count, the acquisition module is specifically configured to obtain the bubbling count by:
each time a destination sent by a client is received, treating the client's sending of the destination as a bubbling behavior and recording or updating the bubbling count.
In combination with the seventh possible implementation of the second aspect, an embodiment of the present application provides an eleventh possible implementation of the second aspect, in which, where the historical order feature information includes order behavior features that include the bubbling-to-order conversion rate, the acquisition module is specifically configured to obtain the conversion rate through the following steps:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling behavior and recording the bubbling count;
when an order sent by a client is received, treating the client's sending of the order as an order placement and recording the order placement count; and
taking the ratio of the bubbling count to the order placement count as the bubbling-to-order conversion rate.
In combination with the seventh possible implementation of the second aspect, an embodiment of the present application provides a twelfth possible implementation of the second aspect, in which, where the historical order feature information includes order behavior features that include the total peak-hour booking order count, the acquisition module is specifically configured to obtain that count through the following steps:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order whose send time falls within a preset peak time period as a peak-hour booking order, and counting these orders to obtain the total peak-hour booking order count;
where the historical order feature information includes order behavior features that include the total off-peak booking order count, that count is obtained through the following steps:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order whose send time falls within a preset off-peak time period as an off-peak booking order, and counting these orders to obtain the total off-peak booking order count.
In combination with the seventh possible implementation of the second aspect, an embodiment of the present application provides a thirteenth possible implementation of the second aspect, in which the acquisition module is specifically configured to obtain the coefficient of variation of actual payments through the following steps:
for each sample user, obtaining the order information of all completed historical orders of the sample user in the first historical time period, the order information including the actual payment amount;
calculating the standard deviation and the mean of the actual payment amounts of these historical orders; and
taking the ratio of the standard deviation to the mean as the coefficient of variation of actual payments.
In combination with the seventh possible implementation manner of the second aspect, the present application provides a fourteenth possible implementation manner of the second aspect, in which, where the historical order feature information includes a price factor feature and the price factor feature includes an average subsidy rate, the obtaining module is specifically configured to obtain the average subsidy rate by the following steps:
for each sample user, obtaining order information of all historical orders of the sample user in the first historical time period, the order information including the amount payable and the amount actually paid;
calculating the average amount payable from the amounts payable of all historical orders, and the average amount actually paid from the amounts actually paid of all historical orders;
and taking the ratio of the difference between the average amount payable and the average amount actually paid to the average amount payable as the average subsidy rate.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate via the bus, and the machine-readable instructions, when executed by the processor, perform the user churn probability prediction method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the user churn probability prediction method according to any implementation of the first aspect.
In the embodiments of the present application, user churn probability prediction models, one for each area range, are trained from the acquired historical order feature information of sample users in different area ranges in a first historical time period and the churn result information of those sample users in a second historical time period. To predict the churn probability of a user, the model corresponding to the area range to which the user belongs is used to predict the user's churn probability in a future third time period. Because this process accounts for the differences between users in different area ranges, training a separate model for each area range and applying it to the users in that range, the predictions are more accurate.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting the scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart illustrating a method for predicting a user churn probability according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating a specific method for training the user churn probability prediction model corresponding to each area range in the user churn probability prediction method according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating a specific method for predicting the user's churn probability in the third time period in the user churn probability prediction method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a user churn probability prediction apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
At present, when predicting the churn probability of online ride-hailing users, whether a user will churn in a future period is generally predicted from whether the user took a ride in the last month. This approach has a problem: user characteristics differ by region. For example, users in tier-1 and tier-2 cities typically take rides at noticeably shorter intervals than users in tier-3 and tier-4 cities. A single prediction method therefore does not suit users with different ride-taking intervals, and the churn probability prediction error is large for some users. Based on this, the present application provides a user churn probability prediction method and apparatus that can predict user churn probability more accurately.
To facilitate understanding of the present embodiment, a user churn probability prediction method disclosed in the embodiments of the present application will be described in detail first.
Referring to fig. 1, a user churn probability prediction method provided in the embodiment of the present application includes:
S101: acquiring historical order feature information of sample users in different area ranges in a first historical time period, and churn result information of the sample users in a second historical time period.
In a specific implementation, the area ranges may be set according to actual needs. Because various factors differ between areas, users' ride-hailing frequency differs as well. For example, the probability that users choose ride-hailing varies with traffic conditions, the size of the area, user habits, user income and so on: in cities with poor traffic conditions and low user income, users may choose to travel by subway or by other modes, whereas in high-income cities the probability of choosing ride-hailing is higher than in low-income cities. Given these differences, determining the churn probabilities of users in different regions with the same method would clearly be inaccurate; the embodiments of the present application therefore train a separate user churn probability prediction model for each area range.
When setting the area ranges, different area ranges can be obtained from the division of administrative regions, for example, by taking a city or a district as the unit. In a large city, the city center can be distinguished from the suburbs, so area ranges can be divided according to how developed the different parts of the city are. In some small county-level towns, by contrast, the probability that users travel by ride-hailing may be very low; to reduce the amount of computation, such regions are excluded, that is, these small towns are not used as area ranges in the present application.
In addition, a plurality of area ranges may be obtained by clustering the departure points or the destinations of a plurality of historical orders.
Specifically, the method includes the following steps: obtaining order information of historical orders in a target area within a fourth historical time period, where the order information includes a departure point or a destination;
taking the obtained historical orders as clustering samples and their departure points or destinations as order locations, and clustering the order locations of the clustering samples to obtain a plurality of clusters;
and determining the area range corresponding to each cluster according to the order locations it contains.
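The clustering step above leaves the algorithm open. A minimal sketch, assuming k-means (Lloyd's algorithm) on (latitude, longitude) pairs and a bounding box as the area range of each cluster; the function names, the choice of k-means, the first-k initialization and the bounding-box representation are all illustrative assumptions, not the method fixed by this application:

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]              # (latitude, longitude) of a departure point or destination
Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's k-means; returns the cluster label of every point.

    The first k points serve as initial centroids so the sketch is deterministic.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels


def area_ranges(points: List[Point], k: int) -> Dict[int, Box]:
    """Turn each cluster of order locations into a bounding-box area range."""
    boxes: Dict[int, Box] = {}
    for (lat, lon), label in zip(points, kmeans(points, k)):
        lo_lat, lo_lon, hi_lat, hi_lon = boxes.get(label, (lat, lon, lat, lon))
        boxes[label] = (min(lo_lat, lat), min(lo_lon, lon), max(hi_lat, lat), max(hi_lon, lon))
    return boxes
```

Euclidean distance on raw latitude and longitude is a simplification that is usually tolerable over small areas; projected coordinates or a haversine distance would be more faithful, and a density-based method could replace k-means when the number of areas is not known in advance.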
After the area ranges are determined, sample users are selected from each area range. Sample users are generally users whose area range is stable; using users who frequently change areas, such as frequent business travelers, as sample users could reduce the accuracy of the trained user churn probability prediction model. The sample users include positive sample users and negative sample users: a positive sample user is a user whose churn result in the second historical time period is "not churned", and a negative sample user is a user whose churn result in the second historical time period is "churned".
A sample user belonging to a given area range is a user whose historical orders have departure points and/or destinations within that area range.
The first historical time period is generally a historical period close to the current time, for example, a period within half a year or within one year before the current time. The second historical time period is generally contiguous with the first historical time period. For example, if the current time is June 1, 2018, March 1, 2018 to April 30, 2018 may be taken as the first historical time period, and May 1, 2018 to the current time as the second historical time period. Optionally, to improve prediction accuracy, the first historical time period may be made longer than the second historical time period.
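The window arithmetic in the example above can be sketched as follows. The function names, the month-based window lengths and in particular the churn rule (a user counts as churned when they have no order in the second historical time period) are illustrative assumptions; this section does not define how the churn result is determined.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

Window = Tuple[date, date]  # inclusive (start, end)


def month_start(d: date, months: int) -> date:
    """First day of the month that lies `months` months away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def history_windows(current: date, first_months: int = 2,
                    second_months: int = 1) -> Tuple[Window, Window]:
    """Contiguous windows: the first historical period, then the second one up to `current`."""
    second_start = month_start(current, -second_months)
    first_start = month_start(second_start, -first_months)
    first = (first_start, second_start - timedelta(days=1))
    second = (second_start, current)
    return first, second


def is_churned(order_dates: Iterable[date], second: Window) -> bool:
    """Hypothetical label rule: churned (negative sample) if no order falls in the second period."""
    start, end = second
    return not any(start <= d <= end for d in order_dates)
```

With the defaults, `history_windows(date(2018, 6, 1))` yields March 1 to April 30, 2018 and May 1 to June 1, 2018, matching the example, with the first window (two months) longer than the second (one month).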
The historical order feature information includes at least one of: regional feature information, order behavior features, price factor features, experience factor features and basic attribute features;
the regional feature information includes at least one of climate information and traffic condition information;
the order behavior features include at least one of: ride-hailing frequency, churn multiple, periodic variation in the number of ride-hailing orders, cumulative number of orders, conversion rate between bubbling and order placement, number of bubbles, total number of weekday orders, total number of weekend orders, total number of peak-hour orders, and total number of off-peak orders;
the price factor features include at least one of: average amount payable, coefficient of variation of the amount actually paid, average subsidy rate, proportion of subsidized orders, proportion of dynamically priced orders, conversion rate between dynamic-price quotes and orders placed under dynamic pricing, and average dynamic price;
the experience factor features include at least one of: the number of orders whose driver score is below a first preset threshold, the number of orders whose user score is below a second preset threshold, the actual average waiting time, the average star rating of the user, the average star rating of the driver, the number of user complaints, and the number of driver complaints;
the basic attribute features include: the user registration time.
The regional feature information, order behavior features, price factor features, experience factor features and basic attribute features are described in items 1 to 5 below.
1. Regional feature information: feature information of the area range itself, such as climate information and traffic condition information.
(1) Where the regional feature information includes climate information, the climate information may be the weather at the time the ride-hailing platform receives an order sent by the user through the client, or statistical climate information for the area range over the first historical time period, such as the proportion of days with abnormal weather such as rain or snow.
Specifically, when the climate information is the weather at the time the ride-hailing platform receives an order, it may be acquired as follows:
when receiving an order sent by a client, recording the weather at the time the client sends the order.
When the climate information is statistical climate information corresponding to the area range in the first historical time period, the climate information may be obtained through the following steps:
and acquiring the climate information of each day in the first historical time period from a preset climate publishing platform, and generating statistical climate information according to the climate information of each day in the first historical time period.
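The daily aggregation in the last step can be sketched as below, producing the proportion of abnormal-weather days mentioned in (1). The input format (one weather label per day) and the set of labels treated as abnormal are assumptions; no particular climate-publishing platform or API is implied.

```python
from collections import Counter
from typing import Dict

# Assumed labels that count as "abnormal" weather, e.g. rain and snow as in (1).
ABNORMAL_WEATHER = {"rain", "snow"}


def climate_statistics(daily_weather: Dict[str, str]) -> Dict[str, object]:
    """Summarise per-day weather labels of the first historical period for one area range.

    daily_weather maps an ISO date string to a weather label (hypothetical format).
    """
    days = len(daily_weather)
    counts = Counter(daily_weather.values())
    abnormal_days = sum(counts[label] for label in ABNORMAL_WEATHER)
    return {
        "days": days,
        "abnormal_ratio": abnormal_days / days if days else 0.0,
        "label_counts": dict(counts),
    }
```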
(2) Where the regional feature information includes traffic condition information, the traffic condition information may be the traffic conditions from the user's departure point to the destination, or the traffic conditions within the area range, at the time the ride-hailing platform receives the order sent by the user through the client. It may also be the traffic conditions of the area range over the first historical time period, such as the overall road congestion rate on trips from departure points to destinations and the congestion of the area range as a whole.
Specifically, when the traffic condition information includes the traffic conditions from the user's departure point to the destination at the time the ride-hailing platform receives the order sent through the client, it may be acquired as follows:
when receiving an order sent by a client, recording, based on the departure point and destination carried in the order, the traffic condition information from the departure point to the destination at the time the client sends the order;
the traffic condition information may be any information that reflects traffic congestion, such as the proportion of the planned route's total length that is congested, the probability that congestion occurs, or the ratio between the shortest arrival time and the predicted arrival time.
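Two of the congestion measures listed above can be computed as follows. The `RouteSegment` structure and the idea that the planned route arrives as a list of segments with a congestion flag are hypothetical; this application does not specify how route data is represented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RouteSegment:
    length_m: float   # segment length in metres
    congested: bool   # congestion flag at the time the order is sent


def congested_length_ratio(route: List[RouteSegment]) -> float:
    """Share of the planned route's total length that is congested."""
    total = sum(s.length_m for s in route)
    congested = sum(s.length_m for s in route if s.congested)
    return congested / total if total else 0.0


def arrival_time_ratio(shortest_s: float, predicted_s: float) -> float:
    """Shortest possible arrival time over predicted arrival time; lower values suggest heavier congestion."""
    return shortest_s / predicted_s if predicted_s else 1.0
```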
When the traffic condition information includes the traffic conditions of the area range at the time the ride-hailing platform receives the order sent by the user through the client, it may be acquired as follows:
when receiving an order sent by a client, recording information on the overall road congestion in the area range where the user is located at that moment, such as the road congestion rate and the current average vehicle speed, and taking the recorded overall congestion as the traffic condition information.
When the traffic condition information includes a traffic condition corresponding to the area range within the first historical time period, the traffic condition information may be acquired by:
acquiring traffic condition statistics for the first historical time period from a preset traffic management platform, and using the acquired statistics as the traffic condition information.
Here, the traffic condition statistics may be traffic condition information generated by the preset traffic management platform from the traffic conditions within the specified historical time period. The ride-hailing platform can acquire the traffic condition information for the first historical time period through an interface provided by the traffic management platform.
2. Order behavior characteristics:
(1) Where the order behavior features include ride-hailing frequency, the ride-hailing frequency can be obtained by the following steps:
for each user, obtaining the order times of all historical orders of the user in the first historical time period, the order time being the order acceptance time or the order completion time;
calculating the average interval between adjacent historical orders based on their order times;
determining the ride-hailing frequency based on the average interval.
Alternatively, the ride-hailing frequency can be obtained as follows:
for each user, acquiring the number of all historical orders of the user in the first historical time period;
determining the ride-hailing frequency based on the number of historical orders and the duration of the first historical time period.
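Both ways of deriving the ride-hailing frequency are sketched below. How the average interval is converted into a frequency is not stated, so the first function simply returns the interval in days (its reciprocal is one possible frequency); the function names and the day unit are assumptions.

```python
from datetime import datetime
from typing import List, Optional


def avg_interval_days(order_times: List[datetime]) -> Optional[float]:
    """Method 1: mean gap, in days, between adjacent historical orders."""
    ts = sorted(order_times)
    if len(ts) < 2:
        return None  # undefined with fewer than two orders
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps)


def orders_per_day(order_count: int, period_days: float) -> float:
    """Method 2: number of historical orders divided by the length of the first historical period."""
    return order_count / period_days
```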
(2) Where the order behavior features include the churn multiple: the churn multiple indicates how likely the user is to churn and is expressed in terms of the silent duration and the ride-hailing frequency. The churn multiple is negatively correlated with the ride-hailing frequency and positively correlated with the silent duration. In the present application, the ratio of the silent duration to the ride-hailing frequency is taken as the churn multiple. A churn multiple equal or close to 1 indicates that the user's use of ride-hailing is relatively stable; a churn multiple greater than 1 indicates a churn risk, and the larger it is, the higher the churn probability; a churn multiple less than 1 indicates that the user is using ride-hailing more frequently, and the smaller it is, the lower the churn probability.
Specifically, the churn multiple can be obtained by the following steps: for each user, obtaining the latest order time among all historical orders of the user in the first historical time period, the latest order time being the latest order acceptance time or the latest order completion time;
calculating the silent duration between the latest order time and the start time of the second historical time period;
and taking the ratio of the silent duration to the ride-hailing frequency as the churn multiple.
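A minimal sketch of the churn multiple. It assumes the ride-hailing frequency is expressed as the average interval between adjacent orders, as in the first method of (1); under that reading the ratio is dimensionless and equals 1 when the silent duration matches the user's usual interval, which is consistent with the interpretation given above. The function name and the day unit are illustrative.

```python
from datetime import datetime
from typing import List


def churn_multiple(order_times: List[datetime], second_period_start: datetime,
                   avg_interval_days: float) -> float:
    """Silent duration (latest order -> start of the second period) over the usual order interval.

    About 1: stable usage; above 1: growing churn risk; below 1: using ride-hailing more often.
    """
    latest = max(order_times)
    silent_days = (second_period_start - latest).total_seconds() / 86400
    return silent_days / avg_interval_days
```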
(3) Where the order behavior features include the periodic variation in the number of ride-hailing orders, the period can be set according to actual requirements and the length of the first historical time period. For example, if the first historical time period is 1 month long, the period may be 1 week, 10 days, 15 days, etc.; if it is 3 months long, the period may be 1 week, 2 weeks, 1 month, 10 days, 15 days, 20 days, etc. The periodic variation can be expressed as the change in the number of orders or as the rate of change of the number of orders.
The number of periodic variation values depends on the length of the first historical time period and on the period, with one value for each pair of adjacent periods. For example, if the first historical time period is 3 months long and the period is 1 month, there are 2 periodic variation values; if the first historical time period is 1 month long and the period is 1 week, there are 3.
Preferably, the period is related to the length of the second historical time period; for example, the period is set equal to the length of the second historical time period.
Specifically, the periodic variation in the number of ride-hailing orders may be acquired as follows:
for each sample user, obtaining order information of all historical orders of the user in the first historical time period, the order information carrying the order initiation time or the order completion time;
based on a plurality of periods determined within the first historical time period, determining the number of historical orders in each period from the order initiation time or order completion time of all historical orders in the first historical time period;
and determining the periodic variation between every two adjacent periods based on the number of historical orders in each period.
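The per-period counting and the adjacent-period comparison can be sketched as follows. Equal-length periods starting at the beginning of the first historical time period, dropping a trailing partial period, and returning NaN when a rate is undefined are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def periodic_variation(order_times: List[datetime], start: datetime, end: datetime,
                       period: timedelta, as_rate: bool = False) -> List[float]:
    """Change in order count between each pair of adjacent periods in [start, end).

    Returns differences by default, or relative rates of change when as_rate is True.
    """
    n_periods = (end - start) // period      # trailing partial period is ignored
    counts = [0] * n_periods
    for t in order_times:
        if start <= t < end:
            index = (t - start) // period
            if index < n_periods:
                counts[index] += 1
    changes: List[float] = []
    for prev, cur in zip(counts, counts[1:]):
        if not as_rate:
            changes.append(float(cur - prev))
        elif prev:
            changes.append((cur - prev) / prev)
        else:
            changes.append(float("nan"))     # rate undefined when the earlier period is empty
    return changes
```

With a four-week window and a one-week period this returns three values, in line with the one-month, one-week example above.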
(4) Where the order behavior features include the cumulative number of orders, this is the total number of orders completed by the user over the first historical time period; orders cancelled by the user are excluded.
Specifically, the cumulative number of orders may be acquired as follows:
for each sample user, acquiring the historical orders whose status is completed among all historical orders of the user in the first historical time period;
and taking the number of these completed historical orders as the cumulative number of orders.
Here, the status of a historical order may be either completed or cancelled.
(5) Where the order behavior features include the conversion rate between bubbling and order placement, it is obtained by the following steps:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling action, and recording or updating the bubble count;
when an order sent by a client is received, treating the client's sending of the order as an order placement action, and recording or updating the order placement count;
and taking the ratio of the bubble count to the order placement count as the conversion rate between bubbling and order placement.
When a user books a ride through the ride-hailing platform, the user first enters a departure point and a destination. The client then sends a dispatch request to the ride-hailing platform based on the departure point and destination entered by the user, and the platform generates information such as the estimated price, queuing information and dynamic price from the current ride-hailing capacity, traffic conditions, weather, departure point, destination and so on, and returns it to the client. This user behavior is called bubbling.
In general, a user's bubbling behavior is characterized by the ride-hailing platform receiving the destination sent by the user. Therefore, once the ride-hailing platform receives a destination sent by a client, the sending of that destination is recorded as a bubbling action, regardless of whether the user subsequently places an order through the client.
After receiving the estimated price, queuing information, dynamic price and other information from the ride-hailing platform, the client displays it to the user. If the client then receives the user's instruction to place the order, it sends the order to the ride-hailing platform, and the user's action is recorded as an order placement regardless of whether the order is later cancelled while waiting.
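The counting rules of (5), together with the bubble count of (6), can be sketched as a pair of per-user counters driven by the two platform events. The class and method names are hypothetical, and the sketch follows the document's orientation of the ratio (bubbles per order placement); the reciprocal, order placements per bubble, carries the same information.

```python
from collections import defaultdict
from typing import Optional


class ConversionTracker:
    """Per-user counters for bubbling events and order placements."""

    def __init__(self) -> None:
        self.bubbles = defaultdict(int)
        self.orders = defaultdict(int)

    def on_destination_received(self, user_id: str) -> None:
        # Every destination submission is a bubble, whether or not an order follows.
        self.bubbles[user_id] += 1

    def on_order_received(self, user_id: str) -> None:
        # Every order placement counts, even if the order is later cancelled while waiting.
        self.orders[user_id] += 1

    def conversion_rate(self, user_id: str) -> Optional[float]:
        """Bubble count divided by order-placement count; None if the user placed no orders."""
        placed = self.orders.get(user_id, 0)
        if not placed:
            return None
        return self.bubbles.get(user_id, 0) / placed
```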
(6) Where the order behavior features include the number of bubbles, it can be obtained in a way similar to (5) above, which is not repeated here.
(7) Where the order behavior features include the total number of weekday orders, this is the total number of orders completed on weekdays.
Specifically, the total number of weekday orders may be obtained as follows:
for each sample user, obtaining order information of all historical orders of the user in the first historical time period, the order information carrying the order sending time or the order completion time;
taking the historical orders whose order sending time or order completion time falls on a weekday as weekday historical orders, and counting them to obtain the total number of weekday orders.
Here, the historical orders in the first historical time period may include only completed orders, or both completed and cancelled orders.
(8) Where the order behavior features include the total number of weekend orders, this is the total number of orders completed on weekends.
Specifically, the total number of weekend orders may be obtained as follows:
for each sample user, obtaining order information of all historical orders of the user in the first historical time period, the order information carrying the order sending time or the order completion time;
taking the historical orders whose order sending time or order completion time falls on a weekend as weekend historical orders, and counting them to obtain the total number of weekend orders.
Here, the historical orders in the first historical time period may include only completed orders, or both completed and cancelled orders.
It should be noted that weekends here may also include national statutory holidays in addition to Saturdays and Sundays.
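A sketch of the weekday/weekend split in (7) and (8), treating statutory holidays as weekend days as the note above allows. The holiday set is supplied by the caller because this application does not name a holiday calendar; the function name is illustrative.

```python
from datetime import date, datetime
from typing import AbstractSet, Iterable, Tuple


def weekday_weekend_totals(order_times: Iterable[datetime],
                           holidays: AbstractSet[date] = frozenset()) -> Tuple[int, int]:
    """Return (weekday_total, weekend_total) from order sending or completion times.

    Saturdays, Sundays and any date in `holidays` count as weekend.
    """
    weekday_total = weekend_total = 0
    for t in order_times:
        day = t.date()
        if day.weekday() >= 5 or day in holidays:   # weekday(): Monday=0 ... Sunday=6
            weekend_total += 1
        else:
            weekday_total += 1
    return weekday_total, weekend_total
```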
(9) Where the order behavior features include the total number of peak-hour ride-hailing orders, at least one ride-hailing peak time period is preset, and the number of orders whose client sending time or order completion time falls within a preset peak time period is taken as the total number of peak-hour orders.
Specifically, taking the order sending time as an example, the total number of peak-hour orders is obtained by the following steps:
when receiving an order sent by a client, recording the time at which the client sends the order;
taking each historical order whose sending time falls within a preset peak time period as a peak-hour order, and counting the peak-hour orders to obtain the total number of peak-hour orders.
(10) Where the order behavior features include the total number of off-peak ride-hailing orders, at least one off-peak time period is preset, and the number of orders whose client sending time or order completion time falls within a preset off-peak time period is taken as the total number of off-peak orders.
Specifically, taking the order sending time as an example, the total number of off-peak orders is obtained by the following steps:
when receiving an order sent by a client, recording the time at which the client sends the order;
taking each historical order whose sending time falls within a preset off-peak time period as an off-peak order, and counting the off-peak orders to obtain the total number of off-peak orders.
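The peak and off-peak counts in (9) and (10) reduce to checking each recorded send time against preset daily windows. The window boundaries below are placeholders only; this application leaves the actual peak and off-peak periods to be preset, and the names are illustrative.

```python
from datetime import datetime, time
from typing import Iterable, List, Tuple

Window = Tuple[time, time]  # [start, end) within a day

# Placeholder presets; real peak and off-peak periods would be configured per deployment.
PEAK: List[Window] = [(time(7, 0), time(9, 30)), (time(17, 0), time(19, 30))]
OFF_PEAK: List[Window] = [(time(10, 0), time(16, 0))]


def in_windows(t: datetime, windows: List[Window]) -> bool:
    """True if the time of day of t falls inside any window."""
    return any(start <= t.time() < end for start, end in windows)


def count_in_windows(send_times: Iterable[datetime], windows: List[Window]) -> int:
    """Count orders whose client send time falls inside any of the given windows."""
    return sum(1 for t in send_times if in_windows(t, windows))
```

Orders outside both sets of windows (late at night with these placeholder values) count toward neither total.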
In addition, the order behavior feature may further include: the total amount of orders in the third historical time period before the start time of the second historical time period.
Here, the third historical time period may have the same duration as the second historical time period; for example, the first historical time period is February, March and April, the second historical time period is May, and the third historical time period is June.
3. Price factor characteristics:
(1) Where the price factor features include the average amount payable: ride-hailing platforms often run promotions such as ride red envelopes and discount coupons. The amount payable is the amount a user would have to pay for a ride-hailing trip without any promotion applied. The average amount payable is the mean amount payable over all completed historical orders of the sample user in the first historical time period.
Specifically, the average amount payable may be acquired as follows:
for each sample user, obtaining order information of all completed historical orders of the sample user in the first historical time period, the order information including the amount payable;
calculating the mean of the amounts payable of all these historical orders and taking it as the average amount payable.
(2) Where the price factor features include the coefficient of variation of the amount actually paid:
the coefficient of variation of the amount actually paid is the ratio of the standard deviation of the amounts actually paid on all completed historical orders of the sample user in the first historical time period to their mean.
Specifically, this coefficient of variation may be obtained as follows:
for each sample user, obtaining order information of all completed historical orders of the sample user in the first historical time period, the order information including the amount actually paid;
calculating the standard deviation and the mean of the amounts actually paid over all these historical orders;
and taking the ratio of the standard deviation to the mean amount actually paid as the coefficient of variation of the amount actually paid.
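The coefficient of variation can be computed with Python's standard `statistics` module. Whether the population or the sample standard deviation is meant is not specified; the sketch assumes the population form (`pstdev`) and returns NaN when the mean is zero, both of which are choices made here for illustration.

```python
import math
import statistics
from typing import Sequence


def paid_coefficient_of_variation(amounts_paid: Sequence[float]) -> float:
    """Standard deviation of the amounts actually paid divided by their mean."""
    mean_paid = statistics.mean(amounts_paid)
    if mean_paid == 0:
        return math.nan   # undefined when every completed order was free
    return statistics.pstdev(amounts_paid) / mean_paid
```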
(3) For the case where the price factor features include the average subsidy rate:
the average subsidy rate is the difference between the average amount due and the average amount actually paid over all completed historical orders of the sample user in the first historical period, divided by the average amount due.
Specifically, the average subsidy rate may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the amount due and the amount actually paid;
calculating the average amount due from the amounts due of all historical orders, and the average amount actually paid from the amounts actually paid on all historical orders;
and taking the ratio of the difference between the average amount due and the average amount actually paid to the average amount due as the average subsidy rate.
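The average subsidy rate in (3) combines two means. A minimal sketch, assuming paired lists of amounts due and amounts actually paid for one sample user's orders in the first historical period; the function name and the zero guard are hypothetical choices not taken from the text.

```python
from statistics import mean


def average_subsidy_rate(amounts_due: list[float], amounts_paid: list[float]) -> float:
    """(average amount due - average amount actually paid) / average amount due."""
    if not amounts_due or len(amounts_due) != len(amounts_paid):
        raise ValueError("need one due/paid pair per historical order")
    avg_due = mean(amounts_due)
    if avg_due == 0:
        return 0.0  # assumption: nothing was due, so no subsidy is reported
    return (avg_due - mean(amounts_paid)) / avg_due
```

With amounts due of 20 and 30 and amounts paid of 15 and 30, the averages are 25 and 22.5, so the average subsidy rate is 2.5 / 25 = 0.1.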
(4) For the case where the price factor features include the subsidized order proportion:
the subsidized order proportion is the number of orders that used a subsidy divided by the total number of completed historical orders of the sample user in the first historical period.
Specifically, the subsidized order proportion may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the amount due and the amount actually paid;
determining each historical order whose amount actually paid is less than its amount due as a subsidized order;
and taking the ratio of the number of subsidized orders to the total number of historical orders as the subsidized order proportion.
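A sketch of the subsidized order proportion in (4), assuming each order is given as a hypothetical `(amount_due, amount_paid)` pair; an order counts as subsidized exactly when the amount actually paid is below the amount due.

```python
def subsidized_order_proportion(orders: list[tuple[float, float]]) -> float:
    """orders: (amount_due, amount_paid) for each historical order in the first historical period."""
    if not orders:
        return 0.0  # assumption: a user with no orders gets proportion 0
    subsidized = sum(1 for due, paid in orders if paid < due)
    return subsidized / len(orders)
```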
(5) For the case where the price factor features include the dynamic price adjustment order proportion: a dynamic price adjustment means that, under special conditions, the price is raised above the estimated price. A user who chooses to accept the dynamic price adjustment obtains higher dispatch priority, and the online car-hailing platform preferentially assigns available drivers to users who have accepted it. For example, when driver supply is insufficient, or under special conditions such as wind, frost, snow, or rain, the online car-hailing platform generates dynamic price adjustment information and sends it to the client; the client displays this information to the user and provides the corresponding options for the user to choose from.
Specifically, the dynamic price adjustment order proportion may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the amount due and the dynamic price adjustment amount;
determining each historical order whose dynamic price adjustment amount is greater than 0 as a dynamic price adjustment order;
and taking the ratio of the number of dynamic price adjustment orders to the total number of historical orders as the dynamic price adjustment order proportion.
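The dynamic price adjustment order proportion in (5) counts orders with a positive adjustment amount over all orders. A minimal sketch with hypothetical names; a missing adjustment amount is represented as `None` and treated as 0, in line with the note on missing adjustment items in (6).

```python
from typing import Optional


def is_adjusted(adjustment: Optional[float]) -> bool:
    """True when the order's dynamic price adjustment amount is > 0 (None counts as 0)."""
    return adjustment is not None and adjustment > 0


def adjusted_order_proportion(adjustments: list[Optional[float]]) -> float:
    """Share of the user's historical orders that are dynamic price adjustment orders."""
    if not adjustments:
        return 0.0
    return sum(1 for a in adjustments if is_adjusted(a)) / len(adjustments)
```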
(6) For the case where the price factor features include the conversion rate between dynamic price adjustment quotes and dynamic price adjustment orders: a dynamic price adjustment quote occurs when the online car-hailing platform generates dynamic price adjustment information and sends it to the client; a dynamic price adjustment order is an order that the user places after choosing, on the client, to accept the dynamic price adjustment.
Specifically, this conversion rate may be obtained by:
each time dynamic price adjustment information is sent to the client, treating the sending as a dynamic price adjustment quote and recording or updating the number of dynamic price adjustment quotes;
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the amount due and the dynamic price adjustment amount;
determining each historical order whose dynamic price adjustment amount is greater than 0 as a dynamic price adjustment order;
and taking the ratio of the number of dynamic price adjustment quotes to the number of dynamic price adjustment orders as the conversion rate between dynamic price adjustment quotes and dynamic price adjustment orders.
It should be noted that for an order in which the user did not accept a dynamic price adjustment, the dynamic price adjustment amount is 0. The order information may contain a dynamic price adjustment amount item whose value is 0, or may omit that item altogether; in the latter case the order is treated by default as not being a dynamic price adjustment order.
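A sketch of the ratio in (6), assuming the number of dynamic price adjustment quotes sent to the sample user's client in the first historical period has already been counted (the counter itself is not shown). It computes the ratio exactly as the text defines it, quotes divided by adjustment orders; the function name and the `None` result for users with no adjustment orders are assumptions.

```python
from typing import Optional


def quote_to_order_rate(quote_count: int, adjustments: list[Optional[float]]) -> Optional[float]:
    """Number of dynamic price adjustment quotes divided by number of adjustment orders.

    quote_count is assumed to be restricted to the first historical period already.
    A missing adjustment amount (None) counts as 0, i.e. not an adjustment order.
    Returns None when the user placed no adjustment orders (assumption: ratio undefined).
    """
    adjusted_orders = sum(1 for a in adjustments if a is not None and a > 0)
    if adjusted_orders == 0:
        return None
    return quote_count / adjusted_orders
```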
(7) For the case where the price factor features include the average dynamic price adjustment amount: this is the mean of the dynamic price adjustment amounts over the dynamic price adjustment orders.
Specifically, the average dynamic price adjustment amount may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the amount due and the dynamic price adjustment amount;
determining each historical order whose dynamic price adjustment amount is greater than 0 as a dynamic price adjustment order;
and dividing the sum of the dynamic price adjustment amounts of all dynamic price adjustment orders by the number of dynamic price adjustment orders to obtain the average dynamic price adjustment amount.
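The average in (7) is taken only over dynamic price adjustment orders. A minimal sketch, with a hypothetical function name and the assumption that a user without any adjustment orders gets 0:

```python
from typing import Optional


def average_adjustment_amount(adjustments: list[Optional[float]]) -> float:
    """Mean adjustment amount over dynamic price adjustment orders (amount > 0 only)."""
    amounts = [a for a in adjustments if a is not None and a > 0]
    if not amounts:
        return 0.0  # assumption: users with no adjustment orders get 0
    return sum(amounts) / len(amounts)
```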
4. Experience factor features:
(1) For the case where the experience factor features include the number of orders whose driver rating is below a first preset threshold: the driver rating is the rating the user gives the driver.
Specifically, the number of orders whose driver rating is below the first preset threshold may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the driver rating;
and counting, from the driver ratings of the historical orders, the number of historical orders whose driver rating is below the first preset threshold.
In practice, users often do not actively rate the driver. The driver rating in the order information may therefore be a rating given by the user; a rating assigned automatically by the online car-hailing platform when the user does not rate the driver within a preset rating period; or, in that same situation, an empty driver rating item.
(2) For the case where the experience factor features include the number of orders whose user rating is below a second preset threshold: the user rating is the rating the driver gives the user.
Specifically, the number of orders whose user rating is below the second preset threshold may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes the user rating;
and counting, from the user ratings of the historical orders, the number of historical orders whose user rating is below the second preset threshold.
In practice, drivers often do not actively rate the user. The user rating in the order information may therefore be a rating given by the driver; a rating assigned automatically by the online car-hailing platform when the driver does not rate the user within the preset rating period; or, in that same situation, an empty user rating item.
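Features (1) and (2) of the experience factors are the same count applied to different rating columns. A minimal sketch, assuming platform-assigned ratings have already been written into the order data and empty rating items are represented as `None` and skipped; the threshold values below are illustrative placeholders, not values from the text.

```python
from typing import Optional


def count_low_ratings(ratings: list[Optional[float]], threshold: float) -> int:
    """Count orders whose rating is below the threshold; empty (None) rating items are skipped."""
    return sum(1 for r in ratings if r is not None and r < threshold)


# The same helper serves both features; the thresholds here are illustrative only.
driver_ratings = [5, 3, None, 4, 2]  # ratings the user gave drivers
user_ratings = [5, 5, 1, None]       # ratings drivers gave the user
low_driver = count_low_ratings(driver_ratings, threshold=4)  # first preset threshold
low_user = count_low_ratings(user_ratings, threshold=3)      # second preset threshold
```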
(3) For the case where the experience factor features include the actual average waiting time: the actual waiting time is the interval between the moment the user sends the order to the online car-hailing platform through the client and the moment the user is picked up.
Specifically, in this case the actual average waiting time may be obtained by:
when an order sent by the client is received, recording a first time at which the client sent the order;
when feedback from the client indicating that the passenger has been picked up is received, or when feedback that the driver has arrived at the departure place is received, recording a second time at which the feedback is received;
taking the length of the interval between the first time and the second time as the actual waiting time of the order;
for each sample user, obtaining the actual waiting times of all historical orders of the sample user in the first historical period;
and calculating the actual average waiting time from the number of historical orders and the actual waiting times of all these historical orders.
Alternatively, the actual waiting time may be measured from the moment the online car-hailing driver accepts the order to the moment the user is picked up.
In this case, the actual average waiting time may be obtained by:
when the online car-hailing driver accepts the order, recording a third time at which the driver accepted the order;
when feedback from the client indicating that the passenger has been picked up is received, or when feedback that the online car-hailing driver has arrived at the departure place is received, recording the second time at which the feedback is received;
taking the length of the interval between the third time and the second time as the actual waiting time of the order;
for each sample user, obtaining the actual waiting times of all historical orders of the sample user in the first historical period;
and calculating the actual average waiting time from the number of historical orders and the actual waiting times of all these historical orders.
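The two definitions of the actual waiting time in (3) differ only in the start event (order sent versus order accepted by the driver), so one averaging helper covers both. A minimal sketch, assuming each order is given as a hypothetical `(start, pickup)` pair of `datetime` values:

```python
from datetime import datetime


def average_wait_seconds(intervals: list[tuple[datetime, datetime]]) -> float:
    """Mean waiting time in seconds.

    start is the first time (order sent) or the third time (order accepted by the driver);
    pickup is the second time, when pickup or driver-arrival feedback was received.
    """
    if not intervals:
        raise ValueError("no historical orders in the period")
    total = sum((pickup - start).total_seconds() for start, pickup in intervals)
    return total / len(intervals)


orders = [
    (datetime(2018, 3, 1, 8, 0), datetime(2018, 3, 1, 8, 5)),     # 5-minute wait
    (datetime(2018, 3, 2, 18, 0), datetime(2018, 3, 2, 18, 10)),  # 10-minute wait
]
```

Passing (first time, second time) pairs gives the first definition and (third time, second time) pairs the alternative one; `average_wait_seconds(orders)` evaluates to 450 seconds. The estimated average waiting time in (4) is the same kind of mean taken over the recorded estimates.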
(4) For the case where the experience factor features include the estimated average waiting time: the estimated waiting time is the time the online car-hailing vehicle needs to reach the passenger, as estimated by the online car-hailing platform after the driver accepts the order, based on the user's departure place, the driver's location, the current road conditions, and similar factors.
Specifically, the estimated average waiting time may be obtained by:
when the online car-hailing driver accepts the order, estimating the user's waiting time from the user's departure place, the driver's location, and the current road conditions;
recording the estimated waiting time;
for each sample user, obtaining the estimated waiting times of all historical orders of the sample user in the first historical period;
and calculating the estimated average waiting time from the number of historical orders and the estimated waiting times of all these historical orders.
(5) For the case where the experience factor features include the user's average star rating: the user's star rating is the rating the driver gives the user. As with the user rating, it is either taken directly from the driver's rating or, if the driver does not rate the user within the preset rating period, the user star rating item is empty.
(6) For the case where the experience factor features include the driver's average star rating: the driver's star rating is obtained in the same way as the user's star rating and is not described again here.
(7) For the case where the experience factor features include the number of complaints against the user:
specifically, the number of complaints against the user may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes complaint-against-user information;
taking the historical orders whose complaint-against-user information is not empty as complained-about orders, and taking the number of these orders as the number of complaints against the user.
Alternatively, the number of complaints against the user may be obtained by:
for each sample user, recording each complaint a driver files against the user and generating complaint-against-user information for the sample user, where this information carries the time of the complaint;
and counting the complaints whose complaint time falls within the first historical period, and taking that count as the number of complaints against the user.
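The second way of counting complaints in (7) filters timestamped complaint records by the first historical period. A minimal sketch with a hypothetical function name; treating the period as the half-open interval [start, end) is an assumption, since the text only says the complaint time must fall within the period.

```python
from datetime import datetime


def complaints_in_period(complaint_times: list[datetime], start: datetime, end: datetime) -> int:
    """Count complaints whose complaint time falls in [start, end) (half-open: an assumption)."""
    return sum(1 for t in complaint_times if start <= t < end)


times = [datetime(2018, 2, 10), datetime(2018, 3, 5), datetime(2018, 5, 20)]
```

With a first historical period from February 1 to May 1, `complaints_in_period(times, datetime(2018, 2, 1), datetime(2018, 5, 1))` counts the February and March complaints only.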
(8) For the case where the experience factor features include the number of complaints against drivers:
specifically, the number of complaints against drivers may be obtained by:
for each sample user, obtaining the order information of all historical orders of the sample user in the first historical period, where the order information includes complaint information;
taking the historical orders whose complaint information is not empty as complaint orders, and taking the number of complaint orders as the number of complaints against drivers.
5. Basic attribute features:
the basic attribute feature generally refers to the user registration time.
The online car-hailing platform records the registration time when the user registers, so the registration time of a sample user can be read directly from that record when needed.
Optionally, the basic attribute features may further include the user's gender, age, and similar information.
Gender and age are typically filled in when the user registers or when the user is asked to complete real-name authentication.
After the historical order feature information of the sample users in the first historical period and their churn result information in the second historical period have been obtained, the user churn probability prediction models are trained. Specifically, the user churn probability prediction method provided by the embodiments of the present application further includes:
S102: training, based on the acquired historical order feature information and churn result information, a user churn probability prediction model for each area range.
In a specific implementation, training begins by determining the prediction model on which user churn probability prediction is based. The historical order feature information is then used as the values of the model's explanatory variables and the churn result information as the values of its explained variable, and a user churn probability prediction model is trained for each area range.
Optionally, the prediction model may be any one of a logistic regression model, an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, an autoregressive integrated moving average (ARIMA) model, a generalized autoregressive conditional heteroskedasticity (GARCH) model, a deep learning model, and a decision tree model.
Different prediction models are trained in different ways, but the principle is similar.
For example, for the logistic regression, AR, MA, ARMA, ARIMA, and GARCH models, training amounts to solving for the unknown parameters of the model using the acquired historical order feature information and churn result information.
The parameters of the prediction model may be a weight coefficient for each explanatory variable plus an additional coefficient (the intercept). Training, that is, solving for the weight coefficients and the additional coefficient, proceeds as follows: the historical order feature information of the sample users is taken as the values of the explanatory variables and the churn result information of each sample user as the value of the explained variable, and the weight coefficient of each explanatory variable and the additional coefficient of the user churn probability prediction model are calculated, yielding the trained model.
Specifically, when training the user churn probability prediction model, the historical order features may form an explanatory variable matrix, the parameters of the explanatory variables a parameter matrix, and the churn result information an explained variable matrix. Each column of the explanatory variable matrix corresponds to one historical order feature and each row to one sample user; each row of the parameter matrix holds the parameter of one explanatory variable; and each row of the explained variable matrix holds the churn result of one sample user. With the historical order feature information filling the explanatory variable matrix and the churn result information filling the explained variable matrix, the parameter matrix is solved for, which yields the user churn probability prediction model.
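For the logistic regression case, solving the parameter matrix can be sketched in pure Python as below. This is an illustration only: it fits the weight coefficients and the additional coefficient (intercept) by batch gradient descent on the log-loss, and the optimizer, learning rate, and epoch count are assumptions, since the text does not say how the parameters are solved. `X` follows the matrix layout described above (rows are sample users, columns are historical order features) and `y` holds 1 for churned and 0 for not churned; all names are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000
) -> tuple[list[float], float]:
    """Return (weight coefficients, additional coefficient) fitted by gradient descent."""
    n_users, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            # Prediction error for this sample user under the current parameters.
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the gradient of the mean log-loss.
        w = [wj - lr * g / n_users for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_users
    return w, b


def churn_probability(w: list[float], b: float, features: list[float]) -> float:
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, features)))


# Toy data: one feature (e.g. a scaled waiting time); churn rises with it.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
```

One such model would be fitted per area range (or per class of area ranges), and `churn_probability` then applied to a user's historical order features.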
For a deep learning model, a deep learning network is constructed in advance. The historical order feature information of the sample users in a given area range in the first historical period is used as the network input, and their churn result information in the second historical period as the reference result; supervised training of the network then yields the user churn probability prediction model.
For a decision tree model, training amounts to constructing the decision tree from the acquired historical order feature information and churn result information.
When constructing the decision tree, the following steps may be used:
for each area range, taking the sample users in that area range as a sample user set, and calculating the information gain of each historical order feature from the historical order feature information of the sample users in the set in the first historical period;
taking the historical order feature with the largest information gain as the parent node of the decision tree model;
dividing the sample users in the sample user set among the feature intervals of the parent node;
for each feature interval, calculating, from the historical order feature information of the sample users in that interval, the information gain of each historical order feature not already used as a node on the parent node's branch, and taking the feature with the largest information gain as a child node of the parent node;
taking the child node as a new parent node, forming a new sample user set from the sample users in the feature interval corresponding to it, and returning to the step of dividing the sample users in the sample user set among the feature intervals of the parent node, until every branch contains all of the preset historical order features;
and dividing the sample users of each branch among the feature intervals of the last node on that branch, and labeling each feature interval of each node with a churn result based on the churn result information of the sample users in that interval in the second historical period.
Specifically, calculating the information gain of each historical order feature from the historical order feature information of the sample users in the sample user set in the first historical period includes:
for each historical order feature, calculating its information gain from the numbers of churned and non-churned sample users in the current sample user set, and from the numbers of churned and non-churned sample users in each feature interval of that historical order feature.
The information gain Info_gain(D) of any historical order feature is calculated according to the following formula (with $0 \log_2 0$ taken as 0):
$$\mathrm{Info\_gain}(D) = -P(t)\log_2 P(t) - P(\bar{t})\log_2 P(\bar{t}) + \sum_{i=1}^{k} P(c_i)\left[P(t \mid c_i)\log_2 P(t \mid c_i) + P(\bar{t} \mid c_i)\log_2 P(\bar{t} \mid c_i)\right]$$
where $k$ is the number of feature intervals of the historical order feature; $P(c_i)$ is the number of sample users in the $i$-th feature interval divided by the total number of sample users in the current sample user set; $P(t)$ is the proportion of churned sample users in the current sample user set; $P(\bar{t})$ is the proportion of non-churned sample users in the current sample user set; $P(t \mid c_i)$ is the proportion of churned sample users among the sample users in the $i$-th feature interval; and $P(\bar{t} \mid c_i)$ is the proportion of non-churned sample users among the sample users in the $i$-th feature interval.
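The information gain and the tree-construction steps described above can be sketched as an ID3-style procedure. Assumptions: every historical order feature has already been discretized into integer feature-interval indices, churn labels are 1 (churned) or 0 (not churned), and all function names are hypothetical. The gain is computed as the entropy of the churn labels minus the interval-weighted entropy inside each interval, which is the same quantity as Info_gain(D). Unlike the text, which continues until every branch contains all preset features, the sketch stops early once an interval's users share a single churn result; further splits there would not change the leaf labels.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Entropy of churn labels (1 = churned, 0 = not churned), in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(intervals: list[int], labels: list[int]) -> float:
    """Info_gain(D) for one feature; intervals[u] is the feature interval of sample user u."""
    groups: dict[int, list[int]] = {}
    for interval, label in zip(intervals, labels):
        groups.setdefault(interval, []).append(label)
    n = len(labels)
    # Weighted entropy inside each feature interval: sum over i of P(c_i) * H(labels | c_i).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def build_tree(rows: list[dict[str, int]], labels: list[int], features: list[str]):
    """Return a churn label (leaf) or (feature, {interval: subtree})."""
    if not features or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]  # majority churn label for this interval
    best = max(features, key=lambda f: information_gain([r[f] for r in rows], labels))
    remaining = [f for f in features if f != best]
    children = {}
    for interval in sorted({r[best] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[best] == interval]
        children[interval] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, children)


rows = [
    {"subsidy": 0, "wait": 0},
    {"subsidy": 0, "wait": 1},
    {"subsidy": 1, "wait": 0},
    {"subsidy": 1, "wait": 1},
]
labels = [1, 1, 0, 0]
```

On this toy set the hypothetical `subsidy` feature separates churned from non-churned users perfectly (gain 1 bit) while `wait` carries no information (gain 0), so `build_tree(rows, labels, ["subsidy", "wait"])` splits only on `subsidy`.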
In the embodiment above, a user churn probability prediction model is trained for each of the plurality of area ranges. In practice, however, users in many area ranges use the online car-hailing platform in similar ways, so one training run can be performed over several similar area ranges, and the resulting user churn probability prediction model can predict churn probabilities for users in all of those area ranges. Accordingly, another embodiment of the present application provides another method of training, based on the acquired historical order feature information and churn result information, the user churn probability prediction model corresponding to each area range. The method includes:
S201: determining the number of historical orders corresponding to each of the plurality of area ranges.
Here, the number of historical orders corresponding to each area range may be the number of historical orders in the first historical period, or the number in another historical period.
The number of historical orders here generally refers to the number of completed historical orders. The historical orders corresponding to an area range may be either those whose departure place lies in the area range or those whose destination lies in it. The present application adopts only one of these two rules, so that an order whose departure place and destination lie in different area ranges is attributed unambiguously to a single area range.
S202: classifying the plurality of area ranges according to the number of historical orders corresponding to each area range.
Specifically, area ranges whose numbers of historical orders fall within the same preset quantity interval may be placed in the class corresponding to that quantity interval.
Here, several quantity intervals are preset, each corresponding to one class; the number and bounds of the quantity intervals can be set according to actual conditions.
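Steps S201 and S202 can be sketched as counting orders per area range and bucketing the counts by preset quantity intervals. A minimal sketch, assuming each completed historical order has already been attributed to one area range (for example by departure place) and that the interval cut points are supplied by the caller; all names are hypothetical. Area ranges with no orders at all do not appear in the result.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def classify_area_ranges(order_area_ids: list[str], boundaries: list[int]) -> dict[int, list[str]]:
    """Group area ranges into classes by their historical order count.

    boundaries: sorted cut points of the quantity intervals. Class 0 holds counts below
    boundaries[0], class j holds boundaries[j-1] <= count < boundaries[j], and the last
    class holds counts >= boundaries[-1].
    """
    counts = Counter(order_area_ids)  # S201: historical order count per area range
    classes: dict[int, list[str]] = defaultdict(list)
    for area in sorted(counts):
        classes[bisect_right(boundaries, counts[area])].append(area)  # S202
    return dict(classes)


orders = ["A"] * 5 + ["B"] * 50 + ["C"] * 500 + ["D"] * 80
```

With cut points `[10, 100]`, areas A (5 orders), B and D (50 and 80 orders), and C (500 orders) land in classes 0, 1, and 2 respectively; S203 would then train one model per class.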
S203: training, for each class, a user churn probability prediction model corresponding to the area ranges in that class, based on the historical order feature information and churn result information within the area ranges included in the class.
When training the user churn probability prediction model for the area ranges included in a class, the model can be trained using the historical order feature information and churn result information of all sample users in those area ranges. The resulting model can then be used to predict churn probabilities for users in every area range included in the class.
Once the user churn probability prediction model corresponding to each area range has been obtained, the churn probabilities of users belonging to that area range can be predicted with it. The user churn probability prediction method provided by the embodiments of the present application further includes:
S103: predicting, based on the user churn probability prediction model corresponding to the area range to which the user to be predicted belongs, the churn probability of that user in a future third time period, to obtain the churn probability of the user to be predicted in the third time period.
In implementation, the third time period is typically no longer than the second historical period.
Referring to FIG. 3, an embodiment of the present application further provides a specific method for predicting the churn probability of the user in the third time period, including:
S301: determining the area range to which the user to be predicted belongs.
Specifically, the area range of the user is determined according to the area range in which the departure place or destination of the user's historical orders falls.
Specifically, the area range of the user to be predicted may be determined by:
acquiring the historical order information of the user to be predicted, where the historical order information includes a departure place or a destination, and taking the departure place or destination as a location of the user to be predicted;
determining the area range in which each location of the user to be predicted falls;
if the locations of the user to be predicted fall into a single area range, taking that area range as the area range of the user to be predicted;
and if the locations fall into several area ranges, taking the area range into which the locations fall most often as the area range of the user to be predicted.
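The rule in S301 (use the single area range if there is only one, otherwise the one the user's order locations fall into most often) reduces to a frequency count. A minimal sketch; the `locate` lookup from a coordinate to an area range and the `(longitude, latitude)` representation are hypothetical, and ties between equally frequent area ranges are broken by first occurrence, which the text does not specify.

```python
from collections import Counter
from typing import Callable, Hashable, Optional

Location = tuple[float, float]  # hypothetical (longitude, latitude) pair


def user_area_range(
    locations: list[Location], locate: Callable[[Location], Optional[Hashable]]
) -> Optional[Hashable]:
    """Area range into which the user's order locations fall most often.

    `locate` is a hypothetical point-to-area lookup returning None outside every area range.
    With a single matching area range, that area range is returned directly.
    """
    counts = Counter(a for a in map(locate, locations) if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def toy_locate(p: Location) -> str:
    # Toy lookup for illustration: split the map at latitude 0.
    return "north" if p[1] > 0 else "south"
```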
Here, to improve prediction accuracy, the acquired historical order information of the user to be predicted covers only the historical orders within a preset period up to the present.
S302: predicting the churn probability of the user to be predicted in the third time period, based on the user churn probability prediction model corresponding to the determined area range and on the historical order feature information of the user to be predicted in a fourth historical period.
Here, the fourth historical period generally has the same length as the first historical period and typically ends at the prediction time, while the third time period typically starts at the prediction time.
In the embodiments of the present application, a user churn probability prediction model is trained for each area range from the acquired historical order feature information of sample users in different area ranges in the first historical period and their churn result information in the second historical period. To predict a user's churn probability, the model corresponding to the area range to which that user belongs is used to predict the user's churn probability in a future third time period. Because this process accounts for differences between users in different area ranges, training a separate model for each area range and applying it to that area range's users yields more accurate churn probability predictions.
Based on the same inventive concept, the embodiments of the present application further provide a user churn probability prediction device corresponding to the user churn probability prediction method. Because the device solves the problem on the same principle as the method, its implementation can refer to that of the method, and repeated details are omitted.
Referring to FIG. 4, the user churn probability prediction device provided in the embodiments of the present application includes:
an obtaining module 41, configured to obtain historical order feature information of sample users in different area ranges in a first historical period, and churn result information of the sample users in a second historical period;
a training module 42, configured to train, based on the acquired historical order feature information and churn result information, a user churn probability prediction model for each area range;
and a prediction module 43, configured to predict, based on the user churn probability prediction model corresponding to the area range to which the user to be predicted belongs, the churn probability of that user in a future third time period, to obtain the churn probability of the user to be predicted in the third time period.
Optionally, the training module 42 is specifically configured to train the user churn probability prediction model for each area range, based on the obtained historical order feature information and churn result information, through the following steps:
determining the prediction model on which user churn probability prediction is based; and
training the user churn probability prediction model for each area range, taking the historical order feature information as the values of the explanatory (independent) variables of the prediction model and the churn result information as the value of its explained (dependent) variable.
Optionally, the prediction model is any one of a logistic regression model, an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, an autoregressive integrated moving average (ARIMA) model, a generalized autoregressive conditional heteroskedasticity (GARCH) model, a deep learning model, and a decision tree model.
Optionally, the user churn probability prediction model to be trained includes a weight coefficient to be trained for each explanatory variable, and an additional coefficient of the model.
The training module 42 is specifically configured to train the model for each area range, with the historical order feature information as the values of the explanatory variables and the churn result information as the value of the explained variable, through the following step:
taking the historical order feature information of a plurality of sample users as the values of the explanatory variables and the churn result information of each sample user as the value of the explained variable, and calculating the weight coefficient of each explanatory variable and the additional coefficient of the user churn probability prediction model, thereby obtaining the trained model.
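For the logistic regression option, the training step can be sketched as below. This is an illustrative sketch, not the applicant's implementation: it assumes the "additional coefficient" acts as the intercept, fits it together with one weight per explanatory variable by plain batch gradient descent, and keeps one model per area range. The function names, learning rate and epoch count are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (weights, additional coefficient); the additional coefficient is treated as an intercept.
Model = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 2000) -> Model:
    """Fit one weight per explanatory variable plus one additional coefficient
    by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted churn probability and observed churn label (0 or 1).
            err = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Churn probability for one feature vector."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_per_area(samples: Dict[str, Tuple[List[List[float]], List[int]]]) -> Dict[str, Model]:
    """Train one model per area range from {area_id: (feature rows, churn labels)}."""
    return {area: train_logistic(X, y) for area, (X, y) in samples.items()}
```

In practice a library estimator such as scikit-learn's `LogisticRegression` (whose fitted `coef_` and `intercept_` play the roles of the weight coefficients and the additional coefficient) would normally replace the hand-written loop; the loop is kept here only to make the fitted quantities explicit.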
Optionally, the training module 42 is specifically configured to train the user churn probability prediction model for each area range, based on the obtained historical order feature information and churn result information, through the following steps:
determining the historical order quantity corresponding to each of a plurality of area ranges;
classifying the plurality of area ranges according to the historical order quantity of each area range; and
for each classification, training the user churn probability prediction model corresponding to each area range in that classification, based on the historical order feature information and churn result information within the area ranges included in that classification.
Optionally, the training module 42 is specifically configured to classify the plurality of area ranges according to their historical order quantities through the following step:
assigning the area ranges whose historical order quantities fall into the same preset quantity interval to the classification corresponding to that interval.
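The interval-based classification can be sketched as follows. The interval boundaries are hypothetical, since the text only says the quantity intervals are preset. The second helper reflects one possible reading of the per-classification training step, namely that the samples of all area ranges in a classification are pooled into one training set; that pooling is an assumption, not something the text states.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Samples = Tuple[List[List[float]], List[int]]


def classify_areas(order_counts: Dict[str, int],
                   boundaries: Sequence[int]) -> Dict[int, List[str]]:
    """Group area ranges whose historical order quantity falls in the same preset interval.

    boundaries = [b1, b2, ...] (ascending) defines the intervals
    [0, b1), [b1, b2), ..., [b_last, inf); the key is the interval index.
    """
    classes: Dict[int, List[str]] = defaultdict(list)
    for area, count in sorted(order_counts.items()):
        classes[bisect.bisect_right(boundaries, count)].append(area)
    return dict(classes)


def pool_by_class(classes: Dict[int, List[str]],
                  samples: Dict[str, Samples]) -> Dict[int, Samples]:
    """Concatenate the training rows of every area range in a classification."""
    pooled: Dict[int, Samples] = {}
    for idx, areas in classes.items():
        X: List[List[float]] = []
        y: List[int] = []
        for area in areas:
            area_X, area_y = samples[area]
            X.extend(area_X)
            y.extend(area_y)
        pooled[idx] = (X, y)
    return pooled
```

With boundaries `[1000, 10000]`, an area range with 1,200 historical orders and one with 9,999 land in the same classification (index 1), while one with 20,000 orders lands in classification 2.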
Optionally, the prediction module 43 is specifically configured to: determine the area range to which the user to be predicted belongs; and
predict the churn probability of the user to be predicted in the third time period, based on the user churn probability prediction model corresponding to the determined area range and the historical order feature information of that user in a fourth historical time period.
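The prediction step amounts to a lookup followed by scoring. The sketch below represents each trained per-area model as a callable that maps a feature vector to a probability. `make_logistic` and `predict_churn` are hypothetical names, and raising an error for an area range without a model is an assumed policy that the text does not specify.

```python
import math
from typing import Callable, Dict, Sequence

# Any trained per-area model: feature vector -> churn probability.
Model = Callable[[Sequence[float]], float]


def make_logistic(weights: Sequence[float], bias: float) -> Model:
    """Wrap trained coefficients as a callable (hypothetical helper)."""
    def model(x: Sequence[float]) -> float:
        z = sum(w * v for w, v in zip(weights, x)) + bias
        # Stable logistic function, avoids overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    return model


def predict_churn(user_area: str, recent_features: Sequence[float],
                  models: Dict[str, Model]) -> float:
    """Score a user with the model of the area range the user belongs to.

    recent_features are the user's historical order features from the
    fourth historical time period.
    """
    if user_area not in models:
        raise KeyError(f"no churn model trained for area range {user_area!r}")
    return models[user_area](recent_features)
```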
Optionally, the historical order feature information includes at least one of: regional feature information, order behavior features, price factor features, experience factor features, and basic attribute features;
wherein the order behavior features include at least one of: car-booking frequency, churn multiple, periodic variation in the number of car bookings, cumulative order count, conversion rate between bubbling and order placement, bubbling count, total weekday orders, total weekend orders, total car-booking peak orders, and total car-booking off-peak orders;
the price factor features include at least one of: average amount payable, coefficient of variation of actual payments, average subsidy rate, proportion of subsidized orders, proportion of dynamically priced orders, conversion rate from dynamically priced quotes to dynamically priced orders, and average dynamic price adjustment;
the experience factor features include at least one of: the number of orders with a driver score below a first preset threshold, the number of orders with a user score below a second preset threshold, the actual average waiting time, the user's estimated average star rating, the driver's estimated average star rating, the number of user complaints, and the number of driver complaints;
the basic attribute features include the user registration time; and
the regional feature information includes at least one of climate information and traffic condition information.
Optionally, when the historical order feature information includes order behavior features and the order behavior features include the car-booking frequency, the obtaining module 41 is specifically configured to obtain the car-booking frequency through the following steps:
for each sample user, obtaining the order times of all historical orders of that sample user in the first historical time period, where the order time is either the order acceptance time or the order completion time;
calculating the average interval between adjacent historical orders from these order times; and
determining the car-booking frequency based on the average interval.
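The average-interval computation can be sketched as follows. The text does not say how the frequency is derived from the average interval; this sketch assumes the average interval itself, in days, serves as the car-booking frequency, which matches the later use of the frequency as the divisor of a silent period when computing the churn multiple. The function name is hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def average_interval_days(order_times: List[datetime]) -> Optional[float]:
    """Mean gap in days between adjacent historical orders.

    order_times are the order acceptance or completion times of one sample
    user's orders in the first historical time period.
    """
    if len(order_times) < 2:
        return None  # fewer than two orders: no adjacent pair to measure
    ts = sorted(order_times)
    gaps = [(later - earlier).total_seconds() / 86400 for earlier, later in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps)
```

For orders on 1, 3 and 7 January the adjacent gaps are 2 and 4 days, giving an average interval of 3 days.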
Optionally, when the historical order feature information includes order behavior features and the order behavior features include the churn multiple, the obtaining module 41 is specifically configured to obtain the churn multiple through the following steps:
for each sample user, obtaining the latest order time among all historical orders of that sample user in the first historical time period, where the latest order time is the latest order acceptance time or the latest order completion time;
calculating the silent period between the latest order time and the start of the second historical time period; and
taking the ratio of the silent period to the car-booking frequency as the churn multiple.
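A sketch of the churn multiple, under the same assumption that the car-booking frequency is the average interval between adjacent orders in days. Returning `None` when fewer than two orders exist, or when all orders share a single timestamp, is an assumed policy, and the function name is hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def churn_multiple(order_times: List[datetime], second_period_start: datetime) -> Optional[float]:
    """Silent period (latest order to start of the second period) divided by
    the average interval between adjacent orders, both in days."""
    if len(order_times) < 2:
        return None  # car-booking frequency undefined with fewer than two orders
    ts = sorted(order_times)
    # The mean adjacent gap equals (last - first) / (number of gaps).
    avg_interval = (ts[-1] - ts[0]).total_seconds() / 86400 / (len(ts) - 1)
    if avg_interval == 0:
        return None  # all orders at the same instant
    silent = (second_period_start - ts[-1]).total_seconds() / 86400
    return silent / avg_interval
```

A user who usually books every 3 days but has been silent for 9 days before the second period starts gets a churn multiple of 3.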
Optionally, when the historical order feature information includes order behavior features and the order behavior features include the bubbling count, the obtaining module 41 is specifically configured to obtain the bubbling count as follows:
whenever a destination sent by a client is received, treating the client's sending of the destination as a bubbling action and recording or updating the bubbling count.
Optionally, when the historical order feature information includes order behavior features and the order behavior features include the conversion rate between bubbling and order placement, the obtaining module 41 is specifically configured to obtain that conversion rate through the following steps:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling action and recording the bubbling count;
when an order sent by a client is received, treating the client's sending of the order as an order-sending action and recording the order-sending count; and
taking the ratio of the bubbling count to the order-sending count as the conversion rate between bubbling and order placement.
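The bubbling and order-sending counters can be sketched as a small per-user event counter. The class and method names are hypothetical. The ratio follows the wording above (bubbling count divided by order-sending count). A conventional "conversion rate" would more usually be the reciprocal (orders per bubble), so the direction should be confirmed against the original filing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunnelCounter:
    """Per-user counters for bubbling and order-sending events (hypothetical structure)."""
    bubbles: int = 0
    orders_sent: int = 0

    def on_destination_received(self) -> None:
        # Receiving a destination from the client counts as one bubbling action.
        self.bubbles += 1

    def on_order_received(self) -> None:
        # Receiving an order from the client counts as one order-sending action.
        self.orders_sent += 1

    def bubble_to_order_ratio(self) -> Optional[float]:
        # As worded in the text: bubbling count divided by order-sending count.
        if self.orders_sent == 0:
            return None
        return self.bubbles / self.orders_sent
```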
Optionally, when the historical order feature information includes order behavior features and the order behavior features include the total car-booking peak orders, the obtaining module 41 is specifically configured to obtain that total through the following steps:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order sent within a preset peak period as a car-booking peak order, and counting these orders to obtain the total car-booking peak orders.
When the historical order feature information includes order behavior features and the order behavior features include the total car-booking off-peak orders, that total is obtained through the following steps:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order sent within a preset off-peak period as a car-booking off-peak order, and counting these orders to obtain the total car-booking off-peak orders.
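Both totals come from classifying each order's send time against preset clock windows. The windows below are hypothetical placeholders (the text only says the periods are preset), windows are treated as half-open `[start, end)`, and orders outside every window count toward neither total.

```python
from datetime import datetime, time
from typing import Iterable, List, Tuple

Window = Tuple[time, time]  # [start, end) in local clock time

# Hypothetical preset periods; the text only says they are "preset".
PEAK_WINDOWS: List[Window] = [(time(7, 0), time(9, 0)), (time(17, 0), time(19, 0))]
OFF_PEAK_WINDOWS: List[Window] = [(time(10, 0), time(16, 0))]


def in_windows(sent_at: datetime, windows: List[Window]) -> bool:
    """True if the order's send time falls inside any of the windows."""
    clock = sent_at.time()
    return any(start <= clock < end for start, end in windows)


def count_peak_and_off_peak(send_times: Iterable[datetime]) -> Tuple[int, int]:
    """Return (total peak orders, total off-peak orders); other times count toward neither."""
    peak = off_peak = 0
    for sent_at in send_times:
        if in_windows(sent_at, PEAK_WINDOWS):
            peak += 1
        elif in_windows(sent_at, OFF_PEAK_WINDOWS):
            off_peak += 1
    return peak, off_peak
```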
Optionally, when the historical order feature information includes price factor features and the price factor features include the coefficient of variation of actual payments, the obtaining module 41 is specifically configured to obtain it through the following steps:
for each sample user, obtaining the order information of all completed historical orders of that sample user in the first historical time period, the order information including the actual payment amount;
calculating the standard deviation and the mean of the actual payment amounts of these historical orders; and
taking the ratio of that standard deviation to the mean actual payment amount as the coefficient of variation of actual payments.
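The coefficient of variation is standard deviation divided by mean. The sketch uses Python's `statistics` module. Choosing the population standard deviation (`pstdev`) rather than the sample one (`stdev`) is an assumption, since the text does not say which is meant. Returning `None` for an empty list or a zero mean is also an assumed policy.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def actual_payment_cv(payments: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of actual payments: standard deviation / mean.

    Uses the population standard deviation (an assumption; the text does not
    say whether a sample or population estimate is meant).
    """
    if not payments:
        return None
    avg = mean(payments)
    if avg == 0:
        return None  # ratio undefined when the mean payment is zero
    return pstdev(payments) / avg
```

For payments of 2, 4, 4, 4, 5, 5, 7 and 9, the mean is 5 and the population standard deviation is 2, so the coefficient of variation is 0.4.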
Optionally, when the historical order feature information includes price factor features and the price factor features include the average subsidy rate, the obtaining module 41 is specifically configured to obtain the average subsidy rate through the following steps:
for each sample user, obtaining the order information of all historical orders of that sample user in the first historical time period, the order information including the amount payable and the actual payment amount;
calculating the average amount payable and the average actual payment amount over all these historical orders; and
taking the difference between the average amount payable and the average actual payment amount, divided by the average amount payable, as the average subsidy rate.
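A sketch of the average subsidy rate. Because the original wording of the denominator is garbled, this sketch assumes the difference is divided by the average amount payable, which is the usual definition of a subsidy rate. Raising on mismatched input lengths and returning `None` for a zero denominator are assumed policies, and the function name is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def average_subsidy_rate(payable: Sequence[float], paid: Sequence[float]) -> Optional[float]:
    """(average amount payable - average actual payment) / average amount payable.

    payable[i] and paid[i] belong to the same historical order.
    """
    if not payable or len(payable) != len(paid):
        raise ValueError("need one actual payment amount per amount payable")
    avg_payable = mean(payable)
    avg_paid = mean(paid)
    if avg_payable == 0:
        return None  # denominator (assumed to be the average payable) is zero
    return (avg_payable - avg_paid) / avg_payable
```

For two orders with amounts payable of 20 and 30 and actual payments of 15 and 25, the averages are 25 and 20, giving a subsidy rate of 0.2.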
Corresponding to the user churn probability prediction method in Fig. 1, an embodiment of the present application further provides a computer device. As shown in Fig. 5, the computer device includes a memory 1000, a processor 2000, and a computer program stored in the memory 1000 and executable on the processor 2000; when executing the computer program, the processor 2000 implements the steps of the user churn probability prediction method.
Specifically, the memory 1000 and the processor 2000 may be a general-purpose memory and a general-purpose processor, which are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, it executes the user churn probability prediction method, which addresses the limited accuracy of user churn probability prediction in the prior art: a model is trained for each area range, and the model for each area range is used to predict the churn probability of users in that area range, yielding higher accuracy.
Corresponding to the user churn probability prediction method in Fig. 1, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the user churn probability prediction method.
Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is executed, it performs the user churn probability prediction method, which addresses the low accuracy of user churn probability prediction in the prior art: a model is trained for each area range, and the model for each area range is used to predict the churn probability of users in that area range, yielding higher accuracy.
The computer program product of the user churn probability prediction method and device provided in the embodiments of the present application includes a computer-readable storage medium storing program code, and the instructions included in the program code may be used to execute the method described in the foregoing method embodiments. For specific implementations, refer to the method embodiments; they are not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, and the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (32)

1. A user churn probability prediction method, characterized by comprising the following steps:
obtaining historical order feature information of sample users in different area ranges in a first historical time period, and churn result information of the sample users in a second historical time period;
training a user churn probability prediction model for each area range based on the obtained historical order feature information and churn result information; and
predicting, based on the user churn probability prediction model corresponding to the area range of a user to be predicted, the churn probability of that user in a future third time period.
2. The method according to claim 1, wherein training the user churn probability prediction model for each area range based on the obtained historical order feature information and churn result information specifically comprises:
determining the prediction model on which user churn probability prediction is based; and
training the user churn probability prediction model for each area range, taking the historical order feature information as the values of the explanatory variables of the prediction model and the churn result information as the value of its explained variable.
3. The method according to claim 2, wherein the prediction model is any one of a logistic regression model, an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, an autoregressive integrated moving average (ARIMA) model, a generalized autoregressive conditional heteroskedasticity (GARCH) model, a deep learning model, and a decision tree model.
4. The method according to claim 2, wherein the user churn probability prediction model to be trained comprises a weight coefficient to be trained for each explanatory variable, and an additional coefficient; and
training the model for each area range, with the historical order feature information as the values of the explanatory variables of the prediction model and the churn result information as the value of its explained variable, specifically comprises:
taking the historical order feature information of a plurality of sample users as the values of the explanatory variables and the churn result information of each sample user as the value of the explained variable, and calculating the weight coefficient of each explanatory variable and the additional coefficient of the user churn probability prediction model, thereby obtaining the trained model.
5. The method according to claim 1, wherein training the user churn probability prediction model for each area range based on the obtained historical order feature information and churn result information comprises:
determining the historical order quantity corresponding to each of a plurality of area ranges;
classifying the plurality of area ranges according to the historical order quantity of each area range; and
for each classification, training the user churn probability prediction model corresponding to each area range in that classification, based on the historical order feature information and churn result information within the area ranges included in that classification.
6. The method according to claim 5, wherein classifying the plurality of area ranges according to the historical order quantity of each area range specifically comprises:
assigning the area ranges whose historical order quantities fall into the same preset quantity interval to the classification corresponding to that interval.
7. The method according to claim 1, wherein predicting, based on the trained user churn probability prediction model, the churn probability of the user to be predicted in a future third time period comprises:
determining the area range to which the user to be predicted belongs; and
predicting the churn probability of the user to be predicted in the third time period, based on the user churn probability prediction model corresponding to the determined area range and the historical order feature information of that user in a fourth historical time period.
8. The method according to claim 1, wherein the historical order feature information comprises at least one of: regional feature information, order behavior features, price factor features, experience factor features, and basic attribute features;
wherein the order behavior features include at least one of: car-booking frequency, churn multiple, periodic variation in the number of car bookings, cumulative order count, conversion rate between bubbling and order placement, bubbling count, total weekday orders, total weekend orders, total car-booking peak orders, and total car-booking off-peak orders;
the price factor features include at least one of: average amount payable, coefficient of variation of actual payments, average subsidy rate, proportion of subsidized orders, proportion of dynamically priced orders, conversion rate from dynamically priced quotes to dynamically priced orders, and average dynamic price adjustment;
the experience factor features include at least one of: the number of orders with a driver score below a first preset threshold, the number of orders with a user score below a second preset threshold, the actual average waiting time, the user's estimated average star rating, the driver's estimated average star rating, the number of user complaints, and the number of driver complaints;
the basic attribute features include the user registration time; and
the regional feature information includes at least one of climate information and traffic condition information.
9. The method according to claim 8, wherein, when the historical order feature information includes order behavior features and the order behavior features include the car-booking frequency, the car-booking frequency is obtained by:
for each sample user, obtaining the order times of all historical orders of that sample user in the first historical time period, where the order time is either the order acceptance time or the order completion time;
calculating the average interval between adjacent historical orders from these order times; and
determining the car-booking frequency based on the average interval.
10. The method according to claim 8, wherein, when the historical order feature information includes order behavior features and the order behavior features include the churn multiple, the churn multiple is obtained by:
for each sample user, obtaining the latest order time among all historical orders of that sample user in the first historical time period, where the latest order time is the latest order acceptance time or the latest order completion time;
calculating the silent period between the latest order time and the start of the second historical time period; and
taking the ratio of the silent period to the car-booking frequency as the churn multiple.
11. The method according to claim 8, wherein, when the historical order feature information includes order behavior features and the order behavior features include the bubbling count, the bubbling count is obtained by:
whenever a destination sent by a client is received, treating the client's sending of the destination as a bubbling action and recording or updating the bubbling count.
12. The method according to claim 8, wherein, when the historical order feature information includes order behavior features and the order behavior features include the conversion rate between bubbling and order placement, that conversion rate is obtained by:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling action and recording the bubbling count;
when an order sent by a client is received, treating the client's sending of the order as an order-sending action and recording the order-sending count; and
taking the ratio of the bubbling count to the order-sending count as the conversion rate between bubbling and order placement.
13. The method according to claim 8, wherein, when the historical order feature information includes order behavior features and the order behavior features include the total car-booking peak orders, that total is obtained by:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order sent within a preset peak period as a car-booking peak order, and counting these orders to obtain the total car-booking peak orders;
and when the historical order feature information includes order behavior features and the order behavior features include the total car-booking off-peak orders, that total is obtained by:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order sent within a preset off-peak period as a car-booking off-peak order, and counting these orders to obtain the total car-booking off-peak orders.
14. The method according to claim 8, wherein, when the historical order feature information includes price factor features and the price factor features include the coefficient of variation of actual payments, it is obtained by:
for each sample user, obtaining the order information of all completed historical orders of that sample user in the first historical time period, the order information including the actual payment amount;
calculating the standard deviation and the mean of the actual payment amounts of these historical orders; and
taking the ratio of that standard deviation to the mean actual payment amount as the coefficient of variation of actual payments.
15. The method according to claim 8, wherein, when the historical order feature information includes price factor features and the price factor features include the average subsidy rate, the average subsidy rate is obtained by:
for each sample user, obtaining the order information of all historical orders of that sample user in the first historical time period, the order information including the amount payable and the actual payment amount;
calculating the average amount payable and the average actual payment amount over all these historical orders; and
taking the difference between the average amount payable and the average actual payment amount, divided by the average amount payable, as the average subsidy rate.
16. A user churn probability prediction apparatus, comprising:
an obtaining module, configured to obtain historical order feature information of sample users in different area ranges in a first historical time period, and churn result information of the sample users in a second historical time period;
a training module, configured to train a user churn probability prediction model for each area range based on the obtained historical order feature information and churn result information; and
a prediction module, configured to predict, based on the user churn probability prediction model corresponding to the area range of a user to be predicted, the churn probability of that user in a future third time period.
17. The apparatus according to claim 16, wherein the training module is specifically configured to train the user churn probability prediction model for each area range, based on the obtained historical order feature information and churn result information, by:
determining the prediction model on which user churn probability prediction is based; and
training the user churn probability prediction model for each area range, taking the historical order feature information as the values of the explanatory variables of the prediction model and the churn result information as the value of its explained variable.
18. The apparatus according to claim 17, wherein the prediction model is any one of a logistic regression model, an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, an autoregressive integrated moving average (ARIMA) model, a generalized autoregressive conditional heteroskedasticity (GARCH) model, a deep learning model, and a decision tree model.
19. The apparatus according to claim 17, wherein the user churn probability prediction model to be trained comprises a weight coefficient to be trained for each explanatory variable, and an additional coefficient; and
the training module is specifically configured to train the model for each area range, with the historical order feature information as the values of the explanatory variables of the prediction model and the churn result information as the value of its explained variable, by:
taking the historical order feature information of a plurality of sample users as the values of the explanatory variables and the churn result information of each sample user as the value of the explained variable, and calculating the weight coefficient of each explanatory variable and the additional coefficient of the user churn probability prediction model, thereby obtaining the trained model.
20. The apparatus according to claim 16, wherein the training module is specifically configured to train the user churn probability prediction model for each area range, based on the obtained historical order feature information and churn result information, by:
determining the historical order quantity corresponding to each of a plurality of area ranges;
classifying the plurality of area ranges according to the historical order quantity of each area range; and
for each classification, training the user churn probability prediction model corresponding to each area range in that classification, based on the historical order feature information and churn result information within the area ranges included in that classification.
21. The apparatus according to claim 20, wherein the training module is specifically configured to classify the plurality of area ranges according to the historical order quantity of each area range by:
assigning the area ranges whose historical order quantities fall into the same preset quantity interval to the classification corresponding to that interval.
22. The apparatus according to claim 16, wherein the prediction module is specifically configured to: determine the area range to which the user to be predicted belongs; and
predict the churn probability of the user to be predicted in the third time period, based on the user churn probability prediction model corresponding to the determined area range and the historical order feature information of that user in a fourth historical time period.
23. The apparatus according to claim 16, wherein the historical order feature information comprises at least one of: regional feature information, order behavior features, price factor features, experience factor features, and basic attribute features;
wherein the order behavior features include at least one of: car-booking frequency, churn multiple, periodic variation in the number of car bookings, cumulative order count, conversion rate between bubbling and order placement, bubbling count, total weekday orders, total weekend orders, total car-booking peak orders, and total car-booking off-peak orders;
the price factor features include at least one of: average amount payable, coefficient of variation of actual payments, average subsidy rate, proportion of subsidized orders, proportion of dynamically priced orders, conversion rate from dynamically priced quotes to dynamically priced orders, and average dynamic price adjustment;
the experience factor features include at least one of: the number of orders with a driver score below a first preset threshold, the number of orders with a user score below a second preset threshold, the actual average waiting time, the user's estimated average star rating, the driver's estimated average star rating, the number of user complaints, and the number of driver complaints;
the basic attribute features include the user registration time; and
the regional feature information includes at least one of climate information and traffic condition information.
24. The apparatus according to claim 23, wherein, when the historical order feature information includes order behavior features and the order behavior features include a car-booking frequency, the obtaining module is specifically configured to obtain the car-booking frequency by:
for each sample user, obtaining the order time of each historical order of the sample user in a first historical time period, the order time being either the order acceptance time or the order completion time;
calculating the average interval between adjacent historical orders from the order times of all the historical orders;
and determining the car-booking frequency based on the average interval.
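Claim 24 does not say how the frequency is derived from the average interval. Because claim 25 later divides a duration by this frequency, this sketch assumes the frequency is expressed as the average interval itself, in days. `car_booking_frequency_days` is a hypothetical name.

```python
from datetime import datetime
from typing import List, Optional


def car_booking_frequency_days(order_times: List[datetime]) -> Optional[float]:
    """Average gap in days between adjacent orders of one user.

    Returns None when fewer than two orders exist (no interval to average).
    """
    if len(order_times) < 2:
        return None
    times = sorted(order_times)
    gaps = [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(times, times[1:])
    ]
    return sum(gaps) / len(gaps)
```

Orders on 1, 3 and 7 January give gaps of 2 and 4 days, so the average interval is 3 days.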
25. The apparatus of claim 23, wherein, when the historical order feature information includes order behavior features and the order behavior features include a churn multiple, the obtaining module is specifically configured to obtain the churn multiple by:
for each sample user, obtaining the latest order time among all historical orders of the sample user in a first historical time period, the latest order time being either the latest order acceptance time or the latest order completion time;
calculating the silent duration between the latest order time and the start time of a second historical time period;
and taking the ratio of the silent duration to the car-booking frequency as the churn multiple.
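A sketch of the churn multiple, assuming the car-booking frequency is given as an average interval in days so that both sides of the ratio share a unit. The function name and the day unit are hypothetical.

```python
from datetime import datetime
from typing import List


def churn_multiple(
    order_times: List[datetime],
    second_period_start: datetime,
    avg_interval_days: float,
) -> float:
    """Silent days (latest order -> start of the second historical period)
    divided by the user's average order interval in days."""
    if avg_interval_days <= 0:
        raise ValueError("average interval must be positive")
    latest = max(order_times)
    silent_days = (second_period_start - latest).total_seconds() / 86400
    return silent_days / avg_interval_days
```

A user who usually books every 3 days and has been silent for 9 days gets a churn multiple of 3, meaning the gap is three times the user's normal rhythm.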
26. The apparatus according to claim 23, wherein, when the historical order feature information includes order behavior features and the order behavior features include a bubbling count, the obtaining module is specifically configured to obtain the bubbling count by:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling behavior, and recording or updating the bubbling count.
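A minimal in-memory sketch of claim 26: every destination received from a client counts as one bubbling event for that user. The class and method names are hypothetical, and a production system would persist the counts rather than keep them in memory.

```python
from collections import Counter


class BubbleCounter:
    """Per-user bubbling counts; one bubble = one destination received."""

    def __init__(self) -> None:
        self._counts = Counter()

    def on_destination_received(self, user_id: str) -> None:
        # Records the count on first sight of the user, updates it afterwards.
        self._counts[user_id] += 1

    def count(self, user_id: str) -> int:
        # Counter returns 0 for users that never bubbled.
        return self._counts[user_id]
```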
27. The apparatus according to claim 23, wherein, when the historical order feature information includes order behavior features and the order behavior features include a conversion rate between bubbling and order sending, the obtaining module is specifically configured to obtain the conversion rate between bubbling and order sending by:
when a destination sent by a client is received, treating the client's sending of the destination as a bubbling behavior, and recording the bubbling count;
when an order sent by a client is received, treating the client's sending of the order as an order-sending behavior, and recording the order-sending count;
and taking the ratio of the bubbling count to the order-sending count as the conversion rate between bubbling and order sending.
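A sketch of claim 27 over a stream of client events. The event labels `"destination"` and `"order"` are hypothetical. The ratio follows the claim as written (bubbling count over order-sending count); the share of bubbles that turn into orders would be its reciprocal.

```python
from typing import Iterable, Optional


def bubbling_to_order_ratio(events: Iterable[str]) -> Optional[float]:
    """events: client event labels, "destination" (bubble) or "order"."""
    bubbles = orders = 0
    for kind in events:
        if kind == "destination":
            bubbles += 1
        elif kind == "order":
            orders += 1
    if orders == 0:
        return None  # ratio undefined without any order-sending events
    return bubbles / orders
```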
28. The apparatus of claim 23, wherein, when the historical order feature information includes order behavior features and the order behavior features include a total car-booking peak order count, the obtaining module is specifically configured to obtain the total car-booking peak order count by:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order whose sending time falls within a preset peak period as a car-booking peak order, and counting the car-booking peak orders to obtain the total car-booking peak order count;
and, when the historical order feature information includes order behavior features and the order behavior features include a total car-booking off-peak order count, the total car-booking off-peak order count is obtained by:
when an order sent by a client is received, recording the time at which the client sent the order;
treating each historical order whose sending time falls within a preset off-peak period as a car-booking off-peak order, and counting the car-booking off-peak orders to obtain the total car-booking off-peak order count.
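A sketch of the peak and off-peak counts in claim 28. The claim leaves the preset periods unspecified, so the windows below (07:00 to 09:00 and 17:00 to 19:00 as peak, 10:00 to 16:00 as off-peak) are purely hypothetical, as are the function names.

```python
from datetime import datetime, time
from typing import List, Sequence, Tuple

Window = Tuple[time, time]

# Hypothetical preset windows, each half-open [start, end).
PEAK_WINDOWS: Sequence[Window] = ((time(7, 0), time(9, 0)), (time(17, 0), time(19, 0)))
OFF_PEAK_WINDOWS: Sequence[Window] = ((time(10, 0), time(16, 0)),)


def _in_windows(t: datetime, windows: Sequence[Window]) -> bool:
    clock = t.time()
    return any(start <= clock < end for start, end in windows)


def peak_and_off_peak_counts(send_times: List[datetime]) -> Tuple[int, int]:
    """Count orders whose sending time falls in a peak / off-peak window."""
    peak = sum(1 for t in send_times if _in_windows(t, PEAK_WINDOWS))
    off_peak = sum(1 for t in send_times if _in_windows(t, OFF_PEAK_WINDOWS))
    return peak, off_peak
```

Orders sent outside every window (late evening, for example) count toward neither total.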
29. The apparatus of claim 23, wherein, when the historical order feature information includes price factor features and the price factor features include a coefficient of variation of actual payment, the obtaining module is specifically configured to obtain the coefficient of variation of actual payment by:
for each sample user, obtaining order information of all completed historical orders of the sample user in a first historical time period, the order information including an actual payment amount;
calculating the standard deviation of the actual payment amounts and the average actual payment amount from the actual payment amounts of all the historical orders;
and taking the ratio of the standard deviation of the actual payment amounts to the average actual payment amount as the coefficient of variation of actual payment.
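A sketch of the coefficient of variation in claim 29 using Python's `statistics` module. Whether the population or the sample standard deviation is meant is not stated; this sketch assumes the population form (`pstdev`). The function name is hypothetical.

```python
import statistics
from typing import List


def actual_payment_cv(payments: List[float]) -> float:
    """Standard deviation of actual payments divided by their mean.

    Uses the population standard deviation (an assumption).
    """
    mean = statistics.fmean(payments)
    if mean == 0:
        raise ValueError("mean payment is zero; coefficient undefined")
    return statistics.pstdev(payments) / mean
```

Payments of 10, 20 and 30 give a mean of 20 and a population standard deviation of about 8.16, so the coefficient is about 0.41; identical payments give 0.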
30. The apparatus according to claim 23, wherein, when the historical order feature information includes price factor features and the price factor features include an average subsidy rate, the obtaining module is specifically configured to obtain the average subsidy rate by:
for each sample user, obtaining order information of all historical orders of the sample user in a first historical time period, the order information including an amount payable and an actual payment amount;
calculating the average amount payable from the amounts payable of all the historical orders, and calculating the average actual payment amount from the actual payment amounts of all the historical orders;
and taking the ratio of the difference between the average amount payable and the average actual payment amount to the average amount payable as the average subsidy rate.
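A sketch of the average subsidy rate, reading it as (average amount payable minus average actual payment) divided by average amount payable, i.e. the share of the payable amount covered by subsidies. The input shape and function name are hypothetical.

```python
from typing import List, Tuple


def average_subsidy_rate(orders: List[Tuple[float, float]]) -> float:
    """orders holds one (amount_payable, actual_payment) pair per order."""
    if not orders:
        raise ValueError("no historical orders")
    n = len(orders)
    avg_payable = sum(payable for payable, _ in orders) / n
    avg_paid = sum(paid for _, paid in orders) / n
    if avg_payable == 0:
        raise ValueError("average amount payable is zero")
    # Fraction of the average payable amount not paid by the user.
    return (avg_payable - avg_paid) / avg_payable
```

Three orders payable at 30, 20 and 10 and paid at 24, 20 and 7 average 20 payable and 17 paid, giving a subsidy rate of 0.15.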
31. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine readable instructions when executed by the processor performing the method of predicting user churn probability according to any of claims 1 to 15.
32. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs a user churn probability prediction method according to any one of claims 1 to 15.
CN201811125784.5A 2018-09-26 2018-09-26 User loss probability prediction method and device Pending CN110956296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811125784.5A CN110956296A (en) 2018-09-26 2018-09-26 User loss probability prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811125784.5A CN110956296A (en) 2018-09-26 2018-09-26 User loss probability prediction method and device

Publications (1)

Publication Number Publication Date
CN110956296A true CN110956296A (en) 2020-04-03

Family

ID=69964698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811125784.5A Pending CN110956296A (en) 2018-09-26 2018-09-26 User loss probability prediction method and device

Country Status (1)

Country Link
CN (1) CN110956296A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639814A (en) * 2020-06-02 2020-09-08 贝壳技术有限公司 Method, apparatus, medium, and electronic device for predicting occurrence probability of fluctuating behavior
CN111861092A (en) * 2020-05-21 2020-10-30 北京骑胜科技有限公司 Parking area risk identification method and device, electronic equipment and storage medium
CN112381338A (en) * 2021-01-14 2021-02-19 北京新唐思创教育科技有限公司 Event probability prediction model training method, event probability prediction method and related device
CN112417267A (en) * 2020-10-10 2021-02-26 腾讯科技(深圳)有限公司 User behavior analysis method and device, computer equipment and storage medium
CN112686448A (en) * 2020-12-31 2021-04-20 重庆富民银行股份有限公司 Loss early warning method and system based on attribute data
CN112862527A (en) * 2021-02-04 2021-05-28 北京嘀嘀无限科技发展有限公司 User type determination method, device, equipment and storage medium
CN111861092B (en) * 2020-05-21 2024-05-14 北京骑胜科技有限公司 Parking area risk identification method and device, electronic equipment and storage medium

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN111861092A (en) * 2020-05-21 2020-10-30 北京骑胜科技有限公司 Parking area risk identification method and device, electronic equipment and storage medium
CN111861092B (en) * 2020-05-21 2024-05-14 北京骑胜科技有限公司 Parking area risk identification method and device, electronic equipment and storage medium
CN111639814A (en) * 2020-06-02 2020-09-08 贝壳技术有限公司 Method, apparatus, medium, and electronic device for predicting occurrence probability of fluctuating behavior
CN112417267A (en) * 2020-10-10 2021-02-26 腾讯科技(深圳)有限公司 User behavior analysis method and device, computer equipment and storage medium
CN112686448A (en) * 2020-12-31 2021-04-20 重庆富民银行股份有限公司 Loss early warning method and system based on attribute data
CN112686448B (en) * 2020-12-31 2024-02-13 重庆富民银行股份有限公司 Loss early warning method and system based on attribute data
CN112381338A (en) * 2021-01-14 2021-02-19 北京新唐思创教育科技有限公司 Event probability prediction model training method, event probability prediction method and related device
CN112381338B (en) * 2021-01-14 2021-07-27 北京新唐思创教育科技有限公司 Event probability prediction model training method, event probability prediction method and related device
CN112862527A (en) * 2021-02-04 2021-05-28 北京嘀嘀无限科技发展有限公司 User type determination method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110956296A (en) User loss probability prediction method and device
US11386359B2 (en) Systems and methods for managing a vehicle sharing facility
US11162803B2 (en) Providing alternative routing options to a rider of a transportation management system
JP6655939B2 (en) Transport service reservation method, transport service reservation device, and transport service reservation program
US11392861B2 (en) Systems and methods for managing a vehicle sharing facility
US8738289B2 (en) Advanced routing of vehicle fleets
WO2019136341A1 (en) Systems and methods for managing and scheduling ridesharing vehicles
US20180314998A1 (en) Resource Allocation in a Network System
US20220120572A9 (en) Real-time ride sharing solutions for unanticipated changes during a ride
US20150339595A1 (en) Method and system for balancing rental fleet of movable asset
GB2535718A (en) Resource management
DE112018007300T5 (en) ROUTE GUIDANCE WITH PERCEPTION OF THE SURROUNDINGS
JP5273106B2 (en) Traffic flow calculation device and program
JP7078357B2 (en) Distribution device, distribution method and distribution program
US20130290056A1 (en) Schedule optimisation
WO2016135650A1 (en) A system and method of calculating a price for a vehicle journey
CN108806249B (en) Passenger trip optimization method based on bus APP software
CN111861643A (en) Riding position recommendation method and device, electronic equipment and storage medium
AU2018217973A1 (en) Dynamic selection of geo-based service options in a network system
CN111932341A (en) Method and system for determining car pooling order
US20220229442A9 (en) Accounting for driver reaction time when providing driving instructions
CN116663811A (en) Scheduling matching method and device for reciprocating dynamic carpooling of inter-city passenger transport
CN110766492A (en) Order information processing method, device and equipment
JP2007207077A (en) Vehicle allocation information provision system and vehicle allocation reservation server
Xu et al. Quantifying the competitiveness of transit relative to taxi with multifaceted data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200403
