CN111784049A

CN111784049A - Passenger loss time prediction method and device

Info

Publication number: CN111784049A
Application number: CN202010620852.6A
Authority: CN
Inventors: 赵耀帅; 吴格; 冯迪; 杨程屹; 李忠虎
Original assignee: China Travelsky Holding Co
Current assignee: China Travelsky Technology Co Ltd; China Travelsky Holding Co
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-10-16

Abstract

According to the method and the device for predicting the passenger loss time, the passenger information of passengers in delayed flights and flight associated information of the delayed flights can be obtained; extracting at least one target characteristic information from the passenger information and the flight associated information; inputting at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model; and when the loss prediction result is loss, inputting at least one target characteristic information into a pre-trained loss time prediction model to obtain the loss time of the passenger predicted by the pre-trained loss time prediction model. The method can use the passenger information of the passengers in the delayed flights and the target characteristic information extracted from the flight related information of the delayed flights to determine whether the passengers are lost through a loss prediction model and predict the loss time of the passengers through a loss time prediction model so as to facilitate the arrangement and deployment of the delayed flights by an airline company.

Description

Passenger loss time prediction method and device

Technical Field

The invention relates to the technical field of information processing, in particular to a method and a device for predicting passenger loss time.

Background

With the steady improvement of public infrastructure in China and rising living standards, more and more passengers choose civil aviation flights for travel. When the departure of a passenger's civil aviation flight is delayed, the flight becomes a delayed flight, and the passenger may, depending on travel needs, either continue waiting for the delayed flight or give it up by rebooking, refunding the ticket, or similar means.

Currently, airlines regard a passenger who gives up a delayed flight by rebooking or refunding the ticket as a lost (churned) passenger. To save aviation fuel and increase the boarding rate of delayed flights, airlines need to readjust the departure time and aircraft model of a delayed flight according to the number of passengers lost in each time period. For example, the aircraft assigned to the delayed flight may be changed from a model A320-300 with 289 seats to a model A320-200 with 158 seats. Because readjusting the departure time and the aircraft model also takes time to arrange and deploy, the airline needs to predict in advance the churn time of each passenger on the delayed flight who may churn.

Disclosure of Invention

In view of the above problems, the present invention provides a method and an apparatus for predicting passenger churn time, which overcomes or at least partially solves the above problems, and the technical solution is as follows:

a method of predicting passenger churn time, comprising:

obtaining passenger information of passengers in delayed flights and flight association information of the delayed flights;

extracting at least one target characteristic information from the passenger information and the flight associated information;

inputting the at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model;

and when the loss prediction result is loss, inputting the at least one target characteristic information into a pre-trained loss time prediction model to obtain the loss time of the passenger predicted by the pre-trained loss time prediction model.

A passenger churn time prediction apparatus comprising: an information obtaining unit, an information extracting unit, a loss prediction result obtaining unit and a loss time obtaining unit,

the information acquisition unit is used for acquiring passenger information of passengers in delayed flights and flight associated information of the delayed flights;

the information extraction unit is used for extracting at least one piece of target characteristic information from the passenger information and the flight related information;

the loss prediction result obtaining unit is used for inputting the at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model;

the churn time obtaining unit is configured to, when the churn prediction result obtained by the churn prediction result obtaining unit is churn, input the at least one target feature information into a pre-trained churn time prediction model to obtain churn time of the passenger predicted by the pre-trained churn time prediction model.

By means of the technical scheme, the method and the device for predicting the passenger loss time can obtain the passenger information of passengers in delayed flights and the flight associated information of the delayed flights; extracting at least one target characteristic information from the passenger information and the flight associated information; inputting at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model; and when the loss prediction result is loss, inputting at least one target characteristic information into a pre-trained loss time prediction model to obtain the loss time of the passenger predicted by the pre-trained loss time prediction model. The method can determine whether the passengers are lost through a loss prediction model and predict the loss time of the passengers through a loss time prediction model by using the passenger information of the passengers in the delayed flights and the target characteristic information extracted from the flight related information of the delayed flights. After obtaining the loss time, the airline can schedule and deploy delayed flights in time according to the loss time, so as to improve the boarding rate and effectively reduce the average energy consumption for transporting each passenger.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a schematic flow chart illustrating a method for predicting passenger churn time according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart illustrating another method for predicting passenger churn time according to an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart illustrating another method for predicting passenger churn time according to an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart illustrating another method for predicting passenger churn time according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating a training process of a churn time prediction model according to an embodiment of the present disclosure;

FIG. 6 is a schematic flow chart illustrating another method for predicting passenger churn time according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram illustrating a passenger churn time prediction apparatus according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram illustrating another passenger churn time prediction apparatus according to an embodiment of the present disclosure;

FIG. 9 is a schematic composition diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.

As shown in fig. 1, a method for predicting passenger churn time provided by an embodiment of the present disclosure may include:

S100, obtaining passenger information of passengers in delayed flights and flight related information of the delayed flights.

A delayed flight may refer to a flight whose departure is delayed, specifically a flight whose actual departure time (the time at which the wheel chocks are removed) is later than the planned departure time by more than a preset duration. According to the "Flight Normal Management Regulation", which formally took effect on January 1, 2017, the preset duration is 15 minutes. It will be appreciated that the criterion for a delayed flight may be set by the relevant company, department, or organization according to the actual situation. In general, the embodiments of the present disclosure use the definition of "flight departure delay" in the "Flight Normal Management Regulation" as the criterion for a delayed flight.
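The delay criterion amounts to a simple time comparison. The following is a minimal sketch, assuming the 15-minute threshold cited above; the function name `is_delayed` and its parameters are hypothetical and are not part of the disclosure.

```python
from datetime import datetime, timedelta

# Assumed default: the 15-minute preset duration cited in the text.
DELAY_THRESHOLD = timedelta(minutes=15)


def is_delayed(planned_departure: datetime,
               actual_departure: datetime,
               threshold: timedelta = DELAY_THRESHOLD) -> bool:
    """Return True when the actual departure is later than the planned
    departure by more than the preset duration."""
    return actual_departure - planned_departure > threshold


planned = datetime(2020, 4, 25, 17, 0)
print(is_delayed(planned, datetime(2020, 4, 25, 17, 10)))  # 10 minutes late
print(is_delayed(planned, datetime(2020, 4, 25, 17, 40)))  # 40 minutes late
```

Passing `threshold` explicitly lets a company or organization apply its own criterion, as the text allows.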

Optionally, the embodiment of the present disclosure may store the passenger information of each passenger and the flight associated information of each flight in a database in advance. The database storing the passenger information and the database storing the flight associated information may be the same or different. Optionally, according to identification information of the delayed flight, the embodiment may query the database for the passenger information of the corresponding passengers and for the corresponding flight associated information. The identification information may be a flight number. For example, according to the flight number of the delayed flight, the embodiment may query the database for the passenger information of the passengers on the delayed flight and for the flight associated information of the delayed flight.
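As an illustration of the flight-number lookup, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and sample rows are all hypothetical; the disclosure does not specify a schema or a database product.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE passengers (pnr TEXT, name TEXT, flight_number TEXT);
CREATE TABLE flights (flight_number TEXT PRIMARY KEY, origin TEXT,
                      destination TEXT, planned_departure TEXT);
INSERT INTO passengers VALUES
    ('ABC123', 'Passenger A', 'CA1234'),
    ('DEF456', 'Passenger B', 'CA1234'),
    ('GHI789', 'Passenger C', 'MU5678');
INSERT INTO flights VALUES
    ('CA1234', 'PEK', 'SHA', '2020-04-25T17:00'),
    ('MU5678', 'SHA', 'CAN', '2020-04-25T18:30');
""")


def load_delayed_flight(conn, flight_number):
    """Fetch the passengers on a flight and the flight's own record."""
    passengers = conn.execute(
        "SELECT pnr, name FROM passengers WHERE flight_number = ? ORDER BY pnr",
        (flight_number,),
    ).fetchall()
    flight = conn.execute(
        "SELECT flight_number, origin, destination, planned_departure "
        "FROM flights WHERE flight_number = ?",
        (flight_number,),
    ).fetchone()
    return passengers, flight


passengers, flight = load_delayed_flight(conn, "CA1234")
print(passengers)
print(flight)
```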

Optionally, the passenger information includes at least one of a passenger reservation record (Passenger Name Record, PNR), passenger ticket information, passenger personal information, and passenger historical behavior information.

The passenger booking record mainly comprises information such as name group, flight segment group and group condition. Optionally, the passenger reservation record may include: PNR number and passenger number.

The passenger ticket information may include at least one of the ticket number, ticket status, booking date, booking time, ticket issue date, and ticket issue time. The ticket status may include at least one of not ticketed, pending, changed, normal, refunded, and partially refunded.

The passenger personal information may include personal information that the passenger registered on a ticket purchasing website and/or ticket purchasing application, together with related information generated when the passenger checks in. For example, the passenger personal information may include the passenger's age, gender, nationality, year and month of birth, occupation, height, weight, zodiac sign, family member status, check-in date, check-in time, check-in mode, number of pieces of baggage, baggage weight, and so on. It will be appreciated that the passenger personal information may overlap with the passenger reservation record, for example in the passenger's name, identification number, and age.

The passenger historical behavior information may include: the number of times the passenger traveled by civil aviation during a first history period; the number of times the passenger did not travel because of a flight delay during a second history period; the number of times the passenger still traveled despite a flight delay during a third history period; the number of times the passenger refunded, rebooked, and/or exchanged a ticket during a fourth history period; and the number of times the passenger took the delayed flight and/or took flights of the airline operating the delayed flight during a fifth history period.

Optionally, a history period may be the period from a certain historical date to a specified date. For example, the first history period may run from January 1, 2020 to May 5, 2020. In practical application, the specified date may be the current date or another date specified by the relevant personnel. Note that the specified date is later than the historical date. Optionally, the first, second, third, fourth, and fifth history periods may be the same.

Specifically, the embodiment of the present disclosure may count the number of times the passenger traveled by civil aviation from the database in which the passenger information is stored. Optionally, the embodiment may first query the database for all historical travel records of the passenger in the first history period and then count them; this first number is the number of times the passenger traveled by civil aviation in the first history period. For example, for a passenger taking a flight on the current date of April 25, 2020, the embodiment may look up every historical travel record of the passenger in the history period from January 1, 2020 to April 25, 2020 and count those records; the result is the number of times the passenger traveled by civil aviation in that period. A historical travel record may be a record of a passenger's trip from an origin to a destination on a civil aviation flight. Optionally, the embodiment may obtain the historical travel records from the passenger reservation records.

Optionally, the historical travel records may include flight records whose ticket status is an in-line status (the passenger traveled) and flight records whose ticket status is an out-of-line status (the passenger did not travel). The out-of-line status includes at least one of rebooked, exchanged, refunded, and voided. The embodiment of the present disclosure may treat every ticket status other than an out-of-line status as an in-line status.

Specifically, the embodiment of the present disclosure may count, from the database in which the passenger information is stored, the number of times the passenger did not travel because of a flight delay. Optionally, the embodiment may first query the database for all historical travel records of the passenger in the second history period, then identify among them the historical delayed flight records, that is, records whose flight status is delayed (including "flight delay" and "flight departure delay" as specified in the "Flight Normal Management Regulation"), and then determine a second number of historical delayed flight records whose final ticket status is an out-of-line status. The second number is the number of times the passenger did not travel because of a flight delay in the second history period.

Specifically, the embodiment of the present disclosure may count, from the database in which the passenger information is stored, the number of times the passenger still traveled despite a flight delay. Optionally, the embodiment may first query the database for all historical travel records of the passenger in the third history period, then identify among them the historical delayed flight records, that is, records whose flight status is delayed (including "flight delay" and "flight departure delay" as specified in the "Flight Normal Management Regulation"), and then determine a third number of historical delayed flight records whose final ticket status is an in-line status. The third number is the number of times the passenger still traveled despite a flight delay in the third history period.

Specifically, the embodiment of the present disclosure may count, from the database in which the passenger information is stored, the number of times the passenger refunded, rebooked, and/or exchanged a ticket. Optionally, the embodiment may first query the database for all historical travel records of the passenger in the fourth history period and then determine a fourth number of historical travel records whose ticket status is refunded, rebooked, and/or exchanged. The fourth number is the number of times the passenger refunded, rebooked, and/or exchanged a ticket in the fourth history period.

Specifically, the embodiment of the present disclosure may count, from the database in which the passenger information is stored, the number of times the passenger took the delayed flight and/or flights of the airline operating the delayed flight. Optionally, the embodiment may first query the database for all historical travel records of the passenger in the fifth history period and then determine a fifth number of historical travel records whose flight is the delayed flight and/or a flight of the airline operating the delayed flight. The fifth number is the number of times the passenger took the delayed flight and/or flights of the airline operating the delayed flight in the fifth history period.
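The five counts described above can be sketched as one pass over a passenger's historical travel records. This is a minimal illustration under stated assumptions: the `TripRecord` fields, the status strings, and the choice to use a single shared history period for all five counts (which the text permits) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Assumed status strings; anything else is treated as in-line (traveled).
OUT_OF_LINE = {"rebooked", "exchanged", "refunded", "voided"}
REFUND_OR_CHANGE = {"refunded", "rebooked", "exchanged"}


@dataclass
class TripRecord:
    travel_date: date
    flight_number: str
    airline: str
    flight_delayed: bool
    final_ticket_status: str


def history_features(records, start, end, delayed_flight_no, delayed_airline):
    """Count the five historical-behavior features within [start, end]."""
    window = [r for r in records if start <= r.travel_date <= end]
    delayed = [r for r in window if r.flight_delayed]
    return {
        "trips": len(window),
        "delayed_not_traveled": sum(
            r.final_ticket_status in OUT_OF_LINE for r in delayed),
        "delayed_traveled": sum(
            r.final_ticket_status not in OUT_OF_LINE for r in delayed),
        "refund_or_change": sum(
            r.final_ticket_status in REFUND_OR_CHANGE for r in window),
        "same_flight_or_airline": sum(
            r.flight_number == delayed_flight_no or r.airline == delayed_airline
            for r in window),
    }


records = [
    TripRecord(date(2020, 1, 10), "CA1234", "CA", False, "used"),
    TripRecord(date(2020, 2, 5), "CA1234", "CA", True, "refunded"),
    TripRecord(date(2020, 3, 1), "MU5678", "MU", True, "used"),
    TripRecord(date(2020, 3, 20), "CZ3000", "CZ", False, "rebooked"),
    TripRecord(date(2019, 12, 1), "CA1234", "CA", False, "used"),  # outside window
]
features = history_features(
    records, date(2020, 1, 1), date(2020, 4, 25), "CA1234", "CA")
print(features)
```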

In practical cases, the passenger information of one passenger may be stored in different systems or databases, so that the embodiments of the present disclosure may collect the passenger information of the passenger in multiple systems or databases, thereby obtaining the passenger information of the passenger.

Optionally, the flight associated information includes at least one of flight information, associated airport information, associated weather information, and associated airline information.

Specifically, the flight information may include: flight number, airline, origin airport, destination airport, and planned departure time.

Specifically, the associated airport information may include, for the origin airport and/or the destination airport, at least one of: the number of delayed arriving flights, the number of delayed departing flights, the average on-time rate of arriving flights, the average on-time rate of departing flights, the average delay duration of arriving flights, the average delay duration of departing flights, the average flight cancellation rate, the average flight load factor, the number of stranded passengers, the amount of backlogged baggage, the number of passenger refunds, and the number of passenger rebookings.

Specifically, the associated weather information may include: weather information for the origin airport and/or the destination airport and/or the airline. For example: wind, cloud, rain, snow, frost, dew, rainbow, halo, lightning, thunder, and the like.

Specifically, the associated airline information may include, for the airline, at least one of: the number of delayed arriving flights, the number of delayed departing flights, the average on-time rate of arriving flights, the average on-time rate of departing flights, the average delay duration of arriving flights, the average delay duration of departing flights, the average flight cancellation rate, the average flight load factor, the number of stranded passengers, the amount of backlogged baggage, the number of passenger refunds, and the number of passenger rebookings.

In practical cases, the flight associated information is updated in real time, and the flight associated information may be recorded in different systems or databases, so that the embodiment of the disclosure may collect the flight associated information in multiple systems or databases to obtain the flight associated information.

S200, extracting at least one target feature information from the passenger information and the flight related information.

Optionally, the embodiment of the present disclosure may use, as target feature information, information in the passenger information and the flight associated information that directly reflects the underlying trend of the data. In other words, when such information is regarded as raw data, the raw data itself is the target feature information.

Optionally, the embodiment of the present disclosure may use feature engineering to process information in the passenger information and the flight associated information that cannot directly reflect the underlying trend of the data, thereby obtaining target feature information. When such information is regarded as raw data, the raw data is not itself target feature information. Feature engineering refers to the process of converting the data attributes of raw data into data features, where the data attributes represent all dimensions of the data. When raw data cannot directly reflect the underlying trend of the data, the embodiment may convert it through feature engineering to obtain target feature information. For example, the destination airport in the flight information cannot directly reflect the underlying trend of the data; through feature engineering the embodiment may determine whether the destination airport is domestic or international, so that the resulting target feature information includes a domestic/international feature. Compared with raw data, target feature information better reflects the underlying trend of the data, so predicting whether a passenger will churn and determining the churn time from target feature information can improve prediction accuracy.
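The destination airport example can be sketched as a lookup that turns an airport code into a binary domestic/international feature. The lookup table, the function name, and the default home country are assumptions made for illustration; a real system would draw on a full airport reference dataset.

```python
# Hypothetical lookup: IATA airport code -> ISO country code.
AIRPORT_COUNTRY = {
    "PEK": "CN",
    "SHA": "CN",
    "CAN": "CN",
    "NRT": "JP",
    "LHR": "GB",
}


def is_domestic_destination(destination_airport: str,
                            home_country: str = "CN") -> int:
    """Return 1 for a domestic destination, 0 for an international one."""
    country = AIRPORT_COUNTRY.get(destination_airport)
    if country is None:
        raise KeyError(f"unknown airport code: {destination_airport}")
    return 1 if country == home_country else 0


print(is_domestic_destination("SHA"))  # domestic destination
print(is_domestic_destination("NRT"))  # international destination
```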

It is understood that persons skilled in the relevant art may designate in advance which information can directly reflect the underlying trend of the data and which cannot.

The target feature information may be expressed as a feature matrix.

Optionally, based on the method shown in fig. 1, as shown in fig. 2, in another method for predicting passenger churn time provided by the embodiment of the present disclosure, step S200 may include:

S210, converting preset information in the passenger information and the flight associated information through feature engineering to obtain at least one piece of target feature information.

The feature engineering may include at least one of a binning (bucketing) method and a feature extraction method. Optionally, the feature extraction method may be one-hot encoding.

The preset information may be information that cannot directly reflect the underlying trend of the data. For any such item of information, the embodiment of the present disclosure may preset at least one feature engineering method to convert it. For example, the origin airport, the destination airport, and the planned departure time in the flight information are all information that cannot directly reflect the underlying trend of the data.

Optionally, the preset information may include discrete categorical information and continuous numerical information. The embodiment of the present disclosure may convert discrete categorical information using the feature extraction method and convert continuous numerical information using the binning method. For example, the embodiment may use binning to convert the continuous numerical information "age" into juvenile/young/middle-aged/elderly.
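A minimal sketch of the two conversions: binning a continuous value and one-hot encoding a category. The age cut points (18, 35, 60) and the check-in mode categories are assumptions chosen for illustration; the disclosure does not give specific boundaries.

```python
from bisect import bisect_right

# Assumed cut points: <18 juvenile, 18-34 young, 35-59 middle-aged, >=60 elderly.
AGE_EDGES = [18, 35, 60]
AGE_LABELS = ["juvenile", "young", "middle-aged", "elderly"]

# Hypothetical discrete category used to demonstrate one-hot encoding.
CHECKIN_MODES = ["counter", "kiosk", "online"]


def bin_age(age: float) -> str:
    """Map a continuous age onto one of the four age groups."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1 if value == c else 0 for c in categories]


# One row of a feature matrix: age group followed by check-in mode.
row = one_hot(bin_age(42), AGE_LABELS) + one_hot("online", CHECKIN_MODES)
print(bin_age(42), row)
```

Stacking one such row per passenger yields a feature matrix.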

S300, inputting the at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model.

Specifically, the churn prediction model may be a classification model or a regression model. The classification model may include at least one of a k-nearest neighbor (kNN) model, a Linear Discriminant Analysis (LDA) model, and a Random Forest (RF) model. The regression model may include at least one of a Linear Regression (LR) model, a Polynomial Regression (PR) model, and an Elastic Net Regression (ENR) model.

Optionally, based on the method shown in fig. 1, as shown in fig. 3, in another method for predicting passenger churn time provided by the embodiment of the present disclosure, the churn prediction model may be a random forest model, and step S300 may include:

S310, inputting the at least one target feature information into a pre-trained random forest model, so that each decision tree in the random forest model classifies the input according to the at least one target feature information; obtaining the classification result output by each decision tree; and determining the churn prediction result from the classification results output by the decision trees.
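One common way to combine the per-tree classification results is a majority vote. The text does not state the combination rule, so majority voting is an assumption here, and the three hand-written "trees" are stand-ins for trained decision trees with hypothetical feature names.

```python
from collections import Counter


def forest_predict(trees, features):
    """Collect one vote per tree and return the most common label."""
    votes = [tree(features) for tree in trees]
    label, _count = Counter(votes).most_common(1)[0]
    return label, votes


# Stand-in decision trees (hypothetical rules, not trained models).
def tree_a(f):
    return "churn" if f["delay_minutes"] > 120 else "no_churn"


def tree_b(f):
    return "churn" if f["past_delay_no_travel"] >= 2 else "no_churn"


def tree_c(f):
    return "churn" if f["is_domestic"] == 1 and f["delay_minutes"] > 60 else "no_churn"


features = {"delay_minutes": 90, "past_delay_no_travel": 3, "is_domestic": 1}
label, votes = forest_predict([tree_a, tree_b, tree_c], features)
print(label, votes)
```

Using an odd number of trees avoids ties in a two-class vote.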

It is understood that embodiments of the present disclosure may train random forest models in advance. Specifically, the embodiment of the disclosure may train the random forest model using a first feature information training set. The first characteristic information training set comprises at least one piece of target characteristic information extracted from passenger information of at least one passenger in at least one delayed historical flight and flight association information of the delayed historical flight.

Optionally, the training process of the random forest model may include the following. The embodiment of the present disclosure first randomly selects, with replacement, the at least one target feature information corresponding to one passenger from the first feature information training set. A decision tree is then trained using the at least one target feature information of the selected passenger: when a node of the decision tree is to be split, a preset number of target feature information items is randomly selected from the at least one target feature information, where the preset number is smaller than the total number of target feature information items; the node is then split using a preset attribute type as the basis of division. Note that no pruning is performed while the decision tree is being split.
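The two sources of randomness in this training process, sampling passengers with replacement and restricting each node split to a random subset of features, can be sketched as follows. The helper names and sample data are hypothetical, and drawing as many samples as the training set contains is an assumption (the usual bootstrap convention), not something the text specifies.

```python
import random


def bootstrap_sample(training_set, rng):
    """Draw len(training_set) samples with replacement."""
    return [rng.choice(training_set) for _ in training_set]


def candidate_features(feature_names, k, rng):
    """Pick k distinct features to consider at one node split.

    k must be smaller than the total number of features.
    """
    if not 0 < k < len(feature_names):
        raise ValueError("k must be positive and smaller than the feature count")
    return rng.sample(feature_names, k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
training_set = [
    {"passenger": "P1", "delay_minutes": 30},
    {"passenger": "P2", "delay_minutes": 90},
    {"passenger": "P3", "delay_minutes": 150},
]
feature_names = ["delay_minutes", "age_group", "is_domestic", "checkin_mode"]

sample = bootstrap_sample(training_set, rng)
subset = candidate_features(feature_names, 2, rng)
print([s["passenger"] for s in sample], subset)
```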

Optionally, in the embodiment of the present disclosure, the target feature information corresponding to each passenger in the first feature information training set may be used as a sample, and the sample is distinguished according to a preset attribute type, for example: the attribute types include a passenger churn type and a passenger non-churn type, and the embodiment of the disclosure can distinguish the attribute type of a certain sample from the passenger churn type or the passenger non-churn type. The disclosed embodiments may assign numbers to the attribute types. For example: assuming that the attribute types include a passenger churn type and a passenger non-churn type, the number corresponding to the passenger churn type may be 1, and the number corresponding to the passenger non-churn type may be 2.

Generally, the larger the information gain corresponding to the attribute type is, the higher the purity of splitting the node by using the attribute type as a division basis is. Therefore, the embodiment of the present disclosure may determine the information gain corresponding to each attribute type, and select the attribute type with the largest information gain as the division basis for splitting the node.

Optionally, the determining process of the information gain corresponding to the attribute type may include: determining the information entropy of the attribute type for dividing the first characteristic information training set according to the proportion of each type of sample in the attribute type in the first characteristic information training set; and determining the information gain of the attribute type for dividing the first characteristic information training set according to the information entropy.

Specifically, the information entropy of the attribute type dividing the first feature information training set may be:

$$\mathrm{Entropy}(D) = -\sum_{k=1}^{M} p_k \log_2 p_k$$

where D is the first feature information training set, Entropy(D) is the information entropy of dividing the first feature information training set by the attribute type, k is the number corresponding to each type included in the attribute type, M is the number of types included in the attribute type, and p_k is the proportion of the first feature information training set made up of samples of the type numbered k.
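The entropy computation can be sketched directly from the class proportions p_k. This sketch assumes the standard Shannon form with a base-2 logarithm and uses the numbering 1 for the churn type and 2 for the non-churn type; the function name is hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy of a list of class labels.

    p_k is the share of samples carrying label k; classes with no
    samples contribute nothing to the sum.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


print(entropy([1, 1, 2, 2]))  # evenly split between the two types
print(entropy([1, 1, 1, 1]))  # a pure set
```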

Specifically, the information gain of dividing the first feature information training set according to the attribute type, determined from the information entropy, may be:

$$\mathrm{Gain}(D, a) = \mathrm{Entropy}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Entropy}(D^v)$$

wherein |D| is the number of samples in the first feature information training set; a is the attribute type; V is the number of subsets into which D is divided according to a, that is, D is divided into V subsets $D^1, D^2, \ldots, D^V$ according to a; $|D^v|$ is the number of samples in the subset $D^v$; and Gain(D, a) is the information gain of dividing the first feature information training set according to the attribute type.
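The entropy and information gain computation described above can be sketched as follows. This is a minimal illustration only: the base-2 logarithm, the function names, and the toy `delay` attribute are assumptions introduced here and are not taken from the disclosure.

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy of a list of class labels (base-2 log, an assumption)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy of D minus the size-weighted entropy of the subsets D^v."""
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data: 1 = churn, 2 = non-churn (the numbering used in the text);
# "long"/"short" is an invented delay-length attribute.
labels = [1, 1, 2, 2]
delay = ["long", "long", "short", "short"]
print(information_gain(labels, delay))
```

An attribute that separates the two types perfectly yields the largest possible gain for this toy set, so it would be chosen as the division basis for the node.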

A plurality of decision trees are obtained according to the above steps, and a random forest model is constructed from these decision trees.

It can be understood that, in the embodiment of the present disclosure, a feature information verification set may also be used to test the constructed random forest model, and the random forest model may be optimized according to the test result. The feature information verification set includes at least one target feature information extracted from the passenger information of at least one passenger in at least one delayed historical flight and the flight associated information of the delayed historical flight, and the at least one target feature information is labeled with a churn identifier or a non-churn identifier.

S400, when the loss prediction result is loss, inputting the at least one target characteristic information into a pre-trained loss time prediction model to obtain the loss time of the passenger predicted by the pre-trained loss time prediction model.

The churn time may be the time at which the passenger refunds, changes, and/or transfers the ticket for the delayed flight after the scheduled departure time of the delayed flight. In general, the pre-trained churn time prediction model may predict the duration after which the passenger will perform operations such as refunding, changing, and/or transferring the ticket following the planned departure time of the delayed flight, and determine the churn time of the passenger by combining the planned departure time with that duration. For ease of understanding, an example is given here: assuming that the planned departure time of a delayed flight is 17:00 and the pre-trained churn time prediction model predicts that a passenger will refund, change, and/or transfer the ticket 36 minutes after the planned departure time, the churn time of the passenger predicted by the pre-trained churn time prediction model is 17:36.
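The arithmetic in the example above can be reproduced with a short sketch. The function name and the calendar date are hypothetical; only the 17:00 departure time and the 36-minute offset come from the text.

```python
from datetime import datetime, timedelta

def churn_time(planned_departure, offset_minutes):
    """Churn time = planned departure time + predicted offset in minutes."""
    return planned_departure + timedelta(minutes=offset_minutes)

# The date is arbitrary; only the time of day matters for the example.
planned = datetime(2020, 6, 30, 17, 0)
print(churn_time(planned, 36).strftime("%H:%M"))
```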

Specifically, based on the method shown in fig. 1, as shown in fig. 4, in another method for predicting passenger churn time provided by the embodiment of the present disclosure, step S400 may include:

S410, inputting the at least one target feature information into a pre-trained churn time prediction model, so that the churn time prediction model determines the predicted churn probability of the passenger in each preset time interval and predicts the churn time of the passenger according to the predicted churn probability of the passenger in each preset time interval.

It is understood that the embodiments of the present disclosure may train the churn time prediction model in advance. Specifically, the embodiments of the present disclosure may train the churn time prediction model using a second feature information training set. The second feature information training set comprises at least one target feature information extracted from the passenger information of at least one passenger in at least one delayed historical flight and the flight associated information of the delayed historical flight, where each such passenger has been determined by the churn prediction model to churn.

Optionally, as shown in fig. 5, a training process of the churn time prediction model provided in the embodiment of the present disclosure may include:

S01, determining the real mark distribution, over the time intervals, of the target feature information corresponding to each passenger in the second feature information training set.

Specifically, for each passenger in the second feature information training set, the embodiment of the present disclosure may: divide the time span from the planned departure time of the delayed flight corresponding to the passenger to its actual off-block time (removal of the wheel chocks) into at least two time intervals; determine the time interval containing the churn time of the passenger as the real churn time interval; and perform discrete Gaussian distribution processing centered on the real churn time interval to determine the real mark distribution, over the time intervals, of the target feature information corresponding to the passenger.
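The interval division and the lookup of the real churn time interval can be sketched as below. The fixed interval width, the use of minutes after the planned departure time as the unit, and both function names are assumptions made for illustration; the disclosure only requires at least two intervals.

```python
def split_intervals(total_minutes, width):
    """Split [0, total_minutes) into half-open intervals [t_j, t_{j+1}) of a fixed width."""
    bounds = []
    start = 0
    while start < total_minutes:
        bounds.append((start, min(start + width, total_minutes)))
        start += width
    return bounds

def real_interval_index(bounds, churn_offset):
    """Index of the interval containing the churn offset (minutes after planned departure)."""
    for j, (lo, hi) in enumerate(bounds):
        if lo <= churn_offset < hi:
            return j
    # Offsets at or beyond the last end point are clamped to the last interval (assumption).
    return len(bounds) - 1

# Hypothetical flight delayed by 120 minutes, divided into 30-minute intervals.
bounds = split_intervals(120, 30)
print(bounds, real_interval_index(bounds, 36))
```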

Specifically, the real mark distribution of the target feature information corresponding to any passenger on the time interval $y_j$ is:

$$d_{x}^{y_j} = \frac{1}{\sigma\sqrt{2\pi}\,G}\exp\left(-\frac{(y_j-\alpha)^2}{2\sigma^2}\right)$$

wherein j is the number of the time interval; $y_j$ is the jth time interval; x is the target feature information corresponding to any passenger in the second feature information training set; $d_{x}^{y_j}$ is the real mark distribution of the target feature information corresponding to the passenger in the second feature information training set on the time interval $y_j$, namely the real churn probability corresponding to that target feature information on the time interval $y_j$; $\sigma$ is the standard deviation of the discrete Gaussian distribution; $\alpha$ is the real churn time interval corresponding to the churn time of the passenger in the second feature information training set; and G is a normalization parameter such that

$$\sum_{j=1}^{N} d_{x}^{y_j} = 1$$

that is,

$$G = \frac{1}{\sigma\sqrt{2\pi}}\sum_{j=1}^{N}\exp\left(-\frac{(y_j-\alpha)^2}{2\sigma^2}\right)$$

and N is the number of preset time intervals.
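The discrete Gaussian processing centered on the real churn time interval can be sketched as follows. The sketch assumes that each interval is represented by its index, so that the distance to the real interval is a difference of indices; this representation and the function name are assumptions. The constant factor of the Gaussian cancels against the normalization parameter, so only the exponentials are computed.

```python
import math

def real_mark_distribution(n_intervals, alpha, sigma):
    """Discrete Gaussian over interval indices 0..n_intervals-1, centered on index alpha."""
    weights = [math.exp(-((j - alpha) ** 2) / (2 * sigma ** 2)) for j in range(n_intervals)]
    total = sum(weights)  # plays the role of the normalization parameter
    return [w / total for w in weights]

# Hypothetical case: 5 intervals, the passenger actually churned in interval 2.
dist = real_mark_distribution(5, alpha=2, sigma=1.0)
print([round(p, 3) for p in dist])
```

A smaller sigma concentrates the probability mass on the real churn time interval, while a larger sigma spreads it to the neighboring intervals.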

S02, determining the predicted mark distribution, over the time intervals, of the target feature information corresponding to each passenger in the second feature information training set.

Specifically, for any given $x \in X$, $y \in Y$ may be output with probability $P(y \mid x)$. The problem is converted into an unconstrained optimization problem by the Lagrange multiplier method, and introducing the Lagrange multipliers $w_0, w_1, \ldots, w_n$ gives:

$$P(y \mid x) = \exp\left(\sum_{i=1}^{n} w_i f_i(x, y) + w_0 - 1\right)$$

wherein x is the target feature information corresponding to any passenger in the second feature information training set; X is the target feature information corresponding to all passengers in the second feature information training set; Y is the whole of the pre-divided time intervals; y is one time interval among the pre-divided time intervals; $P(y \mid x)$ is the predicted mark distribution of the target feature information corresponding to the passenger on the time interval y, namely the predicted churn probability of the passenger in the time interval y; i is the serial number of the passenger in the second feature information training set; $f_i(x, y)$ is the maximum entropy feature function of the target feature information corresponding to the passenger with serial number i on the time interval y; and n is the number of passengers in the second feature information training set.

Since $\sum_{y} P(y \mid x) = 1$, it follows that:

$$P_w(y \mid x) = \frac{1}{Z_w(x)}\exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$

wherein $Z_w(x)$ is the normalization factor,

$$Z_w(x) = \sum_{y}\exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$

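Because the feature function does not depend on the class label in practical application, the above formula can further be written as:

$$P_w(y \mid x) = \frac{1}{Z_w(x)}\exp\left(\sum_{i=1}^{n} w_{y,i}\, g_i(x)\right)$$

wherein $g_i(x)$ is the feature value of the target feature information corresponding to the passenger with serial number i, $w_{y,i}$ is the parameter associated with the time interval y and the feature value $g_i(x)$, and $Z_w(x)$ is the normalization factor that makes the probabilities over all time intervals sum to 1.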
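The predicted mark distribution of the maximum entropy model can be sketched as a normalized exponential of per-interval linear scores. This sketch assumes one weight per (time interval, feature value) pair, which is one common way to realize a model whose features do not depend on the class label; the weight layout and the function name are assumptions.

```python
import math

def predict_distribution(weights, g):
    """weights[j][i] is the weight for interval j and feature i; g[i] is a feature value."""
    scores = [sum(w * x for w, x in zip(row, g)) for row in weights]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                        # normalization factor
    return [e / z for e in exps]

# Hypothetical model with 3 time intervals and 2 feature values.
weights = [[0.2, -0.1], [1.0, 0.3], [-0.5, 0.4]]
print(predict_distribution(weights, [1.0, 2.0]))
```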

S03, training the maximum entropy model with the goal of minimizing the KL divergence (Kullback-Leibler divergence) between the real mark distribution and the predicted mark distribution, over the time intervals, of the target feature information corresponding to each passenger in the second feature information training set, determining the parameters of the maximum entropy model, and thereby obtaining the churn time prediction model.

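Specifically, the KL divergence between the real mark distribution and the predicted mark distribution, over the time intervals, of the target feature information $x_i$ corresponding to the passenger with serial number i in the second feature information training set is:

$$\sum_{j=1}^{N} d_{x_i}^{y_j}\ln\frac{d_{x_i}^{y_j}}{P_w(y_j \mid x_i)}$$

wherein $d_{x_i}^{y_j}$ is the real mark distribution of $x_i$ on the time interval $y_j$, $P_w(y_j \mid x_i)$ is the corresponding predicted mark distribution, and N is the number of preset time intervals. Taking the minimization of this divergence over all passengers as the target, the parameter $w^*$ of the maximum entropy model is:

$$w^* = \arg\min_{w}\sum_{i}\sum_{j=1}^{N} d_{x_i}^{y_j}\ln\frac{d_{x_i}^{y_j}}{P_w(y_j \mid x_i)}$$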
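Training against the KL divergence target can be sketched with plain gradient descent. The disclosure does not name an optimizer, so gradient descent, the learning rate, the epoch count, the per-interval weight layout, and all function names here are assumptions chosen only to show the objective decreasing.

```python
import math

def predict(weights, g):
    """Predicted distribution over intervals: normalized exponential of linear scores."""
    scores = [sum(w * x for w, x in zip(row, g)) for row in weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(d, p):
    """KL divergence of real distribution d from predicted p; zero entries of d add nothing."""
    return sum(dj * math.log(dj / pj) for dj, pj in zip(d, p) if dj > 0)

def train(samples, n_intervals, n_features, lr=0.5, epochs=200):
    """For this model the gradient of the KL target w.r.t. weights[j][i] is (p_j - d_j) * g_i."""
    weights = [[0.0] * n_features for _ in range(n_intervals)]
    for _ in range(epochs):
        for g, d in samples:
            p = predict(weights, g)
            for j in range(n_intervals):
                for i in range(n_features):
                    weights[j][i] -= lr * (p[j] - d[j]) * g[i]
    return weights

# Hypothetical single training sample: one feature value, real distribution over 3 intervals.
samples = [([1.0], [0.1, 0.7, 0.2])]
before = kl_divergence(samples[0][1], predict([[0.0]] * 3, samples[0][0]))
w = train(samples, n_intervals=3, n_features=1)
after = kl_divergence(samples[0][1], predict(w, samples[0][0]))
print(before, after)
```

Because the term that depends only on the real mark distribution is constant in w, minimizing this target is equivalent to maximizing the real-distribution-weighted log-likelihood of the predictions.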

Optionally, determining the churn time of the passenger according to the predicted churn probability of the passenger in each preset time interval includes:

wherein T is the churn time of the passenger; N is the number of the preset time intervals; j is the number of a time interval; $y_j$ is the jth time interval, where $y_j = [t_j, t_{j+1})$; $t_j$ and $t_{j+1}$ are the two end points of the time interval $y_j$ and respectively represent the durations from the planned departure time of the delayed flight to those end points; x is the target feature information of the passenger; and $P_w(y_j \mid x)$ is the predicted churn probability of the passenger in the time interval $y_j$.
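One plausible way to combine the interval probabilities into a single churn time is a probability-weighted average of the interval midpoints. This particular combination rule is an assumption made here for illustration and is not taken from the disclosure; the function name and the sample values are likewise hypothetical.

```python
def expected_churn_offset(bounds, probs):
    """Probability-weighted average of interval midpoints (assumed combination rule).

    bounds[j] = (t_j, t_{j+1}) in minutes after the planned departure time;
    probs[j]  = predicted churn probability of the passenger in interval j.
    """
    return sum(p * (lo + hi) / 2 for (lo, hi), p in zip(bounds, probs))

# Hypothetical prediction over four 30-minute intervals.
bounds = [(0, 30), (30, 60), (60, 90), (90, 120)]
probs = [0.25, 0.5, 0.25, 0.0]
print(expected_churn_offset(bounds, probs))
```

Adding the returned offset to the planned departure time would give a clock time, in the same way as the 17:00 plus 36 minutes example.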

It is understood that when the churn prediction result is that the passenger will not churn, the passenger is considered not to refund, change, and/or transfer the ticket for the delayed flight, and therefore the churn time of the passenger need not be predicted. Optionally, after determining that the churn prediction result of one passenger is non-churn, the embodiments of the present disclosure may proceed to determine the churn prediction result of another passenger.

According to the passenger loss time prediction method provided by the disclosure, the passenger information of passengers in delayed flights and flight associated information of the delayed flights can be obtained; extracting at least one target characteristic information from the passenger information and the flight associated information; inputting at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model; and when the loss prediction result is loss, inputting at least one target characteristic information into a pre-trained loss time prediction model to obtain the loss time of the passenger predicted by the pre-trained loss time prediction model. The method can use the passenger information of passengers in delayed flights and the target characteristic information extracted from the flight related information of the delayed flights to determine whether the passengers are lost through the loss prediction model and predict the loss time of the passengers through the loss time prediction model, and after the loss time is obtained, an airline company can arrange and deploy the delayed flights in time according to the loss time of the passengers, so that the attendance rate is improved, and the average energy consumption for transporting each passenger is effectively reduced.

It can be understood that after the airline schedules and deploys the delayed flights according to the lapsed time of the passengers, the attendance rate of the delayed flights can be significantly increased, so that the energy consumption of the delayed flights per capita is reduced.

Although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

Optionally, based on the method shown in fig. 1, as shown in fig. 6, another method for predicting passenger churn time provided in the embodiment of the present disclosure may further include:

S500, determining the churn number, that is, the number of passengers in the delayed flight whose churn time is earlier than a preset target time.

Wherein the preset target time may be after the scheduled departure time of the delayed flight.

S600, subtracting the churn number from the total number of passengers in the delayed flight to determine the number of passengers required by the delayed flight at the preset target time.
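Steps S500 and S600 can be sketched as a count followed by a subtraction. The function name and the sample data are hypothetical; passengers predicted not to churn are represented here by `None`, which is a representation chosen for this sketch.

```python
from datetime import datetime

def required_passengers(total_passengers, churn_times, target_time):
    """Total passengers minus those whose predicted churn time is earlier than target_time."""
    churn_number = sum(1 for t in churn_times if t is not None and t < target_time)
    return total_passengers - churn_number

# Hypothetical delayed flight with 5 passengers; None means "predicted not to churn".
churn_times = [
    datetime(2020, 6, 30, 17, 36),
    datetime(2020, 6, 30, 18, 10),
    None,
    datetime(2020, 6, 30, 19, 5),
    None,
]
target = datetime(2020, 6, 30, 18, 30)
print(required_passengers(5, churn_times, target))
```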

According to the embodiment of the invention, the number of the passengers lost between the scheduled departure time of the delayed flight and the preset target time is calculated, so that the number of the passengers required by the flight at the preset target time can be accurately obtained, and then the airline company can arrange and deploy the delayed flight in time according to the number of the passengers required by the flight at the preset target time, so that the attendance rate is improved, and the average energy consumption for transporting each passenger is effectively reduced.

Corresponding to the above method embodiment, an embodiment of the present disclosure further provides a device for predicting passenger churn time. The structure of the device is shown in fig. 7, and the device may include: an information obtaining unit 100, an information extraction unit 200, a churn prediction result obtaining unit 300, and a churn time obtaining unit 400.

The information obtaining unit 100 is configured to obtain passenger information of passengers on delayed flights and flight related information of the delayed flights.

Optionally, the embodiment of the present disclosure may store the passenger information of each passenger and the flight associated information of each flight in a database in advance. Optionally, the database storing the passenger information and the database storing the flight associated information may be the same or different. Optionally, according to identification information of the delayed flight, the embodiment of the disclosure may query the database for the passenger information of the passengers and the flight associated information corresponding to that identification information. Optionally, the identification information may be a flight number.

Alternatively, the history period may be a period from a certain history date to a specified date. In practical application of the embodiment of the present disclosure, the specified date may be the current date, or may be another date specified by the relevant person. Note that the specified date is after the history date. Alternatively, the first history period, the second history period, the third history period, the fourth history period, and the fifth history period may be the same.

Specifically, the embodiment of the present disclosure may count the number of times that the passenger travels by civil aviation in the database in which the passenger information is stored. Optionally, in the embodiment of the present disclosure, all historical travel records of the passenger in the first historical period may be firstly obtained by querying in the database, and then the first number of all historical travel records of the passenger in the first historical period is counted, where the first number is the number of times that the passenger travels by civil aviation in the first historical period. The historical trip record may be a record of traffic activity of passengers moving from origin to destination over civil airline flights. Optionally, embodiments of the present disclosure may obtain historical travel records from passenger reservation records.

The information extraction unit 200 is configured to extract at least one target feature information from the traveler information and the flight related information.

Optionally, the embodiment of the disclosure may use feature engineering to process the information in the passenger information and the flight associated information that cannot directly reflect the underlying trend of the real data, so as to obtain the target feature information. Where such information is regarded as raw data, the raw data is not itself the target feature information. Feature engineering refers to the process of converting the data attributes of raw data into data features, where the data attributes represent all dimensions of the data. When the raw data cannot directly reflect the underlying trend of the real data, the embodiment of the disclosure may use feature engineering to convert the raw data into the target feature information. Compared with the raw data, the target feature information better reflects the underlying trend of the real data, so predicting whether the passenger will churn and determining the churn time from the target feature information can improve the prediction accuracy.

Optionally, the information extraction unit 200 is specifically configured to convert preset information in the passenger information and the flight associated information through feature engineering to obtain at least one target feature information.

The preset information may be information that cannot directly reflect the underlying trend of the real data. For any piece of such information, the embodiment of the disclosure may convert the information using at least one preset feature engineering method.

Optionally, the preset information may include discrete category information and continuous numerical information. The embodiment of the disclosure may convert the discrete category information by using a feature extraction processing method, and may convert the continuous numerical information by using a binning (bucketing) processing method.
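The two kinds of conversion can be sketched as follows. One-hot encoding is used here as a stand-in for the feature extraction processing of discrete category information, which the disclosure does not specify further; the bucket edges, category names, and function names are all hypothetical.

```python
import bisect

def bucketize(value, edges):
    """Binning: map a continuous value to the index of its bucket given sorted edges."""
    return bisect.bisect_right(edges, value)

def one_hot(value, categories):
    """Encode a discrete category as a 0/1 vector (assumed encoding, not from the source)."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical features: delay length in minutes and cabin class.
delay_edges = [30, 60, 120]            # buckets: <30, 30-59, 60-119, >=120
cabins = ["economy", "business", "first"]
print(bucketize(45, delay_edges), one_hot("business", cabins))
```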

The churn prediction result obtaining unit 300 is configured to input the at least one target feature information into a pre-trained churn prediction model, and obtain a churn prediction result output by the pre-trained churn prediction model.

Optionally, the churn prediction model is a random forest model, and the churn prediction result obtaining unit 300 is specifically configured to input the at least one target feature information into a pre-trained random forest model, so that each decision tree included in the random forest model performs classification according to the at least one target feature information, to obtain the classification result output by each decision tree, and to determine the churn prediction result according to the classification results output by the decision trees.
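Combining the classification results of the individual decision trees can be sketched as a majority vote. Majority voting is a common choice for random forests and is assumed here; the stand-in trees are hypothetical callables rather than trained decision trees, and the feature names are invented.

```python
from collections import Counter

def forest_predict(trees, features):
    """Return the class label that most trees vote for (assumed combination rule)."""
    votes = [tree(features) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-in trees: each maps a feature dict to "churn" or "non-churn".
trees = [
    lambda f: "churn" if f["delay_minutes"] > 60 else "non-churn",
    lambda f: "churn" if f["refundable"] else "non-churn",
    lambda f: "churn" if f["past_churns"] >= 2 else "non-churn",
]
passenger = {"delay_minutes": 90, "refundable": True, "past_churns": 0}
print(forest_predict(trees, passenger))
```

Using an odd number of trees avoids ties in the two-class case.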

The churn time obtaining unit 400 is configured to, when the churn prediction result obtained by the churn prediction result obtaining unit 300 is churn, input the at least one target feature information into the pre-trained churn time prediction model to obtain the churn time of the passenger predicted by the pre-trained churn time prediction model.

The churn time may be the time at which the passenger refunds, changes, and/or transfers the ticket for the delayed flight after the scheduled departure time of the delayed flight. In general, the pre-trained churn time prediction model may predict the duration after which the passenger will perform operations such as refunding, changing, and/or transferring the ticket following the planned departure time of the delayed flight, and determine the churn time of the passenger by combining the planned departure time with that duration.

Optionally, the churn time obtaining unit 400 is specifically configured to input the at least one target feature information into a churn time prediction model trained in advance, so that the churn time prediction model determines a predicted churn probability of the passenger in each preset time interval, and predicts the churn time of the passenger according to the predicted churn probability of the passenger in each preset time interval.

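Optionally, the churn time obtaining unit 400 is specifically configured to determine the churn time of the passenger from the predicted churn probability of the passenger in each preset time interval, in the manner described for the method embodiment above.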

The passenger loss time prediction device can obtain passenger information of passengers in delayed flights and flight associated information of the delayed flights; extracting at least one target characteristic information from the passenger information and the flight associated information; inputting at least one target characteristic information into a pre-trained loss prediction model to obtain a loss prediction result output by the pre-trained loss prediction model; and when the loss prediction result is loss, inputting at least one target characteristic information into a pre-trained loss time prediction model to obtain the loss time of the passenger predicted by the pre-trained loss time prediction model. The method can use the passenger information of passengers in delayed flights and the target characteristic information extracted from the flight related information of the delayed flights to determine whether the passengers are lost through the loss prediction model and predict the loss time of the passengers through the loss time prediction model, and after the loss time is obtained, an airline company can arrange and deploy the delayed flights in time according to the loss time of the passengers, so that the attendance rate is improved, and the average energy consumption for transporting each passenger is effectively reduced.

Optionally, based on the apparatus shown in fig. 7, as shown in fig. 8, another apparatus for predicting passenger churn time provided by the embodiment of the present disclosure may further include: a passenger churn number determination unit 500 and a flight demand passenger number determination unit 600.

The passenger churn number determining unit 500 is configured to determine the churn number, that is, the number of passengers in the delayed flight whose churn time is earlier than a preset target time.

The flight demand passenger number determining unit 600 is configured to subtract the churn number from the total number of passengers in the delayed flight to determine the number of passengers required by the delayed flight at the preset target time.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The embodiment of the present disclosure further provides a computer readable medium, on which a program is stored, and when the program is executed by a processor, the method for predicting the passenger churn time is implemented.

In the context of this disclosure, a computer-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable medium may be a machine readable signal medium or a machine readable storage medium. A computer readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in an electronic device; or may be present alone without being incorporated into the electronic device.

As shown in fig. 9, an embodiment of the present disclosure also provides an electronic device 1000, where the electronic device 1000 includes at least one processor 1100, and at least one memory 1200 and a bus 1300 connected to the processor 1100; the processor 1100 and the memory 1200 complete communication with each other through the bus 1300; the processor 1100 is configured to call program instructions in the memory 1200 to perform the passenger churn time prediction method described above.

Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

While several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure are also encompassed.

Claims

1. A method for predicting passenger churn time, comprising:

2. The method of claim 1, wherein the passenger information comprises: at least one of passenger reservation record, passenger ticket information, passenger personal information, and passenger historical behavior information.

3. The method of claim 1, wherein the flight-associated information comprises: at least one of flight information, associated airport information, associated weather information, and associated airline department information.

4. The method according to claim 1, wherein the extracting at least one target feature information from the passenger information and the flight related information comprises:

converting preset information in the passenger information and the flight associated information through feature engineering to obtain at least one target feature information.

5. The method as claimed in claim 1, wherein the churn prediction model is a random forest model, and the inputting the at least one target feature information into a pre-trained churn prediction model to obtain a churn prediction result output by the pre-trained churn prediction model comprises:

and inputting the at least one target characteristic information into a pre-trained random forest model, so that each decision tree in the random forest model is classified and judged according to the at least one target characteristic information respectively, a classification result output by each decision tree is obtained, and a loss prediction result is determined according to the classification result output by each decision tree.

6. The method of claim 1, wherein the inputting the at least one target feature information into a pre-trained churn time prediction model to obtain the churn time of the passenger predicted by the pre-trained churn time prediction model comprises:

inputting the at least one target characteristic information into a pre-trained loss time prediction model so that the loss time prediction model determines the prediction loss probability of the passengers in each preset time interval, and predicting the loss time of the passengers according to the prediction loss probability of the passengers in each preset time interval.

7. The method of claim 6, wherein predicting the passenger's churn time based on the passenger's predicted churn probability in each of the predetermined time intervals comprises:

wherein T is the churn time of the passenger; N is the number of the preset time intervals; j is the number of a time interval; $y_j$ is the jth time interval, where $y_j = [t_j, t_{j+1})$; $t_j$ and $t_{j+1}$ are the two end points of the time interval $y_j$ and respectively represent the durations from the planned departure time of the delayed flight to those end points; x is the target feature information of the passenger; and $P_w(y_j \mid x)$ is the predicted churn probability of the passenger in the time interval $y_j$.

8. The method of claim 1, further comprising:

determining the churn number, that is, the number of passengers in the delayed flight whose churn time is earlier than a preset target time;

and subtracting the churn number from the total number of passengers in the delayed flight to determine the number of passengers required by the delayed flight at the preset target time.

9. A passenger churn time prediction apparatus, comprising: an information obtaining unit, an information extracting unit, a loss prediction result obtaining unit and a loss time obtaining unit,

10. The apparatus of claim 9, further comprising: a passenger churn number determination unit and a flight demand passenger number determination unit,

the passenger churn number determining unit is used for determining the churn number, that is, the number of passengers in the delayed flight whose churn time is earlier than the preset target time;

and the flight demand passenger number determining unit is used for subtracting the churn number from the total number of passengers in the delayed flight to determine the number of passengers required by the delayed flight at the preset target time.