CN112365051A - Agent retention prediction method and device, computer equipment and storage medium - Google Patents

Agent retention prediction method and device, computer equipment and storage medium

Info

Publication number
CN112365051A
Authority
CN
China
Prior art keywords
agent
prediction
retention
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011249497.2A
Other languages
Chinese (zh)
Inventor
杜宇衡
萧梓健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202011249497.2A priority Critical patent/CN112365051A/en
Publication of CN112365051A publication Critical patent/CN112365051A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q 30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides an agent retention prediction method and apparatus, a computer device and a storage medium, wherein the method comprises: determining a prediction target for agent retention, and acquiring a plurality of original features of a plurality of historical agents according to the prediction target; training a GBDT model based on the original features, and acquiring the feature of each leaf node in the GBDT model when training ends; constructing a rule set according to the features of the leaf nodes; matching the plurality of original features by using the rule set and generating an embedded matrix according to the matching results; training a neural network model by using the embedded matrix to obtain an agent retention prediction model; and predicting a retention score of a target agent through the agent retention prediction model. The invention can improve the accuracy and stability of agent retention prediction.

Description

Agent retention prediction method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for predicting retention of an agent, computer equipment and a storage medium.
Background
In common big data mining applications, data modeling is a standard way to identify business patterns, and user retention is one of the important data mining scenarios. A user who starts using an application in a given period and is still using it after a certain interval is counted as a retained user; the proportion of such users among the users newly added in that period is the retention rate, and the statistics are usually computed at intervals of one unit of time (for example a day, a week or a month). As the name implies, retention describes "how many users stay". Retained users and the retention rate reflect the quality of an application and its ability to keep its users.
In some user retention prediction scenarios, the business is particularly concerned with long-term retention, with one year as the boundary. This business requirement poses a major challenge for a model that must determine whether a user stays long term: only users who joined at least one year ago have a known long-term retention label and can serve as training targets, and their behavioral features differ considerably from those of current users, so the prediction performance of the trained model on current users drops sharply. The common industry workaround for this scenario is to use an intermediate variable, for example whether the user is retained for 3 months instead of 1 year, to shorten the time gap between the training samples and the prediction samples. This better reflects the prevailing behavior of current users, but the training target then differs greatly from the actual business target, so agent retention prediction remains unstable and its accuracy is low.
Disclosure of Invention
In view of the foregoing, there is a need for an agent retention prediction method, apparatus, computer device and storage medium that can improve the accuracy and stability of agent retention prediction.
A first aspect of the invention provides an agent retention prediction method, the method comprising:
determining a prediction target for agent retention, and acquiring a plurality of original characteristics of a plurality of historical agents according to the prediction target;
training a GBDT model based on the original features, and acquiring the feature of each leaf node in the GBDT model when the training is finished;
constructing a rule set according to the characteristics of each leaf node;
matching the plurality of original features by using the rule set and generating an embedded matrix according to a matching result;
training a neural network model by using the embedded matrix to obtain an agent retention prediction model;
and predicting the retention score of the target agent through the agent retention prediction model.
According to an alternative embodiment of the present invention, the determining a prediction target for agent retention and obtaining a plurality of original features of a plurality of historical agents according to the prediction target comprises:
determining a plurality of acquisition times according to the prediction target;
acquiring sample data of a plurality of historical agents within the plurality of acquisition times;
selecting a plurality of positive sample data covering the prediction target from the sample data of the plurality of historical agents;
determining the plurality of positive sample data as the plurality of raw features.
According to an alternative embodiment of the invention, said training the GBDT model based on said plurality of raw features comprises:
determining a number of acquisition cycles of the plurality of raw features;
initializing the number of decision trees according to the acquisition cycle number;
and training the number of decision trees based on the plurality of original features to obtain a GBDT model.
According to an alternative embodiment of the present invention, the determining the number of acquisition cycles of the plurality of raw features comprises:
sorting the acquisition times of the plurality of original features;
calculating the time difference between the first-ordered acquisition time and the last-ordered acquisition time;
and calculating the ratio of the time difference to the acquisition period to obtain the acquisition period number.
According to an alternative embodiment of the present invention, the matching the plurality of original features using the rule set and generating the embedded matrix according to the matching result includes:
for each original feature, matching the original feature against each rule in the rule set in turn;
when the original feature successfully matches a rule in the rule set, generating a first identifier;
when the original feature fails to match a rule in the rule set, generating a second identifier;
concatenating the corresponding first identifiers or second identifiers according to the indexes of the rules in the rule set to form an embedded vector;
and generating an embedded matrix according to the embedded vectors corresponding to the original features.
According to an alternative embodiment of the invention, said predicting a retention score of the target agent by said agent retention prediction model comprises:
acquiring the original characteristics of the target agent;
generating a target embedded vector according to the original features of the target agent and the rule set;
and inputting the target embedded vector into the agent retention prediction model for prediction to obtain the retention score of the target agent.
According to an alternative embodiment of the invention, after obtaining the retention score of the target agent, the method further comprises:
judging whether the retention score is larger than a preset score threshold value or not;
when the retention score is larger than or equal to the preset score threshold value, acquiring basic information of the target agent;
and triggering the real-time task to write the basic information into a preset database.
A second aspect of the invention provides an agent retention prediction apparatus, the apparatus comprising:
a first acquisition module, which is used for determining a prediction target for agent retention and acquiring a plurality of original characteristics of a plurality of historical agents according to the prediction target;
the second obtaining module is used for training the GBDT model based on the original features and obtaining the feature of each leaf node in the GBDT model when the training is finished;
the rule building module is used for building a rule set according to the characteristics of each leaf node;
the matrix generation module is used for matching the original characteristics by using the rule set and generating an embedded matrix according to a matching result;
the model training module is used for training a neural network model by using the embedded matrix to obtain an agent retention prediction model;
and the retention prediction module is used for predicting the retention score of the target agent through the agent retention prediction model.
A third aspect of the present invention provides a computer apparatus comprising:
a memory for storing a computer program;
a processor for implementing the agent retention prediction method when executing the computer program.
A fourth aspect of the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the agent retention prediction method.
In summary, according to the method, the apparatus, the computer device and the storage medium for predicting retention of an agent, a GBDT model is trained based on the plurality of original features, and a feature of each leaf node in the GBDT model is obtained when training is finished, since the leaf node represents a predicted value of a sample in the GBDT model, a rule set is constructed according to the feature of each leaf node and an embedded matrix generated by the rule set is used to train the agent retention prediction model, so that a difference between training data and prediction data is reduced, and the trained agent retention prediction model has strong stability in a scene of cross-time prediction, thereby improving the stability of agent retention prediction; and the rule set is used for matching the plurality of original features, namely, the feature with stronger prediction capability is selected from the plurality of original features according to the rule set, so that the prediction effect of training the agent retention prediction model by using the embedded matrix generated by the feature with stronger prediction capability is better, and the accuracy of agent retention prediction can be improved.
Drawings
Fig. 1 is a flowchart of an agent retention prediction method according to an embodiment of the present invention.
Fig. 2 is a block diagram of an agent retention prediction apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The agent retention prediction method provided by the embodiment of the invention is executed by computer equipment, and correspondingly, the agent retention prediction device provided by the embodiment of the invention runs in the computer equipment.
Fig. 1 is a flowchart of an agent retention prediction method according to an embodiment of the present invention. The agent retention prediction method specifically comprises the following steps; according to different requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11, determining a prediction target for agent retention, and acquiring a plurality of original features of a plurality of historical agents according to the prediction target.
The prediction target is 13-month retention, that is, the agent retention prediction model is used to predict whether the target agent will still be retained after 1 year.
The plurality of original features may be data generated by the plurality of historical agents while using the application during pre-entry training, interviews and similar activities, as well as when applying to join the company.
Acquiring the plurality of original features of the plurality of historical agents according to the prediction target makes it possible to collect the data set for training the agent retention prediction model in a targeted manner, so that the trained agent retention prediction model achieves higher prediction accuracy.
In an optional embodiment, the determining a prediction target for agent retention and obtaining a plurality of original features of a plurality of historical agents according to the prediction target comprises:
determining a plurality of acquisition times according to the prediction target;
acquiring sample data of a plurality of historical agents within the plurality of acquisition times;
selecting a plurality of positive sample data covering the prediction target from the sample data of the plurality of historical agents;
determining the plurality of positive sample data as the plurality of raw features.
Illustratively, assume the prediction month is September 2019 and the prediction target is 1-year retention; an agent retained for 1 year is necessarily retained for m months, where m < 12. For an agent who completed certification in February 2019, it is known by September 2019 whether the agent was retained for 7 months; for an agent who completed certification in March 2019, it is known whether the agent was retained for 6 months; and the same holds for agents who completed certification from April to July 2019. Therefore the positive samples of the February 2019 training sample (i.e., the agents still retained after 7 months, as of September 2019) are necessarily a superset of the positives of target A, and the samples from March to July 2019 can be treated in the same way.
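The positive-sample selection described above can be illustrated with a short sketch. This is only a minimal illustration under assumed data layout: the DataFrame columns (onboard_month, last_active_month) and the concrete window dates are hypothetical and not taken from the patent.

```python
# Minimal sketch of the positive-sample selection step (hypothetical column
# names and window dates; not the patent's actual implementation).
import pandas as pd

def select_positive_samples(history: pd.DataFrame,
                            observe_month: str = "2019-09") -> pd.DataFrame:
    """Keep agents who joined between Feb and Jul 2019 and are still active at
    the observation month; every such agent is retained for m < 12 months, so
    this set is a superset of the true 1-year-retention positives (target A)."""
    onboard = pd.to_datetime(history["onboard_month"])
    last_active = pd.to_datetime(history["last_active_month"])
    in_window = onboard.between(pd.Timestamp("2019-02-01"), pd.Timestamp("2019-07-31"))
    still_retained = last_active >= pd.Timestamp(observe_month)
    return history.loc[in_window & still_retained]
```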
S12, training the GBDT model based on the plurality of original features, and acquiring the feature of each leaf node in the GBDT model at the end of training.
A Gradient Boosting Decision Tree (GBDT) is an algorithm that uses an additive model (i.e., a linear combination of basis functions) and continuously reduces the residual produced during training to classify or regress data. At the end of GBDT model training, each leaf node represents the predicted value of the samples that fall on that leaf of the tree.
Illustratively, assume the modeling scenario is as follows: the prediction objects are new agents who have submitted entry applications, and the prediction target is 13-month retention (retention for 1 year). In this scenario the training data must be at least 12 months old, so the training data and the test data are separated by a long time span and the prediction accuracy is low.
In an alternative embodiment, said training the GBDT model based on said plurality of raw features comprises:
determining a number of acquisition cycles of the plurality of raw features;
initializing the number of decision trees according to the acquisition cycle number;
and training the number of decision trees based on the plurality of original features to obtain a GBDT model.
The collection period refers to the collection frequency of the plurality of original features. In line with practice, the collection period may be set to one month, that is, the original features of the agents within a given month are collected on a monthly cycle.
The acquisition times of the plurality of original features are sorted, the time difference between the earliest acquisition time and the latest acquisition time is calculated, and the ratio of this time difference to the collection period is calculated to obtain the number of collection cycles.
For example, if the plurality of original features were collected from February 1, 2019 to July 30, 2019, the number of monthly collection cycles is 6, the GBDT model is trained based on the plurality of original features, and the trained GBDT model contains 6 decision trees.
In this optional embodiment, the number of decision trees in the GBDT model is controlled by the number of collection cycles of the plurality of original features, so that the training of the GBDT model is refined according to actual requirements: when the decision trees cover all the collection months, the features obtained from the leaf nodes more comprehensively cover the possibilities of whether the agents of each collection month achieve 13-month retention, and the model can be more accurate.
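As a concrete illustration of tying the number of trees to the number of monthly collection cycles, the sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in for whatever GBDT implementation is actually used; the 30-day period length and tree depth are assumptions.

```python
# Sketch: number of GBDT trees = number of monthly acquisition cycles
# (scikit-learn stands in for the unspecified GBDT library).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def count_cycles(acquisition_times, period_days: int = 30) -> int:
    """Sort the acquisition times and divide the first-to-last span by the period."""
    ts = np.sort(np.asarray(acquisition_times, dtype="datetime64[D]"))
    span_days = int((ts[-1] - ts[0]).astype(int))
    return max(1, round(span_days / period_days))

def train_gbdt(X, y, acquisition_times) -> GradientBoostingClassifier:
    n_trees = count_cycles(acquisition_times)   # e.g. 2019-02-01 .. 2019-07-30 -> 6
    gbdt = GradientBoostingClassifier(n_estimators=n_trees, max_depth=3)
    return gbdt.fit(X, y)
```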
S13, constructing a rule set according to the feature of each leaf node.
In the GBDT model, each leaf node corresponds to a rule, i.e., a combined feature. The rules corresponding to all the leaf nodes are combined to construct the rule set.
For example, suppose there are 3 original features, namely gender, age and income, and there are 16 leaf nodes after GBDT model training, i.e., 16 combined features such as:
Combined feature 1: gender: male; age: >25 years old; income: 10000 yuan;
Combined feature 2: gender: male; age: >45 years old; income: 8000 yuan;
……;
Combined feature 16: gender: female; age: <35 years old; income: <20000 yuan.
These 16 combined features constitute a Knowledge Base (KB), and each combined feature is a rule.
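A sketch of how such a rule set could be read off a trained scikit-learn GBDT follows: each leaf becomes one rule, i.e. the conjunction of threshold conditions on the path from the root to that leaf. The rule representation used here, a list of (feature, operator, threshold) triples, is an assumption for illustration rather than the patent's stated format.

```python
# Sketch: build the knowledge base (KB) by turning every leaf of every tree
# into a rule, i.e. the list of (feature, op, threshold) conditions on its path.
def extract_rules(gbdt, feature_names):
    rules = []
    for stage in gbdt.estimators_.ravel():       # one regression tree per boosting stage
        t = stage.tree_
        def walk(node, conditions):
            if t.children_left[node] == -1:      # leaf node: record the accumulated rule
                rules.append(list(conditions))
                return
            name, thr = feature_names[t.feature[node]], t.threshold[node]
            walk(t.children_left[node], conditions + [(name, "<=", thr)])
            walk(t.children_right[node], conditions + [(name, ">", thr)])
        walk(0, [])
    return rules                                 # each element is one rule of the KB
```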
S14, matching the plurality of original features by using the rule set and generating an embedded matrix according to the matching results.
In an optional embodiment, the matching the plurality of original features by using the rule set and generating an embedded matrix according to the matching result includes:
for each original feature, matching the original feature against each rule in the rule set in turn;
when the original feature successfully matches a rule in the rule set, generating a first identifier;
when the original feature fails to match a rule in the rule set, generating a second identifier;
concatenating the corresponding first identifiers or second identifiers according to the indexes of the rules in the rule set to form an embedded vector;
and generating an embedded matrix according to the embedded vectors corresponding to the original features.
For example, take an existing sample (gender: male; age: 28; income: 18000 yuan) and evaluate it against each rule in the KB in turn: if the rule is satisfied, its result is marked as 1; if not, its result is marked as 0. The results of the 16 rules are combined into a 16-dimensional vector such as 1000000000000000. The vectors of n samples form an n x 16 matrix, referred to as the embedded matrix.
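The 0/1 matching step can be sketched as below, reusing the rule format assumed above; feature_index is a hypothetical mapping from a feature name to its column in the raw feature array. In scikit-learn, an equivalent shortcut is gbdt.apply(X) followed by one-hot encoding of the returned leaf indices.

```python
# Sketch: match every sample against every rule in the KB and stack the
# 0/1 results into the n x |KB| embedded matrix.
import numpy as np

def build_embedding(X, rules, feature_index):
    def matches(row, rule):
        return all(
            row[feature_index[name]] <= thr if op == "<=" else row[feature_index[name]] > thr
            for name, op, thr in rule
        )
    return np.array([[1 if matches(row, rule) else 0 for rule in rules] for row in X],
                    dtype=np.float32)
```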
S15, training the neural network model by using the embedded matrix to obtain the agent retention prediction model.
A Deep Neural Network (DNN) framework is initialized in advance, the embedded matrix is fed into the initialized DNN framework as input for multiple rounds of iterative training, and training ends when a training end condition is met, yielding the agent retention prediction model.
The training end condition may be that the difference between the test pass rates of two successive rounds is smaller than a preset difference threshold, or that the test pass rate of a given round reaches a preset pass rate threshold.
The test pass rate is calculated as follows: the predicted values of the agent retention prediction model are obtained, and a first number, the count of predicted values equal to their corresponding true values, is calculated; the ratio of the first number to a second number, the total count of predicted values, is then calculated and determined as the test pass rate.
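A minimal sketch of the DNN stage is given below using Keras; the layer sizes, optimizer and epoch count are illustrative assumptions, and the test pass rate is computed exactly as described above (matching predictions divided by total predictions).

```python
# Sketch: train a small fully connected network on the embedded matrix and
# compute the test pass rate (correct predictions / total predictions).
import numpy as np
import tensorflow as tf

def train_retention_dnn(embedding: np.ndarray, labels: np.ndarray) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(embedding.shape[1],)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # retention score in [0, 1]
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(embedding, labels, epochs=20, batch_size=256,
              validation_split=0.2, verbose=0)
    return model

def test_pass_rate(model: tf.keras.Model, X_test, y_test) -> float:
    preds = (model.predict(X_test, verbose=0).ravel() >= 0.5).astype(int)
    return float((preds == np.asarray(y_test)).mean())
```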
S16, predicting the retention score of the target agent through the agent retention prediction model.
In an optional embodiment, said predicting a retention score for a target agent by said agent retention prediction model comprises:
acquiring the original characteristics of the target agent;
generating a target embedded vector according to the original features of the target agent and the rule set;
and inputting the target embedded vector into the agent retention prediction model for prediction to obtain the retention score of the target agent.
If retention prediction is required for a target agent who has submitted an entry application, the original features of the target agent are matched against the rule set to generate a target embedded vector, which is used as the input of the agent retention prediction model, and the agent retention prediction model outputs a retention score.
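Putting the pieces together, scoring a single target agent could look like the following sketch; it reuses the helpers assumed above (build_embedding, the extracted rules and the trained model) and is an illustration rather than the patent's implementation.

```python
# Sketch: build the target agent's embedded vector from the KB and feed it
# to the trained DNN to obtain a retention score.
import numpy as np

def predict_retention_score(model, rules, feature_index, agent_raw_features) -> float:
    vec = build_embedding(np.asarray([agent_raw_features]), rules, feature_index)
    return float(model.predict(vec, verbose=0).ravel()[0])
```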
In an optional embodiment, after obtaining the retention score of the target agent, the method further comprises:
judging whether the retention score is larger than a preset score threshold value or not;
when the retention score is larger than or equal to the preset score threshold value, acquiring basic information of the target agent;
and triggering the real-time task to write the basic information into a preset database.
The retention score is used to determine whether the target agent can achieve 13-month retention. A higher retention score (greater than or equal to the preset score threshold) indicates that the target agent is more likely to achieve 13-month retention; a lower retention score (less than the preset score threshold) indicates that the target agent is less likely to achieve 13-month retention. The basic information of the agents predicted to achieve 13-month retention can be written into a preset database in real time, so that managers can quickly determine the list of agents to cultivate as a priority, which improves the efficiency of agent assessment.
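The threshold comparison and the real-time write of the agent's basic information could be sketched as follows; the score threshold, table name and SQLite backend are placeholders for whatever preset threshold and database the deployment actually uses.

```python
# Sketch: if the retention score reaches the preset threshold, write the target
# agent's basic information into a database (SQLite used here as a placeholder).
import json
import sqlite3

def record_promising_agent(agent_id: str, basic_info: dict, score: float,
                           threshold: float = 0.5,
                           db_path: str = "retention.db") -> bool:
    if score < threshold:
        return False
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS promising_agents "
                     "(agent_id TEXT, info TEXT, score REAL)")
        conn.execute("INSERT INTO promising_agents VALUES (?, ?, ?)",
                     (agent_id, json.dumps(basic_info), score))
    return True
```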
In the prior art, using 13-month retention (target A) requires a model training time span of up to 1 year, which is unfavorable for model training and prediction; using 3-month retention (target B, whether a new agent is retained for at least 3 months) alleviates the difference between the training samples and the prediction samples, but 3-month retention differs greatly from 13-month retention (the target the business cares about), so the improvement of the model prediction performance is limited. In the present invention, the GBDT model is trained based on the plurality of original features and the feature of each leaf node in the GBDT model is obtained when training ends; since the leaf nodes represent the predicted values of the samples in the GBDT model, a rule set is constructed according to the features of the leaf nodes and the agent retention prediction model is trained with the embedded matrix generated from the rule set, which reduces the difference between the training data and the prediction data, gives the trained agent retention prediction model strong stability in cross-time prediction scenarios, and thereby improves the stability of agent retention prediction. In addition, the rule set is used to match the plurality of original features, that is, the features with stronger prediction capability are selected from the plurality of original features according to the rule set, so the embedded matrix generated from these more predictive features trains a better-performing agent retention prediction model, and the accuracy of agent retention prediction can be improved.
The data in the agent retention prediction method come from the financial field, and the method can also be applied to intelligent government affairs to promote the development of smart cities.
It is emphasized that, to further ensure the privacy and security of the agent retention prediction model, the agent retention prediction model may be stored in a node of a blockchain.
Fig. 2 is a block diagram of an agent retention prediction apparatus according to a second embodiment of the present invention.
In some embodiments, the agent retention prediction apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer programs of the program segments in the agent retention prediction apparatus 20 may be stored in the memory of a computer device and executed by at least one processor to perform the agent retention prediction functions (see the detailed description of Fig. 1).
In this embodiment, the agent retention prediction apparatus 20 may be divided into a plurality of functional modules according to the functions performed by the agent retention prediction apparatus. The functional module may include: the system comprises a first acquisition module 201, a second acquisition module 202, a rule construction module 203, a matrix generation module 204, a model training module 205, a retention prediction module 206, a score comparison module 207 and an information writing module 208. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The first obtaining module 201 is configured to determine a prediction target for agent retention, and obtain a plurality of original features of a plurality of historical agents according to the prediction target.
The prediction target is 13-month retention, that is, the agent retention prediction model is used to predict whether the target agent will still be retained after 1 year.
The plurality of original features may be data generated by the plurality of historical agents while using the application during pre-entry training, interviews and similar activities, as well as when applying to join the company.
Acquiring the plurality of original features of the plurality of historical agents according to the prediction target makes it possible to collect the data set for training the agent retention prediction model in a targeted manner, so that the trained agent retention prediction model achieves higher prediction accuracy.
In an optional embodiment, the determining, by the first obtaining module 201, a prediction target for agent retention, and obtaining a plurality of original features of a plurality of historical agents according to the prediction target includes:
determining a plurality of acquisition times according to the prediction target;
acquiring sample data of a plurality of historical agents within the plurality of acquisition times;
selecting a plurality of positive sample data covering the prediction target from the sample data of the plurality of historical agents;
determining the plurality of positive sample data as the plurality of raw features.
Illustratively, assume the prediction month is September 2019 and the prediction target is 1-year retention; an agent retained for 1 year is necessarily retained for m months, where m < 12. For an agent who completed certification in February 2019, it is known by September 2019 whether the agent was retained for 7 months; for an agent who completed certification in March 2019, it is known whether the agent was retained for 6 months; and the same holds for agents who completed certification from April to July 2019. Therefore the positive samples of the February 2019 training sample (i.e., the agents still retained after 7 months, as of September 2019) are necessarily a superset of the positives of target A, and the samples from March to July 2019 can be treated in the same way.
The second obtaining module 202 is configured to train the GBDT model based on the plurality of original features, and obtain a feature of each leaf node in the GBDT model when the training is finished.
A Gradient Boosting Decision Tree (GBDT) is an algorithm that uses an additive model (i.e., a linear combination of basis functions) and continuously reduces the residual produced during training to classify or regress data. At the end of GBDT model training, each leaf node represents the predicted value of the samples that fall on that leaf of the tree.
Illustratively, assume the modeling scenario is as follows: the prediction objects are new agents who have submitted entry applications, and the prediction target is 13-month retention (retention for 1 year). In this scenario the training data must be at least 12 months old, so the training data and the test data are separated by a long time span and the prediction accuracy is low.
In an alternative embodiment, the training of the GBDT model based on the plurality of raw features by the second obtaining module 202 includes:
determining a number of acquisition cycles of the plurality of raw features;
initializing the number of decision trees according to the acquisition cycle number;
and training the number of decision trees based on the plurality of original features to obtain a GBDT model.
The collection period refers to the collection frequency of the plurality of original features. In line with practice, the collection period may be set to one month, that is, the original features of the agents within a given month are collected on a monthly cycle.
The acquisition times of the plurality of original features are sorted, the time difference between the earliest acquisition time and the latest acquisition time is calculated, and the ratio of this time difference to the collection period is calculated to obtain the number of collection cycles.
For example, if the plurality of original features were collected from February 1, 2019 to July 30, 2019, the number of monthly collection cycles is 6, the GBDT model is trained based on the plurality of original features, and the trained GBDT model contains 6 decision trees.
In this optional embodiment, the number of decision trees in the GBDT model is controlled by the number of collection cycles of the plurality of original features, so that the training of the GBDT model is refined according to actual requirements: when the decision trees cover all the collection months, the features obtained from the leaf nodes more comprehensively cover the possibilities of whether the agents of each collection month achieve 13-month retention, and the model can be more accurate.
The rule building module 203 is configured to build a rule set according to the feature of each leaf node.
In the GBDT model, each leaf node corresponds to a rule, i.e., a combined feature. The rules corresponding to all the leaf nodes are combined to construct the rule set.
For example, suppose there are 3 original features, namely gender, age and income, and there are 16 leaf nodes after GBDT model training, i.e., 16 combined features such as:
Combined feature 1: gender: male; age: >25 years old; income: 10000 yuan;
Combined feature 2: gender: male; age: >45 years old; income: 8000 yuan;
……;
Combined feature 16: gender: female; age: <35 years old; income: <20000 yuan.
These 16 combined features constitute a Knowledge Base (KB), and each combined feature is a rule.
The matrix generating module 204 is configured to match the plurality of original features by using the rule set and generate an embedded matrix according to a matching result.
In an optional embodiment, the matching, by the matrix generating module 204, the plurality of original features with the rule set and generating the embedded matrix according to the matching result includes:
for each original feature, matching the original feature against each rule in the rule set in turn;
when the original feature successfully matches a rule in the rule set, generating a first identifier;
when the original feature fails to match a rule in the rule set, generating a second identifier;
concatenating the corresponding first identifiers or second identifiers according to the indexes of the rules in the rule set to form an embedded vector;
and generating an embedded matrix according to the embedded vectors corresponding to the original features.
For example, take an existing sample (gender: male; age: 28; income: 18000 yuan) and evaluate it against each rule in the KB in turn: if the rule is satisfied, its result is marked as 1; if not, its result is marked as 0. The results of the 16 rules are combined into a 16-dimensional vector such as 1000000000000000. The vectors of n samples form an n x 16 matrix, referred to as the embedded matrix.
The model training module 205 is configured to train a neural network model using the embedded matrix to obtain an agent retention prediction model.
A Deep Neural Network (DNN) framework is initialized in advance, the embedded matrix is fed into the initialized DNN framework as input for multiple rounds of iterative training, and training ends when a training end condition is met, yielding the agent retention prediction model.
The training end condition may be that the difference between the test pass rates of two successive rounds is smaller than a preset difference threshold, or that the test pass rate of a given round reaches a preset pass rate threshold.
The test pass rate is calculated as follows: the predicted values of the agent retention prediction model are obtained, and a first number, the count of predicted values equal to their corresponding true values, is calculated; the ratio of the first number to a second number, the total count of predicted values, is then calculated and determined as the test pass rate.
The retention prediction module 206 is configured to predict a retention score of the target agent through the agent retention prediction model.
In an optional embodiment, said predicting a retention score for a target agent by said agent retention prediction model comprises:
acquiring the original characteristics of the target agent;
generating a target embedded vector according to the original features of the target agent and the rule set;
and inputting the target embedded vector into the agent retention prediction model for prediction to obtain the retention score of the target agent.
If retention prediction is required for a target agent who has submitted an entry application, the original features of the target agent are matched against the rule set to generate a target embedded vector, which is used as the input of the agent retention prediction model, and the agent retention prediction model outputs a retention score.
The score comparison module 207 is configured to determine whether the retention score is greater than a preset score threshold;
the information writing module 208 is configured to, when the retention score is greater than or equal to the preset score threshold, obtain basic information of the target agent, and trigger a real-time task to write the basic information into a preset database.
The retention score is used to determine whether the target agent can achieve 13-month retention. A higher retention score (greater than or equal to the preset score threshold) indicates that the target agent is more likely to achieve 13-month retention; a lower retention score (less than the preset score threshold) indicates that the target agent is less likely to achieve 13-month retention. The basic information of the agents predicted to achieve 13-month retention can be written into a preset database in real time, so that managers can quickly determine the list of agents to cultivate as a priority, which improves the efficiency of agent assessment.
In the prior art, using 13-month retention (target A) requires a model training time span of up to 1 year, which is unfavorable for model training and prediction; using 3-month retention (target B, whether a new agent is retained for at least 3 months) alleviates the difference between the training samples and the prediction samples, but 3-month retention differs greatly from 13-month retention (the target the business cares about), so the improvement of the model prediction performance is limited. In the present invention, the GBDT model is trained based on the plurality of original features and the feature of each leaf node in the GBDT model is obtained when training ends; since the leaf nodes represent the predicted values of the samples in the GBDT model, a rule set is constructed according to the features of the leaf nodes and the agent retention prediction model is trained with the embedded matrix generated from the rule set, which reduces the difference between the training data and the prediction data, gives the trained agent retention prediction model strong stability in cross-time prediction scenarios, and thereby improves the stability of agent retention prediction. In addition, the rule set is used to match the plurality of original features, that is, the features with stronger prediction capability are selected from the plurality of original features according to the rule set, so the embedded matrix generated from these more predictive features trains a better-performing agent retention prediction model, and the accuracy of agent retention prediction can be improved.
The data in the agent retention prediction apparatus come from the financial field, and the apparatus can also be applied to intelligent government affairs to promote the development of smart cities.
It is emphasized that, to further ensure the privacy and security of the agent retention prediction model, the agent retention prediction model may be stored in a node of a blockchain.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not constitute a limitation of the embodiments of the present invention, and may be a bus-type configuration or a star-type configuration, and that the computer device 3 may include more or less hardware or software than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a computer device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example, and other electronic products that are currently available or may come into existence in the future, such as electronic products that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
In some embodiments, the memory 31 has stored therein a computer program that, when executed by the at least one processor 32, implements all or part of the steps of the agent retention prediction method as described. The Memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an electronically Erasable rewritable Read-Only Memory (Electrically-Erasable Programmable Read-Only Memory (EEPROM)), an optical Read-Only disk (CD-ROM) or other optical disk Memory, a magnetic disk Memory, a tape Memory, or any other medium readable by a computer capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions and processes data of the computer device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or a portion of the steps of the agent retention prediction method described in embodiments of the present invention; or implement all or part of the functionality of the agent persistence prediction device. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of agent retention prediction, the method comprising:
determining a prediction target for agent retention, and acquiring a plurality of original characteristics of a plurality of historical agents according to the prediction target;
training a GBDT model based on the original features, and acquiring the feature of each leaf node in the GBDT model when the training is finished;
constructing a rule set according to the characteristics of each leaf node;
matching the plurality of original features by using the rule set and generating an embedded matrix according to a matching result;
training a neural network model by using the embedded matrix to obtain an agent retention prediction model;
and predicting the retention score of the target agent through the agent retention prediction model.
2. The agent retention prediction method of claim 1, wherein determining a prediction objective of agent retention and obtaining a plurality of raw features of a plurality of historical agents according to the prediction objective comprises:
determining a plurality of acquisition times according to the predicted target;
acquiring sample data of a plurality of historical agents within the plurality of acquisition times;
selecting a plurality of positive sample data covering the prediction target from the sample data of the plurality of historical agents;
determining the plurality of positive sample data as the plurality of raw features.
3. The agent retention prediction method of claim 2, wherein the training of the GBDT model based on the plurality of raw features comprises:
determining a number of acquisition cycles of the plurality of raw features;
initializing the number of decision trees according to the acquisition cycle number;
and training the number of decision trees based on the plurality of original features to obtain a GBDT model.
4. The agent retention prediction method of claim 3, wherein the determining the number of acquisition cycles of the plurality of raw features comprises:
sorting the acquisition times of the plurality of original features;
calculating the time difference between the first-ordered acquisition time and the last-ordered acquisition time;
and calculating the ratio of the time difference to the acquisition period to obtain the acquisition period number.
5. The agent retention prediction method of claim 1, wherein the matching the plurality of raw features using the rule set and generating an embedding matrix from matching results comprises:
for each original feature, matching the original feature against each rule in the rule set in turn;
when the original feature successfully matches a rule in the rule set, generating a first identifier;
when the original feature fails to match a rule in the rule set, generating a second identifier;
concatenating the corresponding first identifiers or second identifiers according to the indexes of the rules in the rule set to form an embedding vector;
and generating an embedding matrix according to the embedding vectors corresponding to the original features.
6. The agent retention prediction method according to any one of claims 1 to 5, wherein the predicting a retention score for a target agent by the agent retention prediction model comprises:
acquiring the original characteristics of the target agent;
generating a target embedded vector according to the original features of the target agent and the rule set;
and inputting the target embedded vector into the agent retention prediction model for prediction to obtain the retention score of the target agent.
7. The agent retention prediction method of claim 6, wherein after obtaining the retention score for the target agent, the method further comprises:
judging whether the retention score is larger than a preset score threshold value or not;
when the retention score is larger than or equal to the preset score threshold value, acquiring basic information of the target agent;
and triggering the real-time task to write the basic information into a preset database.
8. An agent retention prediction apparatus, the apparatus comprising:
a first acquisition module, which is used for determining a prediction target for agent retention and acquiring a plurality of original characteristics of a plurality of historical agents according to the prediction target;
the second obtaining module is used for training the GBDT model based on the original features and obtaining the feature of each leaf node in the GBDT model when the training is finished;
the rule building module is used for building a rule set according to the characteristics of each leaf node;
the matrix generation module is used for matching the original characteristics by using the rule set and generating an embedded matrix according to a matching result;
the model training module is used for training a neural network model by using the embedded matrix to obtain an agent retention prediction model;
and the retention prediction module is used for predicting the retention score of the target agent through the agent retention prediction model.
9. A computer device, characterized in that the computer device comprises:
a memory for storing a computer program;
a processor for implementing the agent retention prediction method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the agent retention prediction method of any of claims 1 to 7.
CN202011249497.2A 2020-11-10 2020-11-10 Agent retention prediction method and device, computer equipment and storage medium Withdrawn CN112365051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011249497.2A CN112365051A (en) 2020-11-10 2020-11-10 Agent retention prediction method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011249497.2A CN112365051A (en) 2020-11-10 2020-11-10 Agent retention prediction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112365051A true CN112365051A (en) 2021-02-12

Family

ID=74509512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011249497.2A Withdrawn CN112365051A (en) 2020-11-10 2020-11-10 Agent retention prediction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112365051A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990480A (en) * 2021-03-10 2021-06-18 北京嘀嘀无限科技发展有限公司 Method and device for building model, electronic equipment and storage medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210212