WO2021098652A1 - Data processing method and device - Google Patents
- Publication number
- WO2021098652A1 (PCT/CN2020/129121)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- user
- users
- call
- duration
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Definitions
- the present invention relates to the technical field of financial technology (Fintech), in particular to a data processing method and device.
- after the collection robot receives the collection lists sent by various online lending companies, it generally calls the users in those lists directly, in the chronological order in which each list was received.
- the number of online lending companies from which the collection robot receives lists each day is not fixed, and the number of users to be collected given by each online lending company is also not fixed. In this case, the total number of users to be collected by the collection robot each day cannot be determined in advance.
- if collection calls are placed to each user in turn on a first-come, first-served basis, the day's collection task may not be completed, which reduces the collection effect.
- the present invention provides a data processing method and device, which are used to solve the technical problem of poor collection effect caused by sequentially dialing collection calls in a first-come, first-served manner in the prior art.
- the present invention provides a data processing method applied to a collection system.
- the method includes: obtaining a first list; using a first model to determine the user category of each user in the first list; counting the number of users in the first list belonging to each user category; and determining a first duration based on those numbers. If the first duration exceeds the set duration, a second model is used to determine the probability that each user belonging to the first user category will perform the preset behavior, and a second list is determined based on those probabilities.
- the first list includes multiple users who have not performed the preset behavior
- the user category of each user includes the first user category
- the first user category indicates that the user will answer the call made by the collection system.
- the first duration represents the duration required to make calls to all users in the first list
- the second list is used to indicate users who need to make calls after the current time.
- the first model is used to predict whether each user in the first list will answer the call (that is, the user category), and the time needed to complete the collection task is determined accordingly.
- when that time exceeds the set duration, the second model is used to determine the users with a higher probability of successful collection.
- users with a higher probability of success can then be called first, which helps to improve the collection effect.
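The overall flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two models are passed in as plain callables (`will_answer` and `repay_probability` are hypothetical stand-ins), `estimate_duration` is a hypothetical helper, and the second list is assumed to be the predicted answerers ordered by descending probability. The text does not say what happens when the first duration fits within the set duration, so the sketch simply keeps the first list unchanged in that case.

```python
from typing import Callable, Dict, List

User = Dict[str, object]  # hypothetical user record


def build_second_list(
    first_list: List[User],
    will_answer: Callable[[User], bool],          # stand-in for the first model
    repay_probability: Callable[[User], float],   # stand-in for the second model
    estimate_duration: Callable[[int, int], float],
    set_duration: float,
) -> List[User]:
    """Return the users to call after the current time."""
    # Step 1: classify every user (first user category = will answer).
    answering = [u for u in first_list if will_answer(u)]
    not_answering = [u for u in first_list if not will_answer(u)]

    # Step 2: estimate the first duration from the per-category counts.
    first_duration = estimate_duration(len(answering), len(not_answering))

    # Step 3: if the task fits in the set duration, keep the list as is
    # (assumption: the source does not describe this branch in detail).
    if first_duration <= set_duration:
        return list(first_list)

    # Step 4: otherwise rank the predicted answerers by success probability.
    return sorted(answering, key=repay_probability, reverse=True)
```

Passing the models as callables keeps the sketch independent of any particular machine-learning library.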
- the user category of each user further includes a second user category
- the second user category indicates that the user will not answer the call made by the collection system.
- determining the first duration based on the number includes: first obtaining the first call duration corresponding to the first user category and the second call duration corresponding to the second user category; then determining the total call duration for calling all users in the first list from the number of users in the first list belonging to the first user category and the first call duration, together with the number of users in the first list belonging to the second user category and the second call duration; and finally determining the first duration based on the total call duration and the number of available phone numbers.
- the first call duration is determined according to the call duration of each user who answered the call in the historical time period; the second call duration is determined according to the call duration waiting to be answered after the call is made to the user.
- the first call duration of users who answer is determined from the durations of the collection calls that were answered within the historical time period, so it reflects historical dialing information and the call duration of each answering user can be estimated accurately.
- the second call duration is the time spent waiting for the call to be answered, so the call duration of each user who does not answer can be estimated accurately.
- from the first call duration and the number of answering users predicted by the first model, the total call duration needed for the users in the first list who answer can be determined; from the second call duration and the number of non-answering users predicted by the first model, the total call duration needed for the users in the first list who do not answer can be determined. Together these predict the total call duration for calling all users in the first list.
- because this method is based on analysis of historical data, it better matches the actual business situation and makes the predicted first duration more accurate.
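The duration estimate can be illustrated with a short Python sketch. Two points are assumptions rather than statements from the text: the first call duration is taken as the mean of historical answered-call durations, and the first duration is obtained by dividing the total call duration evenly across the available phone numbers (as if they dial in parallel). All function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def first_call_duration(answered_call_seconds: Sequence[float]) -> float:
    # Assumption: summarize historical answered calls by their mean.
    return mean(answered_call_seconds)


def total_call_duration(
    n_answer: int,
    n_no_answer: int,
    first_call_seconds: float,   # typical length of an answered call
    second_call_seconds: float,  # ringing time before giving up
) -> float:
    # Answered calls cost the first call duration each,
    # unanswered calls cost the waiting time each.
    return n_answer * first_call_seconds + n_no_answer * second_call_seconds


def first_duration(total_seconds: float, available_numbers: int) -> float:
    # Assumption: the workload is spread evenly over the available numbers.
    if available_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    return total_seconds / available_numbers


total = total_call_duration(100, 50, 90.0, 30.0)  # 100*90 + 50*30 = 10500.0
print(first_duration(total, 5))                   # 2100.0
```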
- the available phone numbers can be determined in the following way: for the multiple phone numbers previously applied for from the operator, first obtain the predicted duration based on the total call duration and the number of those phone numbers, then determine the probability that each phone number goes offline within the predicted duration, and finally take each phone number whose probability is not greater than the first preset threshold as an available phone number.
- the number of phone numbers that may go offline while the collection task is being executed can thus be estimated in advance. By using only the phone numbers that are not expected to go offline to determine the first duration, the risk of phone numbers going offline is accounted for ahead of time, which helps keep the estimate of whether the collection task can be completed accurate.
- the risk judgment can serve as an auxiliary means that supports normal business execution without occupying the time the collection robot needs for normal collection calls, thereby helping to reduce the impact of the risk judgment process on normal business.
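A minimal Python sketch of this selection step follows. The text does not say how the offline probability is computed, so it is passed in as a hypothetical callable `offline_probability(number, predicted_seconds)`; dividing the total call duration by the number of phone numbers to get the predicted duration is likewise an assumption.

```python
from typing import Callable, List, Sequence


def select_available_numbers(
    numbers: Sequence[str],
    total_call_seconds: float,
    offline_probability: Callable[[str, float], float],  # hypothetical risk model
    threshold: float,                                     # first preset threshold
) -> List[str]:
    if not numbers:
        return []
    # Assumption: predicted duration = total call duration shared by all numbers.
    predicted_seconds = total_call_seconds / len(numbers)
    # Keep the numbers whose offline probability does not exceed the threshold.
    return [
        n for n in numbers
        if offline_probability(n, predicted_seconds) <= threshold
    ]
```

The count of the returned list is what would then feed into the first-duration estimate.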
- if the first duration does not exceed the set duration and a request message for processing a third list is received within the first duration, the first model can also be used to determine the second duration required to make calls to all users in the third list; if the sum of the first duration and the second duration exceeds the set duration, the third list is refused.
- in this way, the total call duration for calling all users in the first list and the third list is judged in advance, and the third list is refused when that total exceeds the set duration.
- refusing the third list in this case avoids accepting collection tasks that cannot be completed, thereby helping to reduce customer losses.
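The admission check for a third list reduces to one comparison, sketched here in Python. The function name is hypothetical, and both durations are assumed to be expressed in the same unit as the set duration.

```python
def accept_third_list(
    first_duration: float,   # time needed for the first list
    second_duration: float,  # time needed for the third list
    set_duration: float,     # time budget for the collection task
) -> bool:
    # Accept only if both lists together still fit in the set duration.
    return first_duration + second_duration <= set_duration


print(accept_third_list(3600.0, 1800.0, 7200.0))  # True
print(accept_third_list(3600.0, 4000.0, 7200.0))  # False
```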
- the first model can be a classification model, and can be obtained in the following way: first obtain the feature values of multiple users under each feature; then, for any feature, determine the degree of association between that feature and the behavior of whether the user answers the call, based on the number of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not.
- each feature whose degree of association with the behavior of whether the user answers the call is greater than or equal to the second preset threshold is taken as a strongly correlated feature, and the first model is trained based on the number of users who answered and the number who did not among the multiple users, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each such feature value, the number who answered and the number who did not.
- the first model can thus be trained on only the features with a higher degree of association.
- less data is involved in training, so training is more efficient; and because the training data is concentrated on the features strongly related to the answering behavior, the training of the first model is more focused and the model can also perform better.
- the degree of association between each feature and whether the user answers the call can satisfy the following conditions:
- X is any feature
- R(X) is the feature value set of X feature, including each feature value of X feature
- x is any feature value of feature X
- Y is the behavior of whether the user answers the phone
- R(Y) is the behavior set of whether the user answers the phone, including the behavior of the user answering the phone and the behavior of the user not answering the phone
- y is the behavior of the user answering the phone or the behavior of the user not answering the phone
- I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the phone
- P(x, y) is the ratio of the number of users who performed behavior y among the users corresponding to feature value x to the total number of users
- P(x) is the ratio of the number of users corresponding to feature value x to the total number of users
- P(y) is the ratio of the number of users who performed behavior y to the total number of users.
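The symbols defined above are those of the standard mutual information between X and Y. Assuming that is the intended measure, the degree of association would be I(X, Y) = Σ over x in R(X) and y in R(Y) of P(x, y) · log( P(x, y) / (P(x) · P(y)) ). The Python sketch below computes that quantity from per-user records; the natural logarithm and the function name are assumptions, and (x, y) pairs that never occur contribute zero by the usual convention.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def association_degree(samples: Iterable[Tuple[Hashable, bool]]) -> float:
    """Mutual information between a feature value x and the answered flag y.

    Each sample is (feature value of the user, whether the user answered).
    """
    pairs = list(samples)
    total = len(pairs)
    if total == 0:
        return 0.0
    joint = Counter(pairs)                      # counts behind P(x, y)
    x_counts = Counter(x for x, _ in pairs)     # counts behind P(x)
    y_counts = Counter(y for _, y in pairs)     # counts behind P(y)

    score = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / total
        p_x = x_counts[x] / total
        p_y = y_counts[y] / total
        score += p_xy * math.log(p_xy / (p_x * p_y))
    return score
```

A feature would then count as strongly correlated when this score is greater than or equal to the second preset threshold.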
- the degree of association between each feature and the behavior of answering the phone is computed over all of the feature's values, so it integrates the information of every feature value; because richer information is used, the degree of association can be made more accurate.
- the second model can be a neural network model, and can be obtained in the following manner: first obtain the feature values of multiple users under each feature; then, for any user, construct the user's feature vector under each feature from the user's feature value under that feature and all feature values of that feature, and splice the user's feature vectors under all features to obtain the first feature vector corresponding to the user.
- the second feature vector corresponding to the user is obtained according to whether the user performed the preset behavior; the first feature vectors of the multiple users are then used as the model input to obtain prediction results for the multiple users performing the preset behavior; finally, the model parameters are adjusted based on the second feature vectors of the multiple users and those prediction results to obtain the second model.
- the user's feature vector integrates the information of every feature value of every feature, so the information is more comprehensive and the representation more concise. A model trained on input that is rich in information and concise in form performs better and trains more efficiently.
- each feature value of each feature can be obtained in the following way: if the feature is a discrete feature, the distinct values taken by the multiple users under the feature are collected, and these values are used as the feature values of the feature. If the feature is a continuous feature, the value range of the multiple users under the feature is determined and divided into multiple intervals, and a corresponding feature value is set for each interval, giving the feature values of the feature.
- in this way every feature (continuous or discrete) has the same discrete representation, so each discrete feature value can be used directly as training data when training the model, without fitting a probability distribution function to the continuous features, which improves the efficiency of data processing.
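The construction of the first feature vector can be sketched in Python as follows. A one-hot encoding per feature is assumed here, which is one natural reading of building a vector from "the user's feature value and each feature value of the feature"; the function names, the example features, and the interval boundaries are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[int]:
    return [1 if i == index else 0 for i in range(size)]


def encode_discrete(value: object, feature_values: Sequence[object]) -> List[int]:
    # One slot per distinct value observed for the feature.
    return one_hot(list(feature_values).index(value), len(feature_values))


def encode_continuous(value: float, boundaries: Sequence[float]) -> List[int]:
    # `boundaries` are the sorted interior cut points, so there are
    # len(boundaries) + 1 intervals; each interval acts as one feature value.
    return one_hot(bisect_right(boundaries, value), len(boundaries) + 1)


def first_feature_vector(parts: Sequence[List[int]]) -> List[int]:
    # Splice the per-feature vectors into a single vector.
    return [bit for part in parts for bit in part]


gender = encode_discrete("F", ["M", "F"])   # [0, 1]
age = encode_continuous(35, [30, 50])       # [0, 1, 0]
print(first_feature_vector([gender, age]))  # [0, 1, 0, 1, 0]
```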
- the present invention provides a data processing device, the device including: an acquisition module, configured to acquire a first list, the first list including a plurality of users who have not performed a preset behavior; a determining module, configured to use the first model to determine the user category of each user in the first list, where the user category of each user includes the first user category, and the first user category indicates that the user will answer the call made by the collection system; and a processing module, configured to count the number of users in the first list belonging to each user category and determine the first duration based on the number, where the first duration represents the duration required to make calls to all users in the first list, and, if the first duration exceeds the set duration, to use the second model to determine the probability that each user belonging to the first user category performs the preset behavior and determine the second list according to the probability, where the second list is used to indicate users who need to be called after the current moment.
- the user category of each user further includes a second user category, and the second user category indicates that the user will not answer the call made by the collection system.
- the acquiring module may also acquire the first call duration corresponding to the first user category and the second call duration corresponding to the second user category.
- the determining module can determine the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, together with the number of users in the first list belonging to the second user category and the second call duration, and determine the first duration based on the total call duration and the number of available phone numbers.
- the first call duration is determined based on the call duration of each user who answered the call in the historical time period, and the second call duration is determined based on the call duration waiting to be answered after the call is made to the user.
- the determining module can determine the available phone numbers in the following way: for the multiple phone numbers previously applied for from the operator, obtain the predicted duration based on the total call duration and the number of those phone numbers, determine the probability that each phone number goes offline within the predicted duration, and take each phone number whose probability is not greater than the first preset threshold as an available phone number.
- the device may further include a dialing module. While the determining module uses the first model to determine the user category of each user in the first list, the dialing module may use the available phone numbers to call each user according to the contact information of each user in the first list.
- the processing module may also determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module may also refuse to receive the third list.
- the first model may be a classification model.
- the processing module can also obtain the feature values of multiple users under each feature and, for any feature, determine the degree of association between the feature and the behavior of whether the user answers the phone, based on the number of users who answered and the number who did not among the multiple users, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not.
- each feature whose degree of association with the behavior of whether the user answers the call is greater than or equal to the second preset threshold is then taken as a strongly correlated feature, and the first model is obtained by training based on the number of users who answered and the number who did not among the multiple users, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each such feature value, the number who answered and the number who did not.
- the degree of association between each feature and whether the user answers the call satisfies the following conditions:
- X is any feature
- R(X) is the feature value set of X feature, including each feature value of X feature
- x is any feature value of feature X
- Y is the behavior of whether the user answers the phone
- R(Y) is the behavior set of whether the user answers the phone, including the behavior of the user answering the phone and the behavior of the user not answering the phone
- y is the behavior of the user answering the phone or the behavior of the user not answering the phone
- I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the phone
- P(x, y) is the ratio of the number of users who performed behavior y among the users corresponding to feature value x to the total number of users
- P(x) is the ratio of the number of users corresponding to feature value x to the total number of users
- P(y) is the ratio of the number of users who performed behavior y to the total number of users.
- the second model can be a neural network model.
- the processing module can also obtain the feature values of multiple users under each feature; for any user, construct the user's feature vector under each feature from the user's feature value under that feature and all feature values of that feature; splice the user's feature vectors under all features to obtain the first feature vector corresponding to the user; obtain the second feature vector corresponding to the user according to whether the user performed the preset behavior; use the first feature vectors of the multiple users as the model input to obtain prediction results for the multiple users performing the preset behavior; and adjust the model parameters based on the second feature vectors of the multiple users and those prediction results to obtain the second model.
- the processing module can also obtain each feature value of each feature in the following manner: if the feature is a discrete feature, the distinct values taken by the multiple users under the feature are collected, and each value is used as a feature value of the feature; if the feature is a continuous feature, the value range of the multiple users under the feature is determined and divided into multiple intervals, and a corresponding feature value is set for each interval, giving the feature values of the feature.
- the present invention provides a computing device including at least one processor and at least one memory.
- the memory stores a computer program, and when the computer program is executed by the processor, the processor can execute any data processing method of the first aspect described above.
- the present invention provides a computer-readable storage medium that stores a computer program that can be executed by a computing device.
- the computing device can execute any of the data processing methods of the first aspect described above.
- FIG. 1 is a schematic structural diagram of a collection system provided by an embodiment of the present invention
- FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a one-dimensional cell model provided by an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
- the preset behavior may refer to any behavior, such as shopping behavior in the advertising promotion field, card issuance behavior in the credit card promotion field, or repayment behavior in the collection field.
- the following embodiments of the present invention take the field of collection as an example to describe the data processing method in the embodiments of the present invention.
- FIG. 1 is a schematic diagram of the architecture of a collection system provided by an embodiment of the present invention.
- the collection system may be provided with a collection robot 110 and at least one client, such as client 121, client 122, and client 123.
- the client can be any online loan client that provides loans to users in the financial technology field, such as an online loan client installed at a commercial bank, at a financial company, or at a trust company, without limitation.
- the collection system may also be provided with at least one user terminal, such as user terminal 131, user terminal 132, and user terminal 133.
- the user terminal can be any terminal device with a call function, such as an elderly phone, a smart phone, a slide phone, etc., which is not limited.
- the collection robot 110 may be connected to the at least one client and to the at least one user terminal respectively, for example in a wired manner or in a wireless manner, which is not specifically limited.
- FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention.
- the method is applied to a collection robot, such as the collection robot 110 shown in FIG. 1.
- the method includes:
- Step 201: Obtain a first list.
- the first list includes a plurality of users who have not performed a preset behavior.
- the first list may include the contact information of each user who has not performed the preset behavior.
- each user who has not performed the preset behavior is a user who took out a loan from an online lending institution and whose loan is overdue.
- the collection system may also be provided with a preprocessing device (not shown in FIG. 1), which can be placed between the at least one client and the collection robot 110, or inside the collection robot 110.
- the preprocessing device may receive the collection list sent by each client, and sort the users to be collected in each collection list according to the set dial strategy to obtain the first list.
- the set dialing strategy can be a dialing strategy set according to business needs. For example, the users to be collected in each collection list can be sorted by the chronological order in which the collection lists were received, by the priority of the client corresponding to each collection list, by the priority of the online loan product to which each collection list belongs, or by the priority of the city where the client corresponding to each collection list is located; a combination of several of these dialing strategies can also be used, which is not specifically limited.
- the preprocessing device may be a web server based on World Wide Web (web) technology, and the client may be a client provided with a web browser.
- the online lending institution can access the web service interface provided by the preprocessing device through the web browser of its client.
- the online lending institution may have a collection demand for multiple online loan products.
- the online lending institution can package the user information (including the user's age, gender, education, marital status, occupation, current loan information, historical loan information, etc.) corresponding to each online loan product to be collected into a collection list and upload it.
- the online lending institution can also select the collection termination time on the web service interface, so that the collection robot 110 feeds back the collection result before the collection termination time.
- the preprocessing device can first sort the collection list of each client according to the priority of each online loan product, and then sort the initially sorted collection lists of the clients according to the priority of each client to obtain the first list.
- alternatively, the collection lists can first be sorted according to the priority of each client and then according to the priority of each online loan product to obtain the first list, which is not limited.
- for example, suppose the priority of client 121 > the priority of client 123 > the priority of client 122, and the priority of online loan product 2 > the priority of online loan product 1. Suppose also that the collection list of client 121 includes user 1 and user 2 to be collected for online loan product 1, and that the collection list of client 122 includes user 3 to be collected for online loan product 1 and user 4 and user 5 to be collected for online loan product 2.
- the collection list of client 123 includes user 6 to be collected for online loan product 2.
- when product priority is applied within each client's list and the lists are then ordered by client priority, the first list can be: user 1, user 2, user 6, user 4, user 5, user 3.
- when client priority is applied first and the users are then ordered by product priority, the first list may instead be: user 6, user 4, user 5, user 1, user 2, user 3.
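The two orderings in this example can be reproduced with a stable sort on a composite key. The Python sketch below encodes the example's priorities as ranks (lower rank means higher priority); the `PendingUser` record and the function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class PendingUser(NamedTuple):
    name: str
    client: int
    product: int


# Lower rank = higher priority, taken from the example above.
CLIENT_RANK: Dict[int, int] = {121: 0, 123: 1, 122: 2}
PRODUCT_RANK: Dict[int, int] = {2: 0, 1: 1}

COLLECTION_LISTS: List[PendingUser] = [
    PendingUser("user 1", 121, 1),
    PendingUser("user 2", 121, 1),
    PendingUser("user 3", 122, 1),
    PendingUser("user 4", 122, 2),
    PendingUser("user 5", 122, 2),
    PendingUser("user 6", 123, 2),
]


def client_major(users: List[PendingUser]) -> List[PendingUser]:
    # Product priority inside each client's list, lists ordered by client.
    return sorted(users, key=lambda u: (CLIENT_RANK[u.client], PRODUCT_RANK[u.product]))


def product_major(users: List[PendingUser]) -> List[PendingUser]:
    # Products ordered by priority, clients ordered inside each product.
    return sorted(users, key=lambda u: (PRODUCT_RANK[u.product], CLIENT_RANK[u.client]))


print([u.name for u in client_major(COLLECTION_LISTS)])
# ['user 1', 'user 2', 'user 6', 'user 4', 'user 5', 'user 3']
print([u.name for u in product_major(COLLECTION_LISTS)])
# ['user 6', 'user 4', 'user 5', 'user 1', 'user 2', 'user 3']
```

Because `sorted` is stable, users with equal keys keep their original relative order, which is why user 4 stays ahead of user 5.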
- if the preprocessing device is separate from the collection robot 110, the preprocessing device can send the first list to the collection robot 110, or the collection robot 110 can fetch the first list from the preprocessing device using a file transfer protocol. If the preprocessing device is a component inside the collection robot 110 (such as a preprocessing process), it can store the first list directly in the memory of the collection robot 110, so that the collection robot 110 can call a processing process to make collection calls to each user in the first list.
- each client may send the collection list to the preprocessing device the day before the collection is executed, or send the collection list to the preprocessing device on the day when the collection is executed.
- the embodiment of the present invention does not limit the device to which the client sends the collection list.
- the client may directly send the collection list to the preprocessing device, or send the collection list to the collection robot 110, and then the collection robot 110 forwards it to the preprocessing device.
- Step 202: Use the first model to determine the user category of each user in the first list.
- the user category of each user includes a first user category, and the first user category represents that the user will answer the call made by the collection system.
- after the collection robot 110 obtains the first list, it can first determine the time difference between the current time and the time at which the collection robot 110 starts collection. If the time difference is greater than or equal to the first preset time difference (which is greater than or equal to the time required to determine the collection strategy), the collection robot 110 can first analyze whether the collection task in the first list can be completed before the collection termination time set by each client, and set the corresponding collection strategy according to the analysis result; then, when the collection start time arrives, it starts collecting from each user in the first list according to that strategy.
- if the time difference is less than or equal to the second preset time difference (any value less than or equal to 0), a parallel processing thread analyzes whether the collection tasks in the first list can be completed before the collection termination time set by each client; after the corresponding collection strategy is set according to the result, the dialing thread is controlled to start collecting from each user in the first list according to that strategy.
- the collection robot 110 may first call the processing process to analyze whether the collection task in the first list can be completed before the collection termination time set by each client, and set the corresponding collection strategy according to the analysis result.
- the parallel dialing process is called to call each user in the first list in list order, and then, once the corresponding collection strategy is obtained, the parallel dialing process is controlled to call each user in the first list according to that strategy.
- The first preset time difference can be set by those skilled in the art based on experience, or can be determined from the time taken to determine the collection strategy for each collection task in the historical period. For example, it can be the average of those durations, their median, or their weighted average, in which the closer a historical collection task is to the current collection task, the greater its weight, and so on.
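The weighted-average option can be sketched as follows. This is only an illustration: the function name and the linear weights 1, 2, ..., n are assumptions, since the text states only that more recent tasks receive greater weight.

```python
def weighted_average_duration(durations):
    """Recency-weighted mean of historical strategy-determination durations.

    `durations` is ordered from oldest to newest task. The linear weights
    1..n are an assumed choice; any increasing weights fit the description.
    """
    if not durations:
        raise ValueError("at least one historical duration is required")
    weights = range(1, len(durations) + 1)  # the newest task gets the largest weight
    return sum(w * d for w, d in zip(weights, durations)) / sum(weights)
```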
- The collection robot 110 can be equipped with two environments: an online production environment and a simulation environment.
- The collection robot 110 can push the first list to the online production environment and the simulation environment at the same time.
- The online production environment performs the normal dialing process. For example, when the collection start time of the collection robot 110 (such as 8:00) is reached, it makes collection calls to each user in turn, following the order of the users in the first list (or the collection strategy sent by the simulation environment), records the call information and the user's willingness to repay (such as the collection stage at which the user ended the call), and sends each user's call result to the client of the corresponding online lending institution so that the institution can follow up on the user's subsequent repayment.
- the collection stage can include the five stages of asking if the other party is the person, explaining the overdue situation, asking when the payment can be repaid, confirming the repayment date, and ending.
- The simulation environment analyzes the collection tasks corresponding to the first list, determines the corresponding collection strategy, and sends that strategy to the online production environment, so that the online production environment executes the collection task according to the corresponding collection strategy.
- The online production environment can also send the collection results of each user obtained by executing the collection task to the simulation environment, so that the simulation environment can update various internal parameters, such as the first call duration, the first model parameters, the second model parameters, and the average number of phone numbers going offline per hour in the historical period.
- In this way, risk judgment serves as a means to assist the execution of the normal collection task and does not take up the time the collection robot needs to make collection calls, thereby reducing the impact of risk judgment on normal collection tasks.
- the collection robot 110 can use the first model to predict each user in the first list, thereby determining the user category of each user.
- the user category of the user may include only the first user category, or may include both the first user category and the second user category. If the user category of a user is the first user category, it means that the user will answer the collection call made by the collection robot. If the user category of a user is the second user category, it means that the user will not answer the collection call made by the collection robot.
- Step 203 Count the number of users belonging to each user category in the first list, and determine a first duration based on the number.
- The first duration represents the time needed to make calls to all users in the first list.
- The collection robot 110 can count the number of users belonging to the first user category and to the second user category in the prediction result, and then determine the first duration required to make calls to all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration corresponding to the first user category, and the number of users in the first list belonging to the second user category and the second call duration corresponding to the second user category.
- The first call duration identifies the call duration likely to be consumed by each user who answers the call.
- The second call duration identifies the call duration likely to be consumed by each user who does not answer the call.
- The first call duration and the second call duration can be set by those skilled in the art based on experience, or can be set according to business needs, and are not specifically limited.
- the first call duration may be determined according to the duration required to make a call to each user who answered the call in the historical period
- the second call duration may be determined according to the duration of waiting to be answered after the call was made to the user.
- The collection robot 110 may first obtain from the statistical database the recorded call durations of all users who answered a collection call made by the collection robot 110 in the last 2 weeks (the call duration of each user refers to the total duration from the start of dialing to the end of the call), and then take the median of these call durations as the first call duration, or take their average as the first call duration, and so on.
- The second call duration refers to the time the collection robot 110 waits for the other party to answer, and can be determined according to the set number of rings. For example, if the call is set to be hung up when the other party has not answered after 8 rings, the second call duration can be the total duration of the 8 rings. Since the waiting time is the same for every user who does not answer the collection call, the collection robot 110 may set the second call duration to the waiting time of any user who did not answer the collection call in the historical period.
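The two estimates just described can be sketched as below. The function names and the `seconds_per_ring` parameter are hypothetical; the text gives only the number of rings, not how long one ring lasts.

```python
from statistics import median


def first_call_duration(answered_durations):
    """Median call duration (seconds) over users who answered in the history window."""
    return median(answered_durations)


def second_call_duration(ring_count, seconds_per_ring):
    """Waiting time before hanging up on an unanswered call.

    `seconds_per_ring` is an assumed input; it is not specified in the text.
    """
    return ring_count * seconds_per_ring
```

Replacing `median` with `statistics.mean` would give the average-based variant that the text also allows.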
- The first call duration for users who answer is determined from the call durations of collection calls made in the historical period, so it reflects historical dialing information and accurately represents the call duration of each user who answers. Correspondingly, the second call duration is the time spent waiting to be answered, so it accurately represents the call duration of each user who does not answer.
- From the first call duration and the number of users predicted by the first model to answer, the total call duration required to call the users in the first list who answer can be determined; from the second call duration and the number of users predicted by the first model not to answer, the total call duration required to call the users in the first list who do not answer can be determined. Together these predict the total call duration for making calls to all users in the first list.
- Because this method is based on analysis of historical data, it better matches the actual business situation and makes the predicted first duration more accurate.
- The collection robot 110 may apply to the operator for multiple phone numbers in advance, and use the multiple phone numbers jointly to make collection calls to the users in the first list.
- The collection robot 110 can first determine the total call duration for making calls to all users in the first list based on the number of users belonging to the first user category, the first call duration, the number of users belonging to the second user category, and the second call duration, and then determine the first duration according to the multiple pre-applied phone numbers and the total call duration.
- The collection robot 110 may directly use the ratio of the total call duration to the number of phone numbers as the first duration.
- A phone number may go offline as the dialing time increases. Therefore, if the ratio of the total call duration to the number of phone numbers is used directly as the first duration, the first duration may be inaccurate because some phone numbers go offline.
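As a minimal sketch of the direct-ratio calculation described above (before any correction for numbers going offline), with hypothetical function and parameter names:

```python
def total_call_duration(n_answer, first_call, n_no_answer, second_call):
    """Total time to call every user, given the predicted category counts."""
    return n_answer * first_call + n_no_answer * second_call


def first_duration(total_duration, phone_numbers):
    """Wall-clock time when the calls are spread evenly over the phone numbers."""
    if phone_numbers <= 0:
        raise ValueError("at least one phone number is required")
    return total_duration / phone_numbers
```

For instance, 100 users predicted to answer at 90 seconds each and 50 predicted not to answer at 40 seconds each give 11,000 seconds in total, or 1,100 seconds across 10 phone numbers.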
- the collection robot 110 may determine the first duration in the following manner:
- The collection robot 110 may first determine the predicted duration required to make calls to all users in the first list based on the total call duration and the number of phone numbers, and analyze the probability of each phone number going offline within the predicted duration. The probability of each phone number going offline can be determined based on probability theory. Since the time interval t from the start of making collection calls to a phone number going offline obeys the exponential distribution F(t) with parameter λ, the probability density function f(t) corresponding to the time interval t is:

  $$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0$$

- The exponential distribution F(t) corresponding to the time interval t can be:

  $$F(t) = 1 - e^{-\lambda t}, \quad t \ge 0$$

- λ can be set as the average number of phone numbers going offline per hour in the historical period.
- The historical period can be set by those skilled in the art based on experience; for example, it can be the last 2 weeks. In this way, the value of λ can be continuously updated over time.
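Under the exponential-distribution assumption stated above, the probability that a phone number goes offline within a given duration is the cumulative distribution evaluated at that duration. The sketch below assumes the rate is expressed per hour and the duration in hours; the function name is hypothetical.

```python
import math


def offline_probability(rate_per_hour, hours):
    """P(number goes offline within `hours`) for an exponential lifetime.

    Evaluates F(t) = 1 - exp(-rate * t); both arguments must be non-negative.
    """
    if rate_per_hour < 0 or hours < 0:
        raise ValueError("rate and duration must be non-negative")
    return 1.0 - math.exp(-rate_per_hour * hours)
```

A number whose probability exceeds the first preset threshold would then be counted as likely to go offline.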
- The collection robot 110 then determines the first duration required to make calls to all users in the first list from the total call duration and the number of available phone numbers, that is, phone numbers not expected to go offline. If the first duration is less than or equal to the set duration, then even if some phone numbers go offline during dialing, the collection robot 110 can complete the collection task corresponding to the first list, so the collection robot 110 can continue to make collection calls to the users in the first list in sequence.
- Otherwise, the collection robot 110 can determine whether the predicted duration is greater than the set duration. If the predicted duration is less than or equal to the set duration, the collection robot 110 can complete the collection task corresponding to the first list provided that no phone number goes offline during dialing. In this case, if the operator supports the collection robot 110 applying for backup phone numbers, the collection robot 110 can apply to the operator for backup phone numbers, and the number of backup phone numbers can be greater than or equal to the number of phone numbers whose offline probability is greater than the first preset threshold.
- the collection robot 110 may obtain a part of users with a higher collection success rate from the first list to form the second list.
- If the predicted duration is greater than the set duration, the collection robot 110 cannot complete the collection task corresponding to the first list even if no phone number goes offline during dialing.
- In this case, the collection robot 110 may also decide whether to apply for backup phone numbers or to determine the second list according to what the operator supports.
- When the operator supports the collection robot applying for backup phone numbers, the collection robot can apply for backup phone numbers while also selecting some users with a higher collection success rate from the first list to form the second list. Alternatively, even when the operator supports it, the collection robot may choose not to apply for backup phone numbers and instead select some users with a higher collection success rate from the first list to form the second list.
- The number of phone numbers that may go offline during the execution of the collection task can thus be determined in advance. Re-determining the first duration using the number of phone numbers that will not go offline anticipates the risk of phone numbers going offline and makes the judgment of whether the collection task can be completed more accurate.
- the simulation environment can determine the first duration based on the cell model.
- a one-dimensional cell model can be set in the simulation environment, and the one-dimensional cell model is used to store all users in the first list.
- Figure 3 is a schematic structural diagram of a one-dimensional cell model provided by an embodiment of the present invention. As shown in Figure 3, each cell can be used to identify a user, and each cell has a left neighbor cell and/or a right cell. A neighboring cell, for example, cell A is the left neighbor of cell B, and cell C is the right neighbor of cell B.
- Each cell can have three different states, and the state can be identified by color: for example, white identifies the not-dialed state, gray identifies the dialed-but-not-answered state, and black identifies the dialed-and-answered state.
- When the color of a cell changes from white to gray or black, the collection robot 110 has dialed the user corresponding to that cell. Therefore, when a cell changes from white to black, it can first stay for the first call duration (to save time, this can also be set, according to a ratio, to a value less than the first call duration) before its color is updated from white to black.
- When a cell changes from white to gray, it can first stay for the second call duration (to save time, this can also be set to a value less than the second call duration, using the same ratio as for the first call duration) before its color is updated from white to gray.
- This process can be executed in parallel according to the number of available phone numbers. Once the color of every cell in the one-dimensional cell model has changed, the execution time is counted, and the first duration is determined from it according to the ratio.
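The cell-model simulation can be sketched as a small event-driven loop. This is an illustrative reading, not the patented implementation: instead of actually waiting, it tracks the simulated time at which each phone line becomes free, and the names `simulate_cells`, `lines`, and `ratio` are assumptions.

```python
import heapq

WHITE, GRAY, BLACK = "white", "gray", "black"


def simulate_cells(will_answer, lines, first_call, second_call, ratio=1.0):
    """Simulate dialing every cell in order over `lines` parallel phone lines.

    `will_answer[i]` is the first model's prediction for cell i. Each stay is
    scaled by `ratio` (0 < ratio <= 1 shortens the simulation), and the
    elapsed simulated time is divided by `ratio` to recover the first duration.
    """
    if lines <= 0 or ratio <= 0:
        raise ValueError("lines and ratio must be positive")
    states = [WHITE] * len(will_answer)
    free_at = [0.0] * lines  # simulated time at which each line is next free
    heapq.heapify(free_at)
    finished = 0.0
    for i, answers in enumerate(will_answer):
        start = heapq.heappop(free_at)  # earliest available line takes the cell
        end = start + (first_call if answers else second_call) * ratio
        states[i] = BLACK if answers else GRAY  # white -> black or white -> gray
        heapq.heappush(free_at, end)
        finished = max(finished, end)
    return finished / ratio, states
```

With four cells predicted as answer, no answer, answer, no answer, two lines, a 90-second first call duration, a 40-second second call duration, and a ratio of 0.5, the simulated run takes 65 seconds and the recovered first duration is 130 seconds.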
- Step 204 If the first duration exceeds the set duration, use a second model to determine the probability of each user belonging to the first user category performing the preset behavior, and determine a second list according to the probability; The second list is used to indicate users who need to make a call after the current time.
- the collection robot 110 can use the second model to determine the repayment probability of each user at least for each user who will answer the call in the prediction result, and sort the users according to the repayment probability of each user to obtain the second List, so that the collection robot 110 makes a collection call to each user according to the second list.
- The collection robot 110 may use the second model to determine the repayment probability only for each user who is predicted to answer the phone, and then arrange these users in descending (or ascending) order of repayment probability to obtain the second list.
- In this way, the collection robot 110 makes collection calls only to users who will answer the call and have a high probability of repayment, and not to users who will not answer or who will answer but have a low probability of repayment. This improves the effect of the collection calls, reduces the amount of data the collection robot 110 processes, and improves collection efficiency.
- Alternatively, the collection robot 110 can use the second model to determine the repayment probability of every user in the first list, and arrange the users in descending (or ascending) order of repayment probability to obtain the second list. In this way, when it is determined that not all users can be called, the collection robot 110 can make collection calls to the users in the first list in descending order of repayment probability, so as to call as many users as possible and avoid missing users who were predicted not to answer but would actually answer, thereby improving the accuracy of the collection calls.
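Both ways of forming the second list reduce to filtering and sorting. The sketch below assumes the model outputs are already available as dictionaries keyed by user; the function and argument names are hypothetical.

```python
def build_second_list(users, repay_probability, will_answer=None):
    """Order users by descending repayment probability.

    If `will_answer` is given, only users predicted to answer are kept
    (the first variant); otherwise every user in the first list is ranked.
    """
    candidates = [u for u in users if will_answer is None or will_answer[u]]
    return sorted(candidates, key=lambda u: repay_probability[u], reverse=True)
```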
- If the first duration does not exceed the set duration, the collection robot 110 can complete the collection task for the first list within the set duration. In this case, the collection robot 110 can continue to make collection calls to the users in the order of the first list.
- The collection robot 110 may receive a collection task request for a third list while it is making collection calls within the first duration. In this case, the collection robot 110 may further determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the collection robot 110 cannot complete the collection tasks for all users in the first list and the third list within the set duration. Therefore, the collection robot 110 may refuse to accept the third list.
- By determining in advance the total duration for making collection calls to all users in the first list and the third list, and refusing to accept the third list if that total exceeds the set duration, the collection robot avoids accepting collection tasks that cannot be completed and reduces customer losses.
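The admission check for a newly arriving list is a single comparison; a minimal sketch with a hypothetical function name:

```python
def accept_new_list(first_duration, second_duration, set_duration):
    """Accept the third list only if both lists fit within the set duration."""
    return first_duration + second_duration <= set_duration
```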
- A phone number may suddenly go offline during the first duration. In this case, the collection robot 110 may determine a new first duration based on the total call duration and the number of phone numbers that are not offline. Alternatively, if the collection robot 110 has already made collection calls to some users in the first list, the collection robot 110 may determine, based on the first model, the total call duration for making collection calls to the remaining users in the first list, and then determine a new first duration based on that total call duration and the number of phone numbers that are not offline.
- The collection robot 110 may also report this to the operation and maintenance personnel, so that they can determine whether to apply to the operator for backup phone numbers.
- In this way, the first model is used to predict which users in the first list will answer the phone and which will not, and the time needed to complete the collection task is determined from that prediction.
- the second model is used to determine the user with a higher probability of successful collection, so that when it is determined that the collection task cannot be completed, the user with a higher success rate of collection can be called first to improve the collection effect.
- the above process describes the process of using the first model and the second model to determine the collection strategy.
- the following describes the process of training to obtain the first model and the second model, respectively.
- Since the first model is used to predict whether each user will answer the phone in order to determine the user category of each user, the first model can be set as a classification model.
- The collection robot 110 can first obtain the feature values of multiple users under each feature and then, for any feature, determine the degree of association between that feature and the behavior of answering the call according to the number of users who answered the call, the number of users who did not answer the call, and the feature values of the multiple users under that feature.
- The collection robot 110 may regard a feature whose degree of association with the behavior of answering the call is greater than or equal to the second preset threshold as a strong correlation feature, and then train the first model according to the number of users who answered the call and the number who did not, the number of users corresponding to each feature value of each strong correlation feature, and, among the users corresponding to each such feature value, the number who answered the call and the number who did not.
- the first model is trained based on the Naive Bayes algorithm. Since the naive Bayes algorithm can update the model parameters based on incremental data in real time, training the first model based on the naive Bayes algorithm can improve the efficiency of training and update.
- The data of multiple (for example, 20,000) users to whom the collection robot 110 made collection calls in the historical period can be obtained first.
- The data of each user can include the user's values under various features, such as the user's gender, age, education, occupation, marital status, resident city, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical overdue loans, and also includes the value of the category feature indicating whether the user answered the collection call.
- Because these features include both continuous and discrete features, they cannot be handled with a single unified evaluation standard. Therefore, in an example, for any feature, if the feature is discrete, the collection robot 110 can collect the values of the multiple users under the feature and use each value as a feature value of the feature. If the feature is continuous, the collection robot 110 can determine the value range of the multiple users under the feature, divide the value range into multiple intervals, assign a feature value to each interval, and thereby obtain the feature values of the feature.
- In this way, every feature (continuous or discrete) has the same discrete form, so each discrete feature value can be used as training data when training the model without fitting a probability distribution function to the continuous features, which improves the efficiency of data processing.
- For example, features such as gender, education, occupation, marital status, and resident city take a limited number of values, so these features are discrete features, and the users' values under them are used as their feature values.
- Age, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical overdue loans can each take infinitely many values, so these features are continuous features.
- The continuous values of these continuous features are converted to discrete values.
- The age feature is discretized into feature value 1, feature value 2, ..., feature value 7, which in turn represent ages (in years) in the following 7 intervals: [0, 15), [15, 25), [25, 35), [35, 45), [45, 55), [55, 65), [65, ∞).
- The loan amount feature is discretized into feature value 1, feature value 2, ..., feature value 5, which represent loan amounts (unit: 10,000 yuan) in the following 5 intervals: [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, ∞).
- The arrears amount feature is discretized into feature value 1, feature value 2, ..., feature value 5, which represent amounts of arrears (unit: 10,000 yuan) in the following 5 intervals: [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, ∞).
- The overdue days feature of this loan is discretized into feature value 1, feature value 2, ..., feature value 5, which represent the number of overdue days in the following 5 intervals: [0, 1), [1, 3), [3, 5), [5, 7), [7, ∞).
- The historical loan count feature is discretized into feature value 1, feature value 2, ..., feature value 5, which represent the number of historical loans in the following 5 intervals: [0, 1), [1, 2), [2, 3), [3, 5), [5, ∞).
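The interval lookup described above can be sketched with a binary search over the interior interval boundaries. The constant and function names are hypothetical; the boundaries are taken from the intervals listed for age and loan amount.

```python
from bisect import bisect_right

# Interior boundaries of the left-closed, right-open intervals listed above.
AGE_EDGES = [15, 25, 35, 45, 55, 65]   # 7 intervals, feature values 1..7
AMOUNT_EDGES = [0.5, 1.5, 3.5, 5]      # 5 intervals, feature values 1..5


def discretize(value, edges):
    """Map a non-negative continuous value to its 1-based feature value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # bisect_right puts a value equal to an edge into the interval that starts there.
    return bisect_right(edges, value) + 1
```

An age of 30 falls in [25, 35) and maps to feature value 3, while a loan amount of exactly 0.5 maps to feature value 2 because each interval is closed on the left.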
- the degree of association between the feature and the category feature can be calculated.
- the degree of association can be represented by mutual information.
- Mutual information refers to the measurement of information that a random variable contains another random variable. The greater the value of mutual information, the stronger the coupling between the two random variables and the greater the degree of association.
- The mutual information between each feature and the category feature of whether the user answers the phone can satisfy the following condition:

  $$I(X, Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$

- X is any feature.
- R(X) is the feature value set of feature X, containing each feature value of feature X.
- x is any feature value of feature X.
- Y is the behavior of whether the user answers the phone.
- R(Y) is the behavior set of whether the user answers the phone, containing the behavior of answering the phone and the behavior of not answering the phone.
- y is the behavior of the user answering the phone or the behavior of the user not answering the phone.
- I(X, Y) is the degree of association between feature X and the user's behavior of answering the phone.
- P(x, y) is the ratio of the number of users who performed behavior y among the users corresponding to feature value x to the total number of users.
- P(x) is the ratio of the number of users corresponding to feature value x to the total number of users.
- P(y) is the ratio of the number of users who performed behavior y to the total number of users.
- For example, if the random variable X represents the age feature and the random variable Y represents the category feature of whether the phone is answered, then P(x) represents the proportion of the 20,000 users whose age feature value is x.
- In this way, the degree of association between each feature and the behavior of answering the phone integrates information from every feature value of the feature. Because richer information is used, the degree of association is more accurate.
- the feature whose mutual information is greater than the third preset threshold may be used as a strong correlation feature.
- the third preset threshold can be set by those skilled in the art based on experience, for example, it can be 0.5 or 0.8, which is not specifically limited.
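The mutual information between one discretized feature and the answered/not-answered label can be estimated from simple counts, following the definitions of P(x, y), P(x), and P(y) above. In this sketch the natural logarithm is an assumption, since the base is not stated, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, answered):
    """Estimate I(X, Y) from paired samples of a feature value and a label."""
    n = len(feature_values)
    if n == 0 or n != len(answered):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(feature_values, answered))  # counts for each (x, y) pair
    count_x = Counter(feature_values)
    count_y = Counter(answered)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi
```

Pairs that never occur contribute nothing to the sum, so iterating over the observed pairs only is sufficient. A feature would then be kept as a strong correlation feature when this value exceeds the chosen threshold.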
- The embodiment of the present invention may train the first model based on Naive Bayes using the feature values of the 20,000 users under the strong correlation features. Specifically, for the sample data obtained by combining one feature value from each strong correlation feature (for example, the combination x_1, x_2, x_3, ..., x_n, where each x_i is a feature value of strong correlation feature X_i), the value of the category of whether the sample will answer the call can be the class y with the largest posterior probability P(y | x_1, x_2, ..., x_n).
- Here x_1, x_2, x_3, ..., x_n is the sample data obtained by combining the feature values.
- N is the number of users (20,000); N_y is the number of users whose category feature value is y; N_{y,x_i} is the number of users whose category feature value is y and whose value of feature X_i is x_i; and L_{x_i} is the size of the value range of feature X_i, that is, the number of possible feature values x_i.
- The first model can be expressed by the above formulas.
- For a user to be predicted, the formulas can be used to determine the probability that the user's value under the behavior feature is yes and the probability that the user's value under the behavior feature is no.
- If the probability that the user's value under the behavior feature is yes is greater than the probability that it is no, the user is determined to be a user who will answer the phone, and the user category of the user is the first user category. If the probability that the user's value under the behavior feature is no is greater than the probability that it is yes, the user is determined to be a user who will not answer the phone, and the user category of the user is the second user category.
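A count-based Naive Bayes classifier consistent with the quantities N, N_y, N_{y,x_i}, and L_{x_i} defined above can be sketched as follows. The exact estimates are an assumption: this sketch uses the class prior N_y / N and the Laplace-smoothed likelihood (N_{y,x_i} + 1) / (N_y + L_{x_i}), which may differ from the formulas in the original filing. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Incrementally updatable Naive Bayes over discrete feature values."""

    def __init__(self, domain_sizes):
        self.domain_sizes = domain_sizes          # L_{x_i} for each feature i
        self.n = 0                                # N
        self.class_counts = Counter()             # N_y
        self.value_counts = defaultdict(Counter)  # (i, y) -> counts of x_i

    def update(self, features, label):
        """Add one labeled user; only counters change, so updates are cheap."""
        self.n += 1
        self.class_counts[label] += 1
        for i, x in enumerate(features):
            self.value_counts[(i, label)][x] += 1

    def score(self, features, label):
        """Value proportional to the posterior of `label` (assumed smoothing)."""
        if self.n == 0:
            raise ValueError("the model has no training data")
        n_y = self.class_counts[label]
        p = n_y / self.n
        for i, x in enumerate(features):
            p *= (self.value_counts[(i, label)][x] + 1) / (n_y + self.domain_sizes[i])
        return p

    def predict(self, features):
        """True = will answer (first category), False = will not (second)."""
        return max((True, False), key=lambda y: self.score(features, y))
```

Because training only increments counters, new call results can be folded in one user at a time, which matches the real-time update property the text attributes to Naive Bayes.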
- the first model can be updated quickly and in real time, so that the update efficiency of the first model is better.
- the first model can be trained based on only the features with a higher degree of association. In this way, the amount of data involved in training is less, and the efficiency of training the model is higher.
- Because the training data is concentrated on the feature data that is strongly related to the behavior of answering the phone, the training of the first model is more focused and the model effect is better.
- the second model may be a neural network model.
- The feature values of multiple users under each feature are acquired, and each feature value of each feature is represented numerically.
- The second feature vector corresponding to a user is obtained according to the user's value under the category feature of whether to repay.
- The first feature vectors of the multiple users can be used as model input to obtain predicted repayment vectors for the users, and the model parameters of the second model are adjusted based on the users' second feature vectors and the predicted repayment vectors to obtain the optimized second model.
- the second model can include an input layer, a hidden layer, and an output layer.
- the input layer, hidden layer, and output layer adopt a fully connected structure.
- The hidden layer can be set with 10 neuron nodes and the output layer with 2 neuron nodes. The activation function of the hidden layer is the ReLU function, and the activation function of the output layer is the Softmax function, whose output represents the probability of the user repaying.
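The forward pass of such a network (fully connected, 10 ReLU hidden nodes, 2 Softmax outputs) can be sketched in plain Python. The random initialization, the helper names, and the omission of training are assumptions made for illustration; the 400-wide input used in the usage note follows the first feature vector size given elsewhere in the description.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, hidden, output):
    """Input -> hidden (ReLU) -> output (Softmax)."""
    activations = relu(dense(inputs, *hidden))
    return softmax(dense(activations, *output))
```

Calling `forward(x, init_layer(400, 10, rng), init_layer(10, 2, rng))` with `rng = random.Random(0)` and a 400-element input returns two probabilities that sum to 1; which component denotes repayment depends on how the label vector is encoded.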
- The data of multiple (for example, 50,000) users to whom the collection robot 110 made collection calls in the historical period can be obtained first.
- The data of each user can include the user's values under various features, such as the user's gender, age, education, occupation, marital status, resident city, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical overdue loans, and can also include the value of the category feature indicating whether the user repaid after the collection call was made.
- the user used to train the second model and the user used to train the first model may be partially the same or completely different, which is not specifically limited.
- Each continuous feature can be discretized in the same way as when training the first model, and one-hot encoding is then used to convert each feature value of each feature into numerical form. For example, since the gender feature has two feature values (male and female), one-hot encoding converts them into a vector with 1 row and 2 columns; if a user's gender is male, the user's feature vector under the gender feature is (1, 0). Since the marital status feature has 4 feature values (unmarried, married, widowed, divorced), one-hot encoding converts them into a vector with 1 row and 4 columns.
- Similarly, one-hot encoding can transform the 11 feature values of the education feature (primary school, junior high school, high school, technical secondary school, vocational school, technical school, junior college, undergraduate, master's degree, doctoral degree, postdoctoral) into a vector with 1 row and 11 columns.
- The feature values of the occupation feature (agriculture, forestry, animal husbandry, fishery, water conservancy, industry, geological survey and exploration, construction, transportation, post and telecommunications, commerce, public catering, material supply and storage, real estate management, public utilities, resident services and consulting services, health, sports and social welfare, education, culture and art, radio and television, scientific research and comprehensive technical services, finance, insurance, state agencies, party and government agencies, and social organizations, other industries) are transformed into a vector with 1 row and 13 columns, and the 338 feature values of the resident city feature (337 major cities, other cities) are transformed into a vector with 1 row and 338 columns.
- the feature vector of the user under each feature can be determined according to the user's feature value under that feature, and the user's feature vectors under all features can then be spliced head to tail to obtain the first feature vector corresponding to the user.
- the first feature vector corresponding to the user can be a one-dimensional vector with 1 row and 400 columns.
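The one-hot encoding and head-to-tail splicing can be illustrated with two of the features named above. This is a sketch under stated assumptions: the helper names `one_hot` and `first_feature_vector` and the dictionary layout are hypothetical, and only gender and marital status are included, so the result has 6 columns rather than 400.

```python
GENDER = ["male", "female"]
MARITAL_STATUS = ["unmarried", "married", "widowed", "divorced"]

def one_hot(value, values):
    # One column per possible feature value; exactly one column is set to 1.
    vec = [0] * len(values)
    vec[values.index(value)] = 1
    return vec

def first_feature_vector(user, feature_values):
    # Splice the per-feature one-hot vectors head to tail, in a fixed order.
    vec = []
    for name, values in feature_values.items():
        vec.extend(one_hot(user[name], values))
    return vec

feature_values = {"gender": GENDER, "marital_status": MARITAL_STATUS}
user = {"gender": "male", "marital_status": "married"}
vec = first_feature_vector(user, feature_values)
# vec == [1, 0, 0, 1, 0, 0]
```

The feature order must be the same for every user, at training time and at prediction time, so that each column always refers to the same feature value.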
- the second feature vector corresponding to the user is determined according to the feature value of the user under the category feature of whether to repay, and the second feature vector corresponding to the user may be a one-dimensional vector with 1 row and 2 columns.
- the second feature vector corresponding to the user may be [1, 0]
- the second feature vector corresponding to the user may be [0, 1].
- the 50,000 feature vectors can be divided into training feature vectors, test feature vectors, and verification feature vectors. The division can follow a random ratio or a preset ratio, without limitation.
- the first feature vectors of the 35,000 training feature vectors can be input to the neural network model so that the model outputs 35,000 second prediction feature vectors, and the parameters of the neural network model are then adjusted based on the 35,000 second prediction feature vectors and the second feature vectors of the 35,000 training feature vectors to obtain the second model.
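The parameter adjustment can be sketched as gradient descent on a cross-entropy loss. The description does not name a loss function or optimiser, so both are assumptions here, as are the function name `train_step`, the learning rate, and the small synthetic data set used in place of the 35,000 training feature vectors.

```python
import numpy as np

def train_step(params, x, y, lr=0.2):
    """One full-batch gradient-descent step; returns the loss before the update."""
    n = x.shape[0]
    # Forward pass: hidden layer with ReLU, output layer with Softmax.
    z1 = x @ params["W1"] + params["b1"]
    h = np.maximum(z1, 0.0)
    z2 = h @ params["W2"] + params["b2"]
    z2 = z2 - z2.max(axis=1, keepdims=True)
    p = np.exp(z2)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))
    # Backward pass for mean cross-entropy combined with Softmax.
    dz2 = (p - y) / n
    d_w2 = h.T @ dz2
    d_b2 = dz2.sum(axis=0)
    dz1 = (dz2 @ params["W2"].T) * (z1 > 0)
    d_w1 = x.T @ dz1
    d_b1 = dz1.sum(axis=0)
    for name, grad in (("W1", d_w1), ("b1", d_b1), ("W2", d_w2), ("b2", d_b2)):
        params[name] -= lr * grad
    return loss

# Synthetic stand-in data: 64 users, 20 binary columns, label tied to column 0.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(64, 20)).astype(float)
y = np.eye(2)[x[:, 0].astype(int)]  # two-column second feature vectors
params = {
    "W1": rng.normal(0.0, 0.1, size=(20, 10)), "b1": np.zeros(10),
    "W2": rng.normal(0.0, 0.1, size=(10, 2)), "b2": np.zeros(2),
}
losses = [train_step(params, x, y) for _ in range(300)]
```

Comparing the predicted vectors with the second feature vectors drives the update, which matches the role the two sets of vectors play in the paragraph above.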
- 10,000 test feature vectors can be used to test the model effect of the second model
- 5000 verification feature vectors can be used to verify whether the test effect of the second model reaches the preset effect
- the 10,000 test feature vectors and the 5,000 verification feature vectors can also be used to optimize the model parameters of the second model.
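A preset-ratio division into 35,000 training, 10,000 test and 5,000 verification samples might look as follows. The function name `split_samples`, the shuffling step and the fixed seed are assumptions for illustration; integers stand in for the per-user feature vector pairs.

```python
import random

def split_samples(samples, n_train, n_test, n_verify, seed=0):
    # Shuffle a copy, then cut it into three disjoint parts of preset sizes.
    if n_train + n_test + n_verify != len(samples):
        raise ValueError("split sizes must add up to the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    verify = shuffled[n_train + n_test:]
    return train, test, verify

# Integers stand in for the 50,000 per-user feature vector pairs.
train, test, verify = split_samples(range(50_000), 35_000, 10_000, 5_000)
```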
- the user's feature vector is obtained by determining the user's feature vector under each feature and splicing these per-feature vectors together, so that it integrates the feature information of every feature value: the information is more comprehensive and the form of expression is more concise. A model trained on input that is rich in information and concise in form therefore performs better and trains more efficiently.
- the first list is first obtained, the user category of each user in the first list is determined using the first model, the number of users belonging to each user category in the first list is then counted, and the first duration is determined based on the number. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probability.
- the first list includes multiple users who have not performed the preset behavior, and the user category of each user includes the first user category.
- the first user category indicates that the user will answer the call made by the collection system, the first duration indicates the time required to make calls to all users in the first list, and the second list is used to indicate the users who need to be called after the current time.
- the first model is used to predict whether each user in the first list will answer the call, and the time needed to complete the collection task is determined from that prediction. When it is determined that the collection task cannot be completed, the second model is used to identify the users with a higher probability of successful collection, so that collection calls can be made to those users first, which improves the collection effect.
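The overall flow can be sketched with the two models passed in as plain callables. Everything named here is hypothetical: `plan_calls`, the callables `answers_call`, `repay_probability` and `estimate_duration`, and the toy user records. Ordering the second list by descending probability, and keeping the whole first list when the set duration is not exceeded, are assumptions; the description only says the second list is determined according to the probability.

```python
def plan_calls(first_list, answers_call, repay_probability,
               estimate_duration, set_duration):
    # First model: split users by whether they are predicted to answer.
    answering = [u for u in first_list if answers_call(u)]
    n_not_answering = len(first_list) - len(answering)
    first_duration = estimate_duration(len(answering), n_not_answering)
    if first_duration <= set_duration:
        # Assumption: with enough time, every user in the list is called.
        return list(first_list)
    # Second model: rank the answering users by predicted repayment probability.
    return sorted(answering, key=repay_probability, reverse=True)

users = [
    {"id": "A", "answers": True, "p_repay": 0.2},
    {"id": "B", "answers": False, "p_repay": 0.7},
    {"id": "C", "answers": True, "p_repay": 0.9},
]
second_list = plan_calls(
    users,
    answers_call=lambda u: u["answers"],
    repay_probability=lambda u: u["p_repay"],
    estimate_duration=lambda n_ans, n_no: 60 * n_ans + 20 * n_no,  # seconds
    set_duration=100,
)
ids = [u["id"] for u in second_list]  # ["C", "A"]
```

In this toy run the estimated duration is 140 seconds against a limit of 100, so only the users predicted to answer are kept, highest repayment probability first.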
- an embodiment of the present invention also provides a data processing device, and the specific content of the device can be implemented with reference to the foregoing method.
- Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention, including:
- the obtaining module 401 is configured to obtain a first list; the first list includes multiple users who have not performed a preset behavior;
- the determining module 402 is configured to determine the user category of each user in the first list using a first model, wherein the user category of each user includes a first user category, and the first user category indicates that the user will answer phone calls made by the collection system;
- the processing module 403 is configured to count the number of users belonging to each user category in the first list and determine a first duration based on the number, where the first duration represents the duration required to make calls to all users in the first list; if the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probability; the second list is used to indicate users who need to be called after the current moment.
- the user category of each user further includes a second user category
- the second user category represents that the user will not answer calls made by the collection system.
- the acquiring module 401 may also acquire the first call duration corresponding to the first user category and the second call duration corresponding to the second user category.
- the first call duration is determined according to the call duration of each user who answered the call in the historical time period
- the second call duration is determined according to the call duration waiting to be answered after the call is made to the user.
- the determining module 402 may determine the total call duration for making calls to all users in the first list based on the number of users belonging to the first user category in the first list and the first call duration, and the number of users belonging to the second user category in the first list and the second call duration; the first duration is then determined based on the total call duration and the number of available phone numbers.
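The duration estimate can be worked through numerically. The description says only that the first duration is determined from the total call duration and the number of available phone numbers; dividing one by the other (numbers dialling in parallel) is an assumption, as are the function name and the sample figures.

```python
def first_duration(n_first, first_call_duration,
                   n_second, second_call_duration, n_available_numbers):
    # Total talk time for answering users plus ringing time for the rest.
    if n_available_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    total = n_first * first_call_duration + n_second * second_call_duration
    # Assumption: available numbers dial in parallel and share the load evenly.
    return total / n_available_numbers

# 600 users predicted to answer (90 s each), 400 predicted not to (30 s each),
# spread across 10 available phone numbers.
duration = first_duration(600, 90, 400, 30, 10)  # 6600.0 seconds
```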
- the determining module 402 determines the available phone numbers in the following manner: for a plurality of phone numbers previously applied for from an operator, a predicted duration is obtained based on the total call duration and the number of the plurality of phone numbers, the probability of each of the plurality of phone numbers going offline within the predicted duration is determined, and each phone number whose probability is not greater than a first preset threshold is used as an available phone number.
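Filtering the phone numbers can be sketched as below. How the offline probability is estimated is not stated in this passage, so `offline_probability` is a hypothetical callable supplied by the caller, and the per-hour rate model in the example is invented purely to make the sketch runnable. Taking the predicted duration as total call duration divided by the number of phone numbers is likewise an assumption.

```python
def available_numbers(phone_numbers, total_call_duration,
                      offline_probability, threshold):
    if not phone_numbers:
        raise ValueError("no phone numbers to choose from")
    # Assumption: each number carries an equal share of the total duration.
    predicted_duration = total_call_duration / len(phone_numbers)
    return [
        number for number in phone_numbers
        if offline_probability(number, predicted_duration) <= threshold
    ]

# Hypothetical per-hour offline rates for two numbers.
hourly_rate = {"A": 0.01, "B": 0.5}

def toy_offline_probability(number, seconds):
    # Invented model: probability grows linearly with time, capped at 1.
    return min(1.0, hourly_rate[number] * seconds / 3600)

usable = available_numbers(["A", "B"], 7200, toy_offline_probability, 0.1)
# usable == ["A"]
```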
- the device further includes a dialing module 404. While the determining module 402 uses the first model to determine the user category of each user in the first list, the dialing module 404 can use the available phone numbers to make a call to each user according to the contact information of each user in the first list.
- the processing module 403 may also use the first model to determine a second duration required to make calls to all users in a third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module 403 may also refuse to receive the third list.
- the first model is a classification model.
- the processing module 403 can also obtain the feature values of multiple users under each feature. For any feature, the degree of association between the feature and the behavior of whether the user answers the call is determined according to the number of users who answered the call among the multiple users, the number of users who did not answer the call, the number of users corresponding to each feature value of the feature, the number of users who answered the call among the users corresponding to each feature value, and the number of users who did not answer the call among the users corresponding to each feature value. A feature whose degree of association with the behavior of whether the user answers the call is greater than or equal to the second preset threshold is taken as a strongly correlated feature. The first model is obtained by training based on the number of users who answered the call among the multiple users, the number of users who did not answer the call, the number of users corresponding to each feature value of the strongly correlated features, the number of users who answered the call among the users corresponding to each feature value of the strongly correlated features, and the number of users who did not answer the call among the users corresponding to each feature value of the strongly correlated features.
- the degree of association between each feature and the behavior of whether the user answers the call satisfies the following conditions:
- X is any feature
- R(X) is the feature value set of feature X, including each feature value of feature X
- x is any feature value of feature X
- Y is the behavior of whether the user answers the call
- R(Y) is the behavior set of whether the user answers the call, including the behavior of the user answering the call and the behavior of the user not answering the call
- y is the behavior of the user answering the call or the behavior of the user not answering the call
- I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the call
- P(x, y) is the ratio of the number of users who performed behavior y among the users corresponding to feature value x to the total number of users
- P(x) is the ratio of the number of users corresponding to feature value x to the total number of users
- P(y) is the ratio of the number of users who performed behavior y to the total number of users.
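The formula itself is not reproduced in this text. The symbols defined above match the standard mutual information, $I(X,Y)=\sum_{x\in R(X)}\sum_{y\in R(Y)}P(x,y)\log\frac{P(x,y)}{P(x)P(y)}$, so the sketch below assumes that form; the natural logarithm, the function names and the toy data are further assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # xs: feature value per user; ys: answered (1) or not answered (0) per user.
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # Pairs with zero count contribute nothing and never appear in `joint`.
        mi += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi

def strong_features(columns, answered, threshold):
    # Keep features whose association with answering reaches the threshold.
    return [name for name, xs in columns.items()
            if mutual_information(xs, answered) >= threshold]

answered = [1, 1, 0, 0]
columns = {
    "a": ["m", "m", "f", "f"],  # perfectly aligned with answering
    "b": ["x", "y", "x", "y"],  # independent of answering
}
mi_a = mutual_information(columns["a"], answered)  # ln 2
mi_b = mutual_information(columns["b"], answered)  # 0.0
selected = strong_features(columns, answered, threshold=0.1)  # ["a"]
```

All probabilities are plain count ratios, which is why the inputs listed for the first model's training are user counts per feature value and per behavior.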
- the second model is a neural network model.
- the processing module 403 can also obtain the feature values of multiple users under each feature. For any user, the feature vector of the user under each feature is constructed according to the user's feature value under that feature and the feature values of that feature, and the user's feature vectors under all features are spliced to obtain the first feature vector corresponding to the user; the second feature vector corresponding to the user is obtained according to whether the user performed the preset behavior. The first feature vectors corresponding to the multiple users are used as the model input to obtain prediction results of the multiple users performing the preset behavior, and the model parameters are adjusted based on the second feature vectors of the multiple users and the prediction results to obtain the second model.
- the processing module 403 is further configured to obtain the feature values of each feature in the following manner: if the feature is a discrete feature, the distinct values of the multiple users under the feature are counted and used as the feature values of the feature; if the feature is a continuous feature, the value range of the multiple users under the feature is determined, the value range is divided into multiple value intervals, and a corresponding feature value is set for each value interval, giving the feature values of the feature.
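Discretising a continuous feature can be sketched with equal-width intervals. The description does not say how the value range is divided, so equal widths are an assumption, as are the function names and the sample ages; the interval index plays the role of the feature value assigned to each interval.

```python
def make_intervals(values, n_intervals):
    # Divide [min, max] of the observed values into equal-width intervals.
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_intervals)]

def discretize(value, intervals):
    # Return the index of the half-open interval [lo, hi) containing value.
    if value < intervals[0][0]:
        return 0  # clamp values below the observed range
    for index, (lo, hi) in enumerate(intervals):
        if lo <= value < hi:
            return index
    return len(intervals) - 1  # the maximum (or anything above) goes last

ages = [18, 30, 45, 60]
intervals = make_intervals(ages, 3)  # [(18.0, 32.0), (32.0, 46.0), (46.0, 60.0)]
codes = [discretize(age, intervals) for age in ages]  # [0, 0, 1, 2]
```

The resulting interval index can then be one-hot encoded in the same way as a discrete feature value.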
- the first list is first obtained, the first model is used to determine the user category of each user in the first list, the number of users belonging to each user category in the first list is then counted, and the first duration is determined based on the number. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probability.
- the first list includes multiple users who have not performed the preset behavior, and the user category of each user includes the first user category. The first user category indicates that the user will answer the call made by the collection system, the first duration indicates the time required to make calls to all users in the first list, and the second list is used to indicate the users who need to be called after the current time.
- the first model is used to predict whether each user in the first list will answer the call, and the time needed to complete the collection task is determined from that prediction. When it is determined that the collection task cannot be completed, the second model is used to identify the users with a higher probability of successful collection, so that collection calls can be made to those users first, which improves the collection effect.
- an embodiment of the present invention also provides a computing device. As shown in FIG. 5, it includes at least one processor 501 and a memory 502 connected to the at least one processor.
- the embodiment of the present invention does not limit the processor.
- in FIG. 5, the processor 501 and the memory 502 being connected through a bus is taken as an example.
- the bus can be divided into address bus, data bus, control bus and so on.
- the memory 502 stores instructions that can be executed by at least one processor 501, and the at least one processor 501 can execute the steps included in the aforementioned data processing method by executing the instructions stored in the memory 502.
- the processor 501 is the control center of the computing device. It can use various interfaces and lines to connect the parts of the computing device, and it implements data processing by running or executing instructions stored in the memory 502 and calling data stored in the memory 502.
- the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor.
- the application processor mainly processes the operating system, user interface, and application programs.
- the modem processor mainly handles issuing instructions. It can be understood that the foregoing modem processor may not be integrated into the processor 501.
- the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments, they may also be implemented on separate chips.
- the processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention.
- the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in combination with the data processing embodiment may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
- the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules.
- the memory 502 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, disk, CD, etc.
- the memory 502 is any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory 502 in the embodiment of the present invention may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
- embodiments of the present invention also provide a computer-readable storage medium that stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device executes the data processing method described in FIG. 2.
- the embodiments of the present invention can be provided as methods or computer program products. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Finance (AREA)
- Human Resources & Organizations (AREA)
- Life Sciences & Earth Sciences (AREA)
- Accounting & Taxation (AREA)
- Marketing (AREA)
- Evolutionary Computation (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Operations Research (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Technology Law (AREA)
- Telephonic Communication Services (AREA)
Abstract
A data processing device and method, relating to the technical field of financial technology (Fintech). The method comprises: upon receiving a first list, using a first model to determine the user category of each user in the first list; determining, according to the users' categories, the time needed to complete the debt collection tasks of the first list; and, if it is determined that the debt collection tasks of the first list cannot be completed within a preset time period, using a second model to determine the debt collection success rate for each user. In this way, when it is determined that the debt collection tasks cannot be completed, collection calls can be made with priority given to users with higher debt collection success rates, which facilitates debt collection.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911155084.5 | 2019-11-22 | ||
CN201911155084.5A CN111091460B (zh) | 2019-11-22 | 2019-11-22 | 一种数据处理方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021098652A1 true WO2021098652A1 (fr) | 2021-05-27 |
Family
ID=70393812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/129121 WO2021098652A1 (fr) | 2019-11-22 | 2020-11-16 | Procédé et dispositif de traitement de données |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111091460B (fr) |
WO (1) | WO2021098652A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210200949A1 (en) * | 2019-12-30 | 2021-07-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Pre-training method for sentiment analysis model, and electronic device |
CN115297212A (zh) * | 2022-06-25 | 2022-11-04 | 上海浦东发展银行股份有限公司 | 基于机器学习的语音机器人催收方法、系统、设备及介质 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111091460B (zh) * | 2019-11-22 | 2024-07-02 | 深圳前海微众银行股份有限公司 | 一种数据处理方法及装置 |
CN113837861A (zh) * | 2021-09-22 | 2021-12-24 | 平安银行股份有限公司 | 基于用户分组的催收方法、装置、存储介质及设备 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106952155A (zh) * | 2017-03-08 | 2017-07-14 | 深圳前海纵腾金融科技服务有限公司 | 一种基于信用评分的催收方法及装置 |
JP2018077671A (ja) * | 2016-11-09 | 2018-05-17 | ヤフー株式会社 | 情報処理装置、情報処理方法、予測モデルの生成装置、予測モデルの生成方法、およびプログラム |
CN109214936A (zh) * | 2018-09-03 | 2019-01-15 | 中国平安人寿保险股份有限公司 | 一种费用催收方法、系统及终端设备 |
CN109685336A (zh) * | 2018-12-10 | 2019-04-26 | 深圳市小牛普惠投资管理有限公司 | 催收任务分配方法、装置、计算机设备及存储介质 |
CN110475033A (zh) * | 2019-08-21 | 2019-11-19 | 深圳前海微众银行股份有限公司 | 智能拨号方法、装置、设备与计算机可读存储介质 |
CN111091460A (zh) * | 2019-11-22 | 2020-05-01 | 深圳前海微众银行股份有限公司 | 一种数据处理方法及装置 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090826B (zh) * | 2017-11-13 | 2021-11-19 | 平安科技(深圳)有限公司 | 一种电话催收方法及终端设备 |
CN109559221A (zh) * | 2018-11-20 | 2019-04-02 | 中国银行股份有限公司 | 基于用户数据的催收方法、装置和存储介质 |
-
2019
- 2019-11-22 CN CN201911155084.5A patent/CN111091460B/zh active Active
-
2020
- 2020-11-16 WO PCT/CN2020/129121 patent/WO2021098652A1/fr active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210200949A1 (en) * | 2019-12-30 | 2021-07-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Pre-training method for sentiment analysis model, and electronic device |
US11537792B2 (en) * | 2019-12-30 | 2022-12-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Pre-training method for sentiment analysis model, and electronic device |
CN115297212A (zh) * | 2022-06-25 | 2022-11-04 | 上海浦东发展银行股份有限公司 | 基于机器学习的语音机器人催收方法、系统、设备及介质 |
Also Published As
Publication number | Publication date |
---|---|
CN111091460B (zh) | 2024-07-02 |
CN111091460A (zh) | 2020-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021098652A1 (fr) | Procédé et dispositif de traitement de données | |
WO2019205325A1 (fr) | Procédé de détermination de niveau de risque d'utilisateur, dispositif terminal et support de stockage lisible par ordinateur | |
CN108876600A (zh) | 预警信息推送方法、装置、计算机设备和介质 | |
US10637990B1 (en) | Call center load balancing and routing management | |
CN110852881B (zh) | 风险账户识别方法、装置、电子设备及介质 | |
CN109766454A (zh) | 一种投资者分类方法、装置、设备及介质 | |
CN110796513B (zh) | 多任务学习方法、装置、电子设备及存储介质 | |
CN109543925A (zh) | 基于机器学习的风险预测方法、装置、计算机设备和存储介质 | |
CN111061948B (zh) | 一种用户标签推荐方法、装置、计算机设备及存储介质 | |
CN110930038A (zh) | 一种贷款需求识别方法、装置、终端及存储介质 | |
CN110634060A (zh) | 一种用户信用风险的评估方法、系统、装置及存储介质 | |
CN112966189A (zh) | 一种基金产品推荐系统 | |
CN115423578A (zh) | 基于微服务容器化云平台的招投标方法和系统 | |
KR20230060128A (ko) | 적격심사엔진을 이용한 전자입찰정보 분석 서비스 제공 방법 | |
CN118096170A (zh) | 风险预测方法及装置、设备、存储介质和程序产品 | |
WO2019171492A1 (fr) | Dispositif et procédé d'assistance de tâche de prédiction | |
WO2021129368A1 (fr) | Procédé et appareil permettant de déterminer un type de client | |
CN117575773A (zh) | 业务数据的确定方法、装置、计算机设备、存储介质 | |
Keating et al. | Using decision analysis to determine the feasibility of a conservation translocation | |
CN117236384A (zh) | 终端换机预测模型的训练及预测方法、装置和存储介质 | |
CN116934341A (zh) | 交易风险的评估方法、装置、电子设备和介质 | |
CN117196630A (zh) | 交易风险预测方法、装置、终端设备以及存储介质 | |
KR102519878B1 (ko) | 금융기관 신용공여 사업에서의 인공지능 기반 리스크 관리 솔루션을 제공하기 위한 장치, 방법 및 명령을 기록한 기록 매체 | |
CN115914363A (zh) | 消息推送方法、装置、计算机设备和存储介质 | |
CN115099934A (zh) | 一种高潜客户识别方法、电子设备和存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20890515 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.10.2022) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20890515 Country of ref document: EP Kind code of ref document: A1 |