WO2021098652A1

WO2021098652A1 - Data processing method and device

Info

Publication number: WO2021098652A1
Application number: PCT/CN2020/129121
Authority: WO
Inventors: 蔡远航; 郑少杰; 易剑韬; 彭明; 杨波; 范增虎
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2019-11-22
Filing date: 2020-11-16
Publication date: 2021-05-27
Also published as: CN111091460A

Abstract

A data processing method and device, relating to the technical field of financial technology (Fintech). The method comprises: upon receiving a first list, using a first model to determine user categories for each user in the first list; determining, according to the user categories of the users, a period of time for completing debt collection tasks in the first list; and if it is determined that the debt collection tasks in the first list cannot be completed within a preset period of time, using a second model to determine the debt collection success rate for each user. In this way, upon determining that debt collection tasks cannot be completed, collection calls can be performed with priority given to users having higher debt collection success rates, thus facilitating debt collection.

Description

Data processing method and device

Cross-references to related applications

This application claims priority to Chinese patent application No. 201911155084.5, entitled "A data processing method and device" and filed with the Chinese Patent Office on November 22, 2019, the entire content of which is incorporated herein by reference.

Technical field

The present invention relates to the technical field of financial technology (Fintech), in particular to a data processing method and device.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into Fintech. However, because of the security and real-time requirements of the financial industry, it also places higher demands on technology. As voice dialogue technology has matured, the Fintech field has begun to apply intelligent robots in collection scenarios; such robots are called collection robots. A collection robot can automatically place collection calls to remind customers to repay, and can record each customer's willingness to repay so that the customer's repayment progress can be followed up later. Compared with manual collection, a collection robot can greatly reduce collection costs and complete collection tasks efficiently. Moreover, a collection robot has no emotional fluctuations during a conversation with a customer, which can also improve the customer's experience.

At this stage, after a collection robot receives the collection lists sent by various online lending companies, it generally places collection calls to the users in those lists directly in the chronological order in which the lists were received. In actual business scenarios, however, the number of online lending companies from which the collection robot receives lists each day is not fixed, and the number of users to be collected provided by each company is not fixed either. The total number of users to be collected by the collection robot each day therefore cannot be determined in advance. When that total is large on a given day, dialing each user in turn on a first-come, first-served basis may leave the day's collection task unfinished and reduce the collection effect.

To sum up, there is an urgent need for a data processing method to solve the technical problem of poor collection effect caused by the prior art using a first-come, first-served manner to sequentially dial collection calls.

Summary of the invention

The present invention provides a data processing method and device, which are used to solve the technical problem of poor collection effect caused by sequentially dialing collection calls in a first-come, first-served manner in the prior art.

In a first aspect, the present invention provides a data processing method applied to a collection system. The method includes: obtaining a first list; using a first model to determine the user category of each user in the first list; counting the number of users in the first list that belong to each user category, and determining a first duration based on those numbers; and, if the first duration exceeds a set duration, using a second model to determine the probability that each user belonging to the first user category performs a preset behavior, and determining a second list based on the probabilities. The first list includes multiple users who have not performed the preset behavior. The user category of each user includes a first user category, which indicates that the user will answer a call made by the collection system. The first duration represents the time required to make calls to all users in the first list, and the second list indicates the users to be called after the current time. In the above implementation, after the first list is received, the first model predicts whether each user in the first list will answer the call (that is, the user category), and the time needed to complete the collection task is determined from that prediction. When it is determined that the collection task cannot be completed, the second model is used to identify the users with a higher probability of successful collection, so that collection calls can be placed first to the users with a higher success rate, which helps to improve the collection effect.
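The flow of the first aspect can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the function name `build_second_list` and the callables `will_answer`, `behavior_probability`, and `estimate_duration` are hypothetical stand-ins for the first model, the second model, and the duration estimate. Ranking users by descending probability, and keeping the original order when the task fits, are assumptions; the text only says that the second list is determined based on the probabilities.

```python
from typing import Callable, List, Sequence


def build_second_list(
    first_list: Sequence[str],
    will_answer: Callable[[str], bool],              # stand-in for the first model
    behavior_probability: Callable[[str], float],    # stand-in for the second model
    estimate_duration: Callable[[int, int], float],  # (answering, not answering) -> seconds
    set_duration_s: float,
) -> List[str]:
    """Return the users to call after the current time, in calling order."""
    # First model: split users into "will answer" and "will not answer".
    answering = [user for user in first_list if will_answer(user)]
    not_answering_count = len(first_list) - len(answering)

    # First duration: time needed to call everyone in the first list.
    first_duration_s = estimate_duration(len(answering), not_answering_count)
    if first_duration_s <= set_duration_s:
        # Assumption: when the task fits in the set duration, keep the order.
        return list(first_list)

    # Assumption: call first-category users in descending order of the
    # probability that they perform the preset behavior (e.g. repayment).
    return sorted(answering, key=behavior_probability, reverse=True)
```

Users predicted not to answer are left out of the returned list in this sketch; whether they are dropped or appended afterwards is a policy choice the summary does not fix.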

In a possible implementation, the user category of each user further includes a second user category, which indicates that the user will not answer a call made by the collection system. In this case, determining the first duration based on the numbers includes: first obtaining a first call duration corresponding to the first user category and a second call duration corresponding to the second user category; then determining the total call duration for making calls to all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration; and finally determining the first duration based on the total call duration and the number of available phone numbers. The first call duration is determined according to the call durations of the users who answered calls in a historical time period; the second call duration is determined according to the time spent waiting for an answer after a call is placed to a user. In the foregoing implementation, the first call duration is determined from the durations of the collection calls answered within the historical time period, so that it reflects the historical dialing information and the call duration of each user who answers can be estimated accurately. Correspondingly, the second call duration is the time spent waiting for an answer, so that the call duration of each user who does not answer can also be estimated accurately. In this way, the first call duration and the number of users predicted by the first model to answer give the total call duration needed for the users in the first list who answer, and the second call duration and the number of users predicted by the first model not to answer give the total call duration needed for the users who do not answer, so that the total call duration for making calls to all users in the first list can be predicted. Because this method is based on analysis of historical data, it better matches the actual business situation and makes the predicted first duration more accurate.
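As a worked illustration of the duration estimate described above, the sketch below makes two assumptions that the text does not state: the first call duration is taken as the mean of the historical answered-call durations, and the first duration is the total call duration divided evenly across the available phone numbers (that is, the numbers dial in parallel). The function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def first_duration(
    answering_users: int,
    non_answering_users: int,
    historical_answered_s: Sequence[float],
    wait_for_answer_s: float,
    available_numbers: int,
) -> float:
    """Estimate the time (seconds) needed to call every user in the first list."""
    if available_numbers <= 0:
        raise ValueError("at least one available phone number is required")

    # Assumption: first call duration = average of past answered-call durations.
    first_call_s = mean(historical_answered_s)

    # Total call duration over both user categories.
    total_call_s = (
        answering_users * first_call_s
        + non_answering_users * wait_for_answer_s
    )

    # Assumption: the load is spread evenly over numbers dialing in parallel.
    return total_call_s / available_numbers
```

For example, with 30 users predicted to answer, 70 predicted not to answer, historical answered calls of 60 s and 120 s, a 20 s wait before an unanswered call is abandoned, and 10 available numbers, the total call duration is 30 × 90 + 70 × 20 = 4100 s and the first duration is 410 s.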

In a possible implementation, the available phone numbers can be determined as follows: for multiple phone numbers applied for in advance from the operator, first obtain a predicted duration based on the total call duration and the number of those phone numbers, then determine the probability that each phone number goes offline within the predicted duration, and take each phone number whose probability is not greater than a first preset threshold as an available phone number. In the above implementation, after the predicted time required to complete the collection task is determined, judging the probability that each phone number goes offline within the predicted duration makes it possible to anticipate how many phone numbers may go offline while the collection task is being executed. Using only the number of phone numbers that are not expected to go offline to determine the first duration anticipates the risk of phone numbers going offline and helps keep the estimate of whether the collection task can be completed accurate.

In a possible implementation, while the first model is being used to determine the user category of each user in the first list, the available phone numbers can also be used to call each user according to the contact information of each user in the first list. In the above implementation, the risk assessment of the collection tasks in the first list is executed in parallel with the actual dialing, so the risk assessment acts as an auxiliary step that supports normal business execution without taking up the time the collection robot needs to place collection calls, which helps reduce the impact of the risk assessment on normal business.

In a possible implementation, when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, a second duration required to make calls to all users in the third list can also be determined based on the first model. If the sum of the first duration and the second duration exceeds the set duration, the third list can be refused. In the above implementation, when a new third list arrives, the total call duration for calling all users in the first list and the third list is estimated in advance, and the third list is refused when that total exceeds the set duration. This avoids accepting collection tasks that cannot be completed and thereby helps reduce customer losses.
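The selection of available phone numbers can be sketched as below. The text does not say how the offline probability is obtained, so `offline_probability` is a hypothetical callable supplied by the caller; dividing the total call duration by the count of phone numbers to get the predicted duration is likewise an assumption about how the two quantities are combined.

```python
from typing import Callable, List, Sequence


def select_available_numbers(
    phone_numbers: Sequence[str],
    total_call_s: float,
    offline_probability: Callable[[str, float], float],  # hypothetical estimator
    first_threshold: float,
) -> List[str]:
    """Keep the numbers unlikely to go offline during the predicted duration."""
    if not phone_numbers:
        raise ValueError("no phone numbers to choose from")

    # Assumption: predicted duration = total call duration / number of numbers.
    predicted_s = total_call_s / len(phone_numbers)

    # A number is available when its offline probability within the predicted
    # duration is not greater than the first preset threshold.
    return [
        number
        for number in phone_numbers
        if offline_probability(number, predicted_s) <= first_threshold
    ]
```

The length of the returned list is the "number of available phone numbers" used when the first duration is computed.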

In a possible implementation, the first model can be a classification model, and it can be obtained as follows: first obtain the feature values of multiple users under each feature; then, for any feature, determine the degree of association between the feature and the behavior of whether a user answers the phone according to the number of users who answered the phone and the number of users who did not answer the phone among the multiple users, the number of users corresponding to each feature value of the feature, the number of users who answered the phone among the users corresponding to each feature value, and the number of users who did not answer the phone among the users corresponding to each feature value. Next, take each feature whose degree of association with the answering behavior is greater than or equal to a second preset threshold as a strongly correlated feature, and train the first model based on the number of users who answered the phone and the number of users who did not answer the phone among the multiple users, the number of users corresponding to each feature value of the strongly correlated features, the number of users who answered the phone among the users corresponding to each feature value of the strongly correlated features, and the number of users who did not answer the phone among the users corresponding to each feature value of the strongly correlated features. In the above implementation, determining the degree of association between each feature and the answering behavior allows the first model to be trained only on the features with a higher degree of association. Less data is involved in training, so training is more efficient; and because the training data is concentrated on the features that are strongly related to the answering behavior, the training process of the first model is more focused and the model can also perform better.

In a possible implementation, the degree of association between each feature and whether the user answers the call can satisfy the following conditions:

Here, X is any feature; R(X) is the set of feature values of feature X, containing each feature value of X; x is any feature value of feature X; Y is the behavior of whether the user answers the phone; R(Y) is the set of such behaviors, containing the behavior of answering the phone and the behavior of not answering the phone; y is either the behavior of answering the phone or the behavior of not answering the phone; I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the phone; P(x, y) is the ratio of the number of users corresponding to feature value x who performed behavior y to the total number of users; P(x) is the ratio of the number of users corresponding to feature value x to the total number of users; and P(y) is the ratio of the number of users who performed behavior y to the total number of users.

In the above implementation, the degree of association between each feature and the answering behavior is obtained from the probabilities relating each feature value of the feature to the answering behavior, so the degree of association integrates the information of every feature value. Because the information used is richer, the degree of association is more accurate.
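The expression that these symbols belong to does not appear in this text. One form consistent with every symbol defined above is the standard mutual information between feature X and the answering behavior Y; it is given here as an assumption, not as the application's confirmed expression:

$$
I(X, Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x, y) \, \log \frac{P(x, y)}{P(x)\, P(y)}
$$

Under this reading, I(X, Y) is 0 when the feature and the answering behavior are independent and grows as knowing the feature value says more about whether the user answers.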
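A small sketch of how a degree of association could be computed from per-user observations and used to pick strongly correlated features. It assumes the degree of association is the mutual information of the feature and the answering behavior, which matches the symbols P(x, y), P(x), and P(y) defined above but is not confirmed by the text; the function names and the natural-log base are illustrative choices.

```python
from collections import Counter
from math import log
from typing import Dict, Hashable, List, Sequence


def association(values: Sequence[Hashable], answered: Sequence[bool]) -> float:
    """Mutual information (in nats) between a feature and the answering behavior."""
    total = len(values)
    joint = Counter(zip(values, answered))  # users per (feature value, behavior)
    per_value = Counter(values)             # users per feature value
    per_behavior = Counter(answered)        # users who answered / did not answer

    # Only observed (x, y) pairs are summed; unobserved pairs contribute 0.
    return sum(
        (count / total)
        * log((count / total) / ((per_value[x] / total) * (per_behavior[y] / total)))
        for (x, y), count in joint.items()
    )


def strong_features(
    features: Dict[str, Sequence[Hashable]],
    answered: Sequence[bool],
    second_threshold: float,
) -> List[str]:
    """Names of features whose association reaches the second preset threshold."""
    return [
        name
        for name, values in features.items()
        if association(values, answered) >= second_threshold
    ]
```

Only the counts named in the text are needed: users per feature value, users per behavior, and users per (feature value, behavior) pair.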

In a possible implementation, the second model can be a neural network model, and it can be obtained as follows: first obtain the feature values of multiple users under each feature; then, for any user, construct the user's feature vector under each feature according to the user's feature value under that feature and all feature values of that feature, and splice the user's feature vectors under all features to obtain the first feature vector corresponding to the user. Next, obtain the second feature vector corresponding to the user according to whether the user performed the preset behavior. Then use the first feature vectors of the multiple users as the model input to obtain prediction results for the multiple users performing the preset behavior, and finally adjust the model parameters based on the second feature vectors of the multiple users and the prediction results to obtain the second model. In the above implementation, because the user's feature vector under each feature is determined and these vectors are spliced into a single feature vector for the user, the user's feature vector integrates the information of every feature, so the information is more comprehensive and its form of expression more concise. A model trained on input that is rich in information and concise in form performs better and trains more efficiently.

In a possible implementation, the feature values of each feature can be obtained as follows: if the feature is a discrete feature, the distinct values taken by the multiple users under the feature can be collected and used as the feature values of the feature. If the feature is a continuous feature, the value range of the multiple users under the feature can be determined and divided into multiple intervals, and a corresponding feature value can be set for each interval to obtain the feature values of the feature. In the above implementation, discretizing the values of continuous features gives every feature (continuous or discrete) the same discrete representation, so that the discrete feature values can be used directly as training data when training the model, without fitting a probability distribution function to the continuous features, which improves the efficiency of data processing.
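The construction of the model input can be illustrated as follows. The sketch assumes a one-hot encoding for the per-feature vector, which is one natural reading of building a vector from the user's feature value and all feature values of that feature, and a two-element label vector for the second feature vector. Both encodings and all names are assumptions; the training loop of the neural network is omitted.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, feature_values: Sequence[str]) -> List[float]:
    """Per-feature vector: 1.0 at the position of the user's value, else 0.0."""
    return [1.0 if value == candidate else 0.0 for candidate in feature_values]


def first_feature_vector(
    user: Dict[str, str],
    vocabulary: Dict[str, Sequence[str]],  # feature name -> all of its feature values
) -> List[float]:
    """Splice the user's per-feature vectors into one model input vector."""
    spliced: List[float] = []
    for name, feature_values in vocabulary.items():
        spliced.extend(one_hot(user[name], feature_values))
    return spliced


def second_feature_vector(performed_behavior: bool) -> List[float]:
    """Assumed label encoding: [performed, did not perform]."""
    return [1.0, 0.0] if performed_behavior else [0.0, 1.0]
```

With a vocabulary of `{"gender": ["F", "M"], "age_band": ["18-30", "31-50", "51+"]}`, a user with gender `M` and age band `31-50` maps to a five-element input vector with ones at the second and fourth positions.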
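The two cases above can be sketched like this. Equal-width intervals are an assumption; the text only says that the value range is divided into multiple intervals. The function names are hypothetical, and the bin index is used as the feature value assigned to each interval.

```python
from typing import Hashable, List, Sequence


def discrete_feature_values(observed: Sequence[Hashable]) -> List[Hashable]:
    """Discrete feature: the distinct observed values are the feature values."""
    return sorted(set(observed))


def interval_edges(observed: Sequence[float], interval_count: int) -> List[float]:
    """Continuous feature: split [min, max] into equal-width intervals (assumed)."""
    if interval_count < 1:
        raise ValueError("interval_count must be at least 1")
    low, high = min(observed), max(observed)
    width = (high - low) / interval_count
    return [low + i * width for i in range(interval_count + 1)]


def interval_feature_value(value: float, edges: Sequence[float]) -> int:
    """Map a continuous value to the index of the interval that contains it."""
    last_interval = len(edges) - 2
    for index in range(last_interval):
        if value < edges[index + 1]:
            return index
    # Values at or above the last inner edge fall into the final interval.
    return last_interval
```

After this step every feature, discrete or continuous, has a finite list of feature values.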

In a second aspect, the present invention provides a data processing device. The device includes: an acquisition module, configured to acquire a first list, the first list including multiple users who have not performed a preset behavior; a determining module, configured to use a first model to determine the user category of each user in the first list, where the user category of each user includes a first user category indicating that the user will answer a call made by the collection system; and a processing module, configured to count the number of users in the first list that belong to each user category and determine a first duration based on those numbers, the first duration representing the time required to make calls to all users in the first list, and, if the first duration exceeds a set duration, to use a second model to determine the probability that each user belonging to the first user category performs the preset behavior and determine a second list according to the probabilities, the second list indicating the users to be called after the current time.

In a possible implementation, the user category of each user further includes a second user category, which indicates that the user will not answer a call made by the collection system. In this case, the acquisition module may also acquire the first call duration corresponding to the first user category and the second call duration corresponding to the second user category. The determining module may determine the total call duration for making calls to all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration, and may determine the first duration based on the total call duration and the number of available phone numbers. The first call duration is determined according to the call durations of the users who answered calls in a historical time period, and the second call duration is determined according to the time spent waiting for an answer after a call is placed to a user.

In a possible implementation, the determining module can determine the available phone numbers as follows: for multiple phone numbers applied for in advance from the operator, obtain a predicted duration based on the total call duration and the number of those phone numbers, determine the probability that each phone number goes offline within the predicted duration, and take each phone number whose probability is not greater than the first preset threshold as an available phone number.

In a possible implementation, the device may further include a dialing module. While the determining module uses the first model to determine the user category of each user in the first list, the dialing module may use the available phone numbers to call each user according to the contact information of each user in the first list.

In a possible implementation, when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, the processing module may also determine, based on the first model, a second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module may refuse to receive the third list.

In a possible implementation, the first model may be a classification model. In this case, the processing module can also obtain the feature values of multiple users under each feature and, for any feature, determine the degree of association between the feature and the behavior of whether a user answers the phone according to the number of users who answered the phone and the number of users who did not answer the phone among the multiple users, the number of users corresponding to each feature value of the feature, the number of users who answered the phone among the users corresponding to each feature value, and the number of users who did not answer the phone among the users corresponding to each feature value. The processing module can then take each feature whose degree of association with the answering behavior is greater than or equal to the second preset threshold as a strongly correlated feature, and train the first model based on the number of users who answered the phone and the number of users who did not answer the phone among the multiple users, the number of users corresponding to each feature value of the strongly correlated features, the number of users who answered the phone among the users corresponding to each feature value of the strongly correlated features, and the number of users who did not answer the phone among the users corresponding to each feature value of the strongly correlated features.

In a possible implementation, the degree of association between each feature and whether the user answers the call satisfies the following conditions:

In a possible implementation, the second model can be a neural network model. The processing module can also obtain the feature values of multiple users under each feature; for any user, construct the user's feature vector under each feature according to the user's feature value under that feature and all feature values of that feature; splice the user's feature vectors under all features to obtain the first feature vector corresponding to the user; obtain the second feature vector corresponding to the user according to whether the user performed the preset behavior; use the first feature vectors of the multiple users as the model input to obtain prediction results for the multiple users performing the preset behavior; and adjust the model parameters based on the second feature vectors of the multiple users and the prediction results to obtain the second model.

In a possible implementation, the processing module can also obtain the feature values of each feature as follows: if the feature is a discrete feature, collect the distinct values taken by the multiple users under the feature and use them as the feature values of the feature; if the feature is a continuous feature, determine the value range of the multiple users under the feature, divide the value range into multiple intervals, and set a corresponding feature value for each interval to obtain the feature values of the feature.

In a third aspect, the present invention provides a computing device including at least one processor and at least one memory. Wherein, the memory stores a computer program, and when the computer program is executed by the processor, the processor can execute any data processing method of the first aspect described above.

In a fourth aspect, the present invention provides a computer-readable storage medium that stores a computer program executable by a computing device. When the computer program runs on the computing device, the computing device can execute any of the data processing methods of the first aspect described above.

These and other aspects of the present invention will be more concise and understandable in the description of the following embodiments.

Description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the following will briefly introduce the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, without creative labor, other drawings can be obtained based on these drawings.

FIG. 1 is a schematic structural diagram of a collection system provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a one-dimensional cell model provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

In the embodiment of the present invention, the preset behavior may refer to any behavior, such as shopping behavior in the advertising promotion field, card issuance behavior in the credit card promotion field, or repayment behavior in the collection field. For ease of understanding, the following embodiments of the present invention take the field of collection as an example to describe the data processing method in the embodiments of the present invention.

FIG. 1 is a schematic diagram of the architecture of a collection system provided by an embodiment of the present invention. As shown in FIG. 1, the collection system may be provided with a collection robot 110 and at least one client, such as client 121, client 122, and client 123. The client can be any online loan client that provides loans to users in the Fintech field, such as an online loan client installed at a commercial bank, a financial company, or a trust company, without limitation.

As shown in FIG. 1, the collection system may also be provided with at least one user terminal, such as user terminal 131, user terminal 132, and user terminal 133. The user terminal can be any terminal device with a call function, such as a phone for the elderly, a smartphone, or a slide phone, without limitation.

In the embodiment of the present invention, the collection robot 110 may be connected to the at least one client and to the at least one user terminal, for example in a wired manner or in a wireless manner, which is not specifically limited.

Based on the system architecture shown in FIG. 1, FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention. The method is applied to a collection robot, such as the collection robot 110 shown in FIG. 1. As shown in Figure 2, the method includes:

Step 201: Obtain a first list. The first list includes a plurality of users who have not performed a predetermined behavior.

In an example, the first list may include the contact information of each user who has not performed the preset behavior. In the field of collection, each user who has not performed the preset behavior is each user who has overdue the loan after the loan is directed to the online lending institution.

In a possible implementation, the collection system may also be provided with a pre-processing device (not shown in FIG. 1), and the pre-processing device can be provided between at least one client and the collection robot 110, or can be provided in the collection robot. 110's interior. In specific implementation, the preprocessing device may receive the collection list sent by each client, and sort the users to be collected in each collection list according to the set dial strategy to obtain the first list. Among them, the set dialing strategy can be a dialing strategy set according to business needs. For example, it can be to sort the users to be collected in each collection list according to the chronological order of receiving the collection list, or it can be the order of each client corresponding to each collection list. The priority sorts the users to be collected in each collection list, and can also sort the users to be collected in each collection list according to the priority of the online loan product to which each collection list belongs, and can also be based on the corresponding collection list. The priority of the city where each client is located sorts the users to be collected in each collection list, and can also be a combination of the above-mentioned multiple dialing strategies, etc., which are not specifically limited.

For example, the preprocessing device may be a web server based on a worldwide web (web) technology, and the client may be a client provided with a web browser. In this way, when an online lending institution has a collection demand, the online lending institution can access the web service interface provided by the preprocessing device through the web browser of its client. Since the online lending institution may have a collection demand for multiple online loan products, the online lending institution may have a collection demand for multiple online loan products. The loan company can pack the user information (including the user's age, gender, education information, marriage information, occupation information, current loan information and historical loan information, etc.) corresponding to each online loan product to be collected into a collection list, and To upload. Moreover, the online loan structure can also select the termination time of the collection on the web service interface, so that the collection robot 110 feeds back the collection result before the termination time of the collection.

Correspondingly, after receiving the collection list of each online loan product sent by each client, the preprocessing device can first sort the collection list of each client according to the priority of each online loan product, and then sort the collection lists of each client according to the customer's priority. The priority of the terminal sorts the collection lists of each client after the initial sorting to obtain the first list. Alternatively, the collection lists of each client can be first sorted according to the priority of each client, and then the collection lists of each client can be sorted according to the priority of each online loan product to obtain the first list, which is not limited. For example, when the priority of the client 121>the priority of the client 123>the priority of the client 122, and the priority of the online loan product 2>the priority of the online loan product 1, if the collection list of the client 121 includes Online loan product 1 corresponds to the user to be collected 1 and user 2 to be collected, and the collection list of the client 122 includes the user to be collected 3 corresponding to the online loan product 1 and the user to be collected 4 and user 5 to be collected corresponding to the online loan product 2. The collection list of the client 123 includes users 6 to be collected corresponding to the online loan product 2. The first list can be: users to be collected 1, users to be collected 2, users to be collected 6, users to be collected 4, users to be collected 5, User to be collected 3. Or the first list may also be: user to be collected 6, user to be collected 4, user to be collected 5, user to be collected 1, user to be collected 2, user to be collected 3.
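The worked example above can be reproduced with a two-key sort. In this sketch the `Entry` type, the rank tables, and the function names are illustrative; only the client numbers, product numbers, user numbers, and priority relations come from the example.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    user: int     # user to be collected
    client: int   # client that submitted the collection list
    product: int  # online loan product the user belongs to


# Lower rank means higher priority: client 121 > 123 > 122, product 2 > 1.
CLIENT_RANK = {121: 0, 123: 1, 122: 2}
PRODUCT_RANK = {2: 0, 1: 1}

ENTRIES = [
    Entry(1, 121, 1), Entry(2, 121, 1),
    Entry(3, 122, 1), Entry(4, 122, 2), Entry(5, 122, 2),
    Entry(6, 123, 2),
]


def sort_within_client_then_by_client(entries: List[Entry]) -> List[int]:
    """Product priority orders users inside a client; client priority dominates."""
    ordered = sorted(
        entries, key=lambda e: (CLIENT_RANK[e.client], PRODUCT_RANK[e.product])
    )
    return [e.user for e in ordered]


def sort_by_client_then_by_product(entries: List[Entry]) -> List[int]:
    """Client priority orders users inside a product; product priority dominates."""
    ordered = sorted(
        entries, key=lambda e: (PRODUCT_RANK[e.product], CLIENT_RANK[e.client])
    )
    return [e.user for e in ordered]
```

Because Python's `sorted` is stable, users with the same client and product keep the order in which they appeared in the submitted lists, which is why user 1 stays ahead of user 2 and user 4 ahead of user 5.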

In the embodiment of the present invention, if the preprocessing device is not a device in the collection robot 110, the preprocessing device can send the first list to the collection robot 110, or the collection robot 110 can obtain the first list from the preprocessing device using the file transfer protocol. If the preprocessing device is a component of the collection robot 110 (such as a preprocessing process), it can directly store the first list in the memory of the collection robot 110, so that the collection robot 110 calls the processing process to make a collection call to each user in the first list.

It should be noted that the embodiment of the present invention does not limit the time at which each client sends the collection list. For example, each client may send the collection list to the preprocessing device the day before the collection is executed, or on the day the collection is executed. Correspondingly, the embodiment of the present invention does not limit the device to which the client sends the collection list. For example, the client may send the collection list directly to the preprocessing device, or send it to the collection robot 110, which then forwards it to the preprocessing device.

Step 202: Use the first model to determine the user category of each user in the first list. Wherein, the user category of each user includes a first user category, and the first user category represents that the user will answer the call made by the collection system.

In a possible implementation manner, after the collection robot 110 obtains the first list, it can first determine the time difference between the time at which the collection robot 110 is to start the collection and the current time (the start time minus the current time). If the time difference is greater than or equal to a first preset time difference (which is greater than or equal to the time required to determine the collection strategy), the collection robot 110 can first analyze whether the collection task in the first list can be completed before the collection termination time set by each client and set the corresponding collection strategy according to the analysis result. Then, when the start time arrives, it starts collecting from each user in the first list according to that strategy. If the time difference is less than or equal to a second preset time difference (any value less than or equal to 0), the collection robot 110 can directly call the dialing thread to call each user in the order of the first list and, at the same time, call a parallel processing thread to analyze whether the collection task in the first list can be completed before the collection termination time set by each client. After the corresponding collection strategy has been set according to the analysis result, the dialing thread is controlled to collect from each user in the first list according to that strategy.

Correspondingly, if the time difference is less than the first preset time difference and greater than the second preset time difference, the collection robot 110 may first call the processing process to analyze whether the collection task in the first list can be completed before the collection termination time set by each client, and set the corresponding collection strategy according to the analysis result. If, during the analysis, it detects that the start time of the collection has been reached, it calls the parallel dialing process to call each user in the order of the first list. Once the corresponding collection strategy is obtained, the parallel dialing process is controlled to call each user in the first list according to that strategy.

The first preset time difference can be set by those skilled in the art based on experience, or can be determined from the time taken to determine the collection strategy for each collection task in a historical period. For example, it can be the average of those durations, their median, or a weighted average in which a collection task closer in time to the current collection task is given a greater weight, and so on.
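As an illustration of the three statistics named above, the sketch below computes a mean, a median, and a recency-weighted mean over a hypothetical history of strategy-determination durations. The linear weights 1, 2, ..., n are an assumption made for this example only; the text requires only that more recent tasks weigh more.

```python
from statistics import mean, median


def weighted_recent_mean(durations):
    """Weighted mean in which later (more recent) entries get larger weights.

    The weights 1, 2, ..., n are an illustrative assumption.
    """
    weights = range(1, len(durations) + 1)
    return sum(w * d for w, d in zip(weights, durations)) / sum(weights)


# Hypothetical durations (in seconds) of past strategy determinations, oldest first.
history = [10.0, 20.0, 30.0]

candidates = {
    "mean": mean(history),
    "median": median(history),
    "weighted": weighted_recent_mean(history),
}
```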

From the perspective of hardware implementation, the collection robot 110 can be equipped with two environments: an online production environment and a simulation environment. When the first list is obtained, the collection robot 110 can push it to the online production environment and the simulation environment at the same time. The online production environment performs the normal dialing process. For example, when it detects that the start time of the collection (such as 8:00) has been reached, it makes collection calls to each user in turn in the order of the first list (or according to the collection strategy sent by the simulation environment), records the call information and the user's repayment willingness (such as the collection stage at which the user ended the call), and sends each user's call result to the client of the corresponding online lending institution so that the institution can follow up on the user's subsequent repayment. The collection stages can include five stages: confirming that the other party is the user in person, explaining the overdue situation, asking when the payment can be made, confirming the repayment date, and ending. Correspondingly, the simulation environment analyzes the collection task corresponding to the first list, determines the corresponding collection strategy, and sends it to the online production environment, so that the online production environment executes the collection task according to that strategy. Moreover, the online production environment can also send the collection result of each user obtained by executing the collection task to the simulation environment, so that the simulation environment can update its internal parameters, such as the first call duration, the parameters of the first model, the parameters of the second model, and the average number of phone numbers going offline per hour in the historical period.

In the above implementation, the risk judgment process runs in parallel with the actual collection call process, so risk judgment serves as an auxiliary to the normal collection task and does not take up the time the collection robot needs to make collection calls, thereby reducing the impact of risk judgment on normal collection tasks.

The following describes how the collection strategy is obtained by analyzing the users in the first list.

In specific implementation, after obtaining the first list, the collection robot 110 can use the first model to predict each user in the first list, thereby determining the user category of each user. The user category of the user may include only the first user category, or may include both the first user category and the second user category. If the user category of a user is the first user category, it means that the user will answer the collection call made by the collection robot. If the user category of a user is the second user category, it means that the user will not answer the collection call made by the collection robot.

Step 203: Count the number of users belonging to each user category in the first list, and determine a first duration based on the number. The first duration represents the amount of time needed to make calls to all users in the first list.

In a possible implementation, after the first model is used to predict all users in the first list, the collection robot 110 can count the number of users belonging to the first user category and the number belonging to the second user category in the prediction result. It then determines the first duration required to make calls to all users in the first list according to the number of users in the first user category and the first call duration corresponding to that category, together with the number of users in the second user category and the second call duration corresponding to that category. The first call duration represents the call duration likely to be consumed by each user who answers the call, and the second call duration represents the call duration likely to be consumed by each user who does not answer. Both can be set by those skilled in the art based on experience, or according to business needs, and are not specifically limited.

In an example, the first call duration may be determined according to the time taken by calls to users who answered in the historical period, and the second call duration may be determined according to the time spent waiting for an answer after a call is placed. For example, if the historical period is the last 2 weeks, the collection robot 110 may first obtain from the statistical database the call durations of all users who answered a collection call made by the collection robot 110 in the last 2 weeks (the call duration of each user is the total time from the start of dialing to the end of the call), and then take the median of these call durations as the first call duration, or take their average as the first call duration, and so on. Correspondingly, the second call duration is the time the collection robot 110 waits for the other party to answer, and it may be determined according to the set number of rings. For example, if the call is set to hang up when the other party has not answered after 8 rings, the second call duration can be the total duration of the 8 rings. Since the waiting time is the same for every user who does not answer the collection call, the collection robot 110 may set the second call duration to the waiting time of any user who did not answer the collection call in the historical period.

In the above example, the first call duration is determined from the call durations of the collection calls that users answered within the historical period, so it reflects the characteristics of the historical dialing information and accurately represents the call duration of each user who answers. Correspondingly, the second call duration is the time spent waiting to be answered, so it accurately represents the call duration of each user who does not answer. In this way, the first call duration and the number of users predicted by the first model to answer give the total call duration needed for the users in the first list who answer, and the second call duration and the number of users predicted not to answer give the total call duration needed for the users who do not answer. Together they predict the total call duration for all users in the first list. Because this method is based on analysis of historical data, it better matches the actual business situation and makes the predicted first duration more accurate.

In the embodiment of the present invention, the collection robot 110 may apply to the operator for multiple phone numbers in advance, and use the multiple phone numbers jointly to make collection calls to the users in the first list. In this way, after the collection robot 110 obtains the prediction result of the first model for all users in the first list, it can first determine the total call duration for making calls to all users in the first list based on the number of users belonging to the first user category, the first call duration, the number of users belonging to the second user category, and the second call duration, and then determine the first duration according to the multiple pre-applied phone numbers and the total call duration.
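The arithmetic described here can be sketched as follows. The function names and the sample figures are hypothetical; dividing the total evenly across the phone numbers is only the simplest way to obtain the first duration and ignores numbers going offline.

```python
def total_call_duration(n_answer, first_call_duration,
                        n_no_answer, second_call_duration):
    """Total time to call every user if all calls were placed one after another."""
    return n_answer * first_call_duration + n_no_answer * second_call_duration


def naive_first_duration(total, n_phone_numbers):
    """Simplest estimate: spread the total evenly over the phone numbers."""
    if n_phone_numbers <= 0:
        raise ValueError("at least one phone number is required")
    return total / n_phone_numbers


# Hypothetical figures: 600 users predicted to answer (90 s each),
# 400 predicted not to answer (30 s of ringing each), 10 phone numbers.
total = total_call_duration(600, 90.0, 400, 30.0)
first_duration = naive_first_duration(total, 10)
```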

In an optional implementation manner, the collection robot 110 may directly use the ratio of the total call duration to the number of phone numbers as the first duration. However, in the actual process of making collection calls, phone numbers may go offline as the dialing time increases, so a first duration computed directly from this ratio may be inaccurate. Based on this, as a possible determination method, the collection robot 110 may determine the first duration in the following manner:

The collection robot 110 may first determine the predicted duration required to make calls to all users in the first list based on the total call duration and the number of phone numbers, and analyze the probability of each phone number going offline within the predicted duration. The probability of each phone number going offline can be determined using probability theory. Since the time interval t from the start of the collection calls to a phone number going offline obeys an exponential distribution with parameter λ, the probability density function f(t) of the time interval t is:

f(t)=λe^(-λt), t≥0

Correspondingly, the cumulative distribution function F(t) of the time interval t can be:

F(t)=1-e^(-λt), t≥0

Here, λ can be set as the average number of phone numbers going offline per hour in the historical period. The historical period can be set by those skilled in the art based on experience, for example the last 2 weeks. In this way, the value of λ can be continuously updated over time.

In this way, according to the distribution function F(t) of the time interval t, if the predicted duration is Δt, the probability of each phone number going offline within the predicted duration is 1-e^(-λΔt).
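The offline probability follows directly from the exponential distribution function. The sketch below assumes that λ is expressed per hour and Δt in hours; the function name is hypothetical.

```python
import math


def offline_probability(lam_per_hour, predicted_hours):
    """P(number goes offline within the predicted duration) = 1 - e^(-lambda * dt)."""
    if lam_per_hour < 0 or predicted_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    return 1.0 - math.exp(-lam_per_hour * predicted_hours)
```

The probability is 0 when either λ or Δt is 0 and approaches 1 as the predicted duration grows, which matches the observation that numbers are more likely to go offline the longer dialing continues.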

Further, after the probability of each phone number going offline is determined, the phone numbers whose offline probability is not greater than a first preset threshold can be used as the available phone numbers. The collection robot 110 then determines the first duration required to make calls to all users in the first list according to the total call duration and the number of available phone numbers. If the first duration is less than or equal to the set duration, it means that even if some phone numbers go offline during dialing, the collection robot 110 can complete the collection task corresponding to the first list, so the collection robot 110 can continue to make collection calls to the users in the order of the first list.

Correspondingly, if the first duration is greater than the set duration, it means that if some phone numbers go offline during dialing, the collection robot 110 cannot complete the collection task corresponding to the first list. The collection robot 110 can then determine whether the predicted duration is greater than the set duration. If the predicted duration is less than or equal to the set duration, it means that the collection robot 110 can complete the collection task corresponding to the first list when no phone number goes offline during dialing. At this time, if the operator supports the collection robot 110 applying for backup phone numbers, the collection robot 110 can apply to the operator for backup phone numbers, and the number of backup phone numbers can be greater than or equal to the number of phone numbers whose offline probability is greater than the first preset threshold. If the operator does not support this, the collection robot 110 may select from the first list a subset of users with a higher collection success rate to form the second list.

Correspondingly, if the predicted duration is greater than the set duration, it means that the collection robot 110 cannot complete the collection task corresponding to the first list even if no phone number goes offline during dialing. At this time, the collection robot 110 may likewise decide, according to what the operator supports, whether to apply for backup phone numbers or to determine the second list.
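The branching above can be summarized in a short sketch. All names (`Action`, `plan`) are hypothetical. The function assumes that an offline probability is supplied for each phone number, and it reports only the lower bound on backup numbers that the text gives; it is one reading of the described flow, not the definitive procedure.

```python
from enum import Enum


class Action(Enum):
    DIAL_FIRST_LIST = "dial users in first-list order"
    APPLY_BACKUP_NUMBERS = "apply for backup phone numbers"
    BUILD_SECOND_LIST = "build the second list"


def plan(total_call_duration, offline_probs, threshold, set_duration,
         operator_supports_backup):
    """Return (action, min_backup_numbers, finishes_if_no_number_drops)."""
    if not offline_probs:
        raise ValueError("at least one phone number is required")
    n_all = len(offline_probs)
    n_available = sum(1 for p in offline_probs if p <= threshold)

    # Predicted duration: every number stays online.
    predicted = total_call_duration / n_all
    # First duration: only numbers unlikely to go offline are counted.
    first = (total_call_duration / n_available) if n_available else float("inf")

    finishes_if_no_drop = predicted <= set_duration
    if first <= set_duration:
        return Action.DIAL_FIRST_LIST, 0, finishes_if_no_drop
    if operator_supports_backup:
        # Lower bound from the text: at least as many as the at-risk numbers.
        return Action.APPLY_BACKUP_NUMBERS, n_all - n_available, finishes_if_no_drop
    return Action.BUILD_SECOND_LIST, 0, finishes_if_no_drop
```

When `finishes_if_no_number_drops` is false, replacing only the at-risk numbers would not be enough on its own, so a caller would need more backup numbers than the returned lower bound or would combine them with a second list.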

It should be noted that the above is only an exemplary description and does not constitute a limitation on the solution. In specific implementation, when the operator supports the collection robot applying for backup phone numbers, the collection robot can apply for backup phone numbers and, at the same time, select from the first list a subset of users with a higher collection success rate to form the second list. Alternatively, even when the operator supports this, the collection robot may choose not to apply for backup phone numbers and instead only form the second list from users with a higher collection success rate. There are many possible implementations, which can be set by those skilled in the art as needed and are not specifically limited.

In the above determination method, by judging the probability of each phone number going offline within the predicted duration, the number of phone numbers that may go offline while the collection task is being executed can be estimated in advance. Re-determining the first duration with the number of phone numbers that are not expected to go offline anticipates the risk of phone numbers going offline and makes the judgment of whether the collection task can be completed more accurate.

From the perspective of hardware implementation, the simulation environment can determine the first duration based on a cell model. A one-dimensional cell model can be set up in the simulation environment to store all users in the first list. Figure 3 is a schematic structural diagram of a one-dimensional cell model provided by an embodiment of the present invention. As shown in Figure 3, each cell identifies one user, and each cell has a left neighbor cell and/or a right neighbor cell. For example, cell A is the left neighbor of cell B, and cell C is the right neighbor of cell B. Moreover, each cell can be in one of three states, and the state can be identified by color: white identifies the state of not yet dialed, gray identifies the state of dialed but not answered, and black identifies the state of dialed and answered. In this way, when the color of a cell changes from white to gray or black, it means that the collection robot 110 has dialed the user corresponding to the cell. Accordingly, a cell waits for the first call duration when it changes from white to black, and waits for the second call duration when it changes from white to gray.

In this way, after the first model is used to predict the user category of the user corresponding to a cell, if the user belongs to the first user category, the simulation waits for the first call duration (to save time, this can be scaled down proportionally to a value less than the first call duration) and then updates the color of the cell from white to black. If the user belongs to the second user category, the simulation waits for the second call duration (to save time, this can likewise be scaled down to a value less than the second call duration, using the same ratio as for the first call duration) and then updates the color of the cell from white to gray. This process can be executed in parallel according to the number of available phone numbers. When the color of every cell in the one-dimensional cell model has changed, the elapsed execution time is recorded, and the first duration is determined from it according to the ratio.
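The cell model can be approximated without real waiting by a small discrete-event simulation. The sketch below is an assumption-laden illustration: the function name, the use of a heap to track when each phone line becomes free, and the rule that a free line always takes the next white cell in list order are choices made here, not details given in the text.

```python
import heapq

WHITE, GRAY, BLACK = "white", "gray", "black"


def simulate_first_duration(will_answer, d_answer, d_no_answer, n_lines, scale=1.0):
    """Simulate dialing the cells in order over n_lines parallel phone numbers.

    will_answer: one bool per cell (the first model's prediction), in list order.
    scale: ratio (0 < scale <= 1) by which both call durations are shortened.
    Returns (estimated first duration in unscaled time, final cell colors).
    """
    if n_lines <= 0 or scale <= 0:
        raise ValueError("n_lines and scale must be positive")
    cells = [WHITE] * len(will_answer)
    free_at = [0.0] * n_lines  # scaled time at which each line becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for i, answers in enumerate(will_answer):
        start = heapq.heappop(free_at)  # earliest free line takes the next cell
        end = start + (d_answer if answers else d_no_answer) * scale
        cells[i] = BLACK if answers else GRAY
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    # Undo the scaling to recover the real-time estimate.
    return finish / scale, cells
```

With two lines, a 90 s answered call, and a 30 s unanswered call, the cells `[True, False, True, False]` finish after 120 s of real time, whatever scaling ratio is used.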

Step 204: If the first duration exceeds the set duration, use a second model to determine the probability that each user belonging to the first user category performs a preset behavior, and determine a second list according to the probability. The second list indicates the users who need to be called after the current time.

In the embodiment of the present invention, if the first duration exceeds the set duration, it means that the collection robot 110 cannot complete the collection task of the first list within the set duration. In this case, the collection robot 110 can use the second model to determine the repayment probability at least for each user who is predicted to answer the call, and sort the users by repayment probability to obtain the second list, so that the collection robot 110 makes collection calls to the users according to the second list.

In one example, the collection robot 110 may use the second model to determine the repayment probability only for the users who are predicted to answer the call, and then arrange those users in descending (or ascending) order of repayment probability to obtain the second list. In this way, when it is determined that collection calls cannot be made to all users, the collection robot 110 can make collection calls only to users who will answer and have a high repayment probability, rather than to users who will not answer or who will answer but have a low repayment probability. This improves the effect of the collection calls, reduces the amount of data the collection robot 110 has to process, and improves collection efficiency.

In another example, the collection robot 110 can use the second model to determine the repayment probability of every user in the first list, and arrange the users in descending (or ascending) order of repayment probability to obtain the second list. In this way, when it is determined that collection calls cannot be made to all users, the collection robot 110 can make collection calls to the users in the first list in descending order of repayment probability, so as to call as many users as possible and avoid missing users who are predicted not to answer but would in fact answer, thereby improving the accuracy of the collection calls.
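Both variants of the second list reduce to filtering and sorting. In the sketch below, the dictionaries `answer_pred` and `repay_prob` stand in for the outputs of the first and second models; they and the function name are hypothetical, and descending order is shown as the default.

```python
def build_second_list(users, answer_pred, repay_prob, answering_only=True):
    """Order users by repayment probability, highest first.

    answering_only=True keeps only users predicted to answer (first variant);
    answering_only=False ranks every user in the first list (second variant).
    """
    if answering_only:
        candidates = [u for u in users if answer_pred[u]]
    else:
        candidates = list(users)
    return sorted(candidates, key=lambda u: repay_prob[u], reverse=True)


# Hypothetical model outputs for four users of the first list.
users = ["u1", "u2", "u3", "u4"]
answer_pred = {"u1": True, "u2": False, "u3": True, "u4": True}
repay_prob = {"u1": 0.2, "u2": 0.9, "u3": 0.7, "u4": 0.5}
```

The first variant drops `u2` even though it has the highest repayment probability, because the first model predicts no answer; the second variant keeps it, which guards against a wrong answer prediction.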

Correspondingly, if the first time period does not exceed the set time period, it means that the collection robot 110 can complete the collection task for the first list within the set time period. In this way, the collection robot 110 can continue to make a collection call to each user in the order of each user in the first list.

In a possible risk scenario, although the collection robot 110 can complete the collection task for the first list within the set duration, it may receive a request to process the collection task of a third list while it is making collection calls within the first duration. In this case, the collection robot 110 may further determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, it means that the collection robot 110 cannot complete the collection tasks for all users in the first list and the third list within the set duration. Therefore, the collection robot 110 may refuse to accept the third list.

In the above example, when a request to process the collection task of the third list is received, the total duration for making collection calls to all users in the first list and the third list is determined in advance, and the third list is refused if this total exceeds the set duration. This avoids accepting collection tasks that cannot be completed and reduces customer losses.

In another possible risk scenario, although the collection robot 110 can complete the collection task of the first list within the set duration, some of its phone numbers may suddenly go offline during the first duration. In this case, the collection robot 110 may determine a new first duration based on the total call duration and the number of phone numbers that are not offline. Alternatively, if the collection robot 110 has already made collection calls to some users in the first list, it may determine, based on the first model, the total call duration for the remaining users in the first list who have not yet been called, and then determine a new first duration based on this total call duration and the number of phone numbers that are not offline. Further, if the new first duration is greater than the set duration, it means that the collection robot 110 cannot complete the collection tasks for all users in the first list within the set duration using the phone numbers that are not offline. Therefore, the collection robot 110 may send first instruction information to the operation and maintenance personnel, so that they can decide whether to apply to the operator for backup phone numbers.

In the embodiment of the present invention, after the first list is received, the first model is used to predict which users in the first list will answer the call and which will not, and the time needed to complete the collection task is determined. When it is determined that the collection task cannot be completed, the second model is used to identify the users with a higher probability of successful collection, so that those users can be called first, improving the collection effect.

The above process describes the process of using the first model and the second model to determine the collection strategy. The following describes the process of training to obtain the first model and the second model, respectively.

First model

Since the first model is used to predict whether each user will answer the phone to determine the user category of each user, the first model can be set as a classification model.

In specific implementation, the collection robot 110 can first obtain the feature values of multiple users under each feature. Then, for any feature, it determines the degree of association between that feature and the behavior of whether the user answers the call according to the following counts: the number of users who answered the call and the number who did not among the multiple users; the number of users corresponding to each feature value of the feature; and, among the users corresponding to each feature value, the number who answered the call and the number who did not. Further, the collection robot 110 may regard a feature whose degree of association with the answering behavior is greater than or equal to a second preset threshold as a strong correlation feature. It then trains the first model according to the number of users who answered the call and the number who did not among the multiple users, the number of users corresponding to each feature value of each strong correlation feature, and, among the users corresponding to each such feature value, the number who answered the call and the number who did not.

To facilitate understanding, a specific example is given below to describe the training process of the first model. In this example, the first model is trained based on the Naive Bayes algorithm. Since the naive Bayes algorithm can update the model parameters based on incremental data in real time, training the first model based on the naive Bayes algorithm can improve the efficiency of training and update.

In a specific implementation, the data of multiple (for example, 20,000) users to whom the collection robot 110 made collection calls in the historical period can be obtained first. The data of each user can include the user's value under various features, such as gender, age, education, occupation, marital status, resident city, the amount of the current loan, the amount of arrears, the number of days the current loan is overdue, the number of historical loans, and the number of historical overdue loans, and also include the category feature value indicating whether the user answered the collection call. Since these features include both continuous features and discrete features, they cannot be handled under one uniform standard as they are. Therefore, in an example, for any one of the features, if the feature is a discrete feature, the collection robot 110 can collect the values of the multiple users under the feature and use each value as a feature value of the feature. If the feature is a continuous feature, the collection robot 110 can determine the value range of the multiple users under the feature, divide the value range into multiple intervals, and assign a corresponding feature value to each interval, thereby obtaining the feature values of the feature. In this way, by discretizing the values of continuous features, every feature (continuous or discrete) has the same discrete form, so each discrete feature value can be used as training data when training the model without fitting a probability distribution function to the continuous features, which improves the efficiency of data processing.

For example, gender, education, occupation, marital status, and resident city each take a finite number of fixed values, so these features are discrete features, and the values that users take under them are used directly as their feature values. Correspondingly, age, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical loan overdues can each take an unbounded number of values, so these features are continuous features, and their continuous values are converted to discrete values.

For example, the age feature is discretized into feature value 1 to feature value 7, which in turn represent an age (in years) in the following 7 intervals: [0, 15), [15, 25), [25, 35), [35, 45), [45, 55), [55, 65), [65, ∞). The loan amount feature is discretized into feature value 1 to feature value 5, which represent a loan amount (unit: 10,000 yuan) in the following 5 intervals: [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, ∞). The arrears amount feature is discretized into feature value 1 to feature value 5, which represent an arrears amount (unit: 10,000 yuan) in the following 5 intervals: [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, ∞). The feature of the number of days this loan is overdue is discretized into feature value 1 to feature value 5, which represent a number of overdue days in the following 5 intervals: [0, 1), [1, 3), [3, 5), [5, 7), [7, ∞). The historical loan count feature is discretized into feature value 1 to feature value 5, which represent a number of historical loans in the following 5 intervals: [0, 1), [1, 2), [2, 3), [3, 5), [5, ∞). The historical loan overdue count feature is discretized into feature value 1 to feature value 5, which represent a number of historical loan overdues in the following 5 intervals: [0, 1), [1, 2), [2, 3), [3, 5), [5, ∞).
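The interval mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary keys and the function name `discretize` are hypothetical, and only the interval boundaries come from the text.

```python
from bisect import bisect_right

# Interior cut points of the left-closed, right-open intervals listed in the text.
# The first interval starts at 0 and the last one extends to infinity.
# The feature names used as keys are hypothetical labels for this sketch.
BIN_EDGES = {
    "age": [15, 25, 35, 45, 55, 65],        # years, feature values 1..7
    "loan_amount": [0.5, 1.5, 3.5, 5],      # 10,000 yuan, feature values 1..5
    "arrears_amount": [0.5, 1.5, 3.5, 5],   # 10,000 yuan, feature values 1..5
    "overdue_days": [1, 3, 5, 7],           # days, feature values 1..5
    "historical_loans": [1, 2, 3, 5],       # count, feature values 1..5
    "historical_overdues": [1, 2, 3, 5],    # count, feature values 1..5
}


def discretize(feature: str, value: float) -> int:
    """Map a non-negative continuous value to its 1-based feature value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # bisect_right counts the cut points <= value, so a value equal to a cut
    # point lands in the interval that starts there, matching [a, b).
    return bisect_right(BIN_EDGES[feature], value) + 1
```

For instance, `discretize("age", 30)` falls in [25, 35) and yields feature value 3.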

Further, for any one of the features, the degree of association between the feature and the category feature can be calculated. The degree of association can be represented by mutual information, which measures how much information one random variable contains about another. The greater the mutual information, the stronger the coupling between the two random variables and the greater the degree of association. The mutual information between each feature and the category feature of whether the user answers the call can satisfy the following condition:

$$I(X;Y)=\sum_{x\in R(X)}\sum_{y\in R(Y)}P(x,y)\log\frac{P(x,y)}{P(x)\,P(y)}$$

Take the age feature as an example. The random variable X represents the age feature, and the random variable Y represents the category feature of whether the call is answered. R(X) is the value range of X; because the feature values of the age feature are feature value 1 to feature value 7, R(X) = {1, 2, 3, 4, 5, 6, 7}. R(Y) is the value range of Y; because the feature values of the category feature are Yes and No, R(Y) = {Yes, No}. For any feature value x in R(X), P(x) is the proportion of the 20,000 users whose age feature value is x. For any feature value y in R(Y), P(y) is the proportion of the 20,000 users whose category feature value is y. P(x, y) is the proportion of the 20,000 users whose age feature value is x and whose category feature value is y.

Once the mutual information between each feature and the category feature of whether the user answers the call has been determined, each feature whose mutual information is greater than the third preset threshold may be used as a strongly correlated feature. The third preset threshold can be set by those skilled in the art based on experience, for example 0.5 or 0.8, and is not specifically limited.
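The selection of strongly correlated features by mutual information can be sketched as below. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the base-2 logarithm is an assumption (the text does not state a base; in bits, the mutual information with a two-valued category feature is at most 1, which keeps thresholds such as 0.5 or 0.8 meaningful).

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # P(x,y) * log(P(x,y) / (P(x) P(y))) with P estimated as count / n.
        # Pairs that never occur contribute 0 and are simply not iterated.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def strong_features(columns, labels, threshold):
    """Return names of features whose mutual information with labels exceeds threshold."""
    return [
        name
        for name, values in columns.items()
        if mutual_information(values, labels) > threshold
    ]
```

Here `columns` maps a (hypothetical) feature name to the list of discretized feature values of all users, and `labels` holds the corresponding Yes/No answers.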

For ease of understanding, it is assumed that the strongly correlated features include X_1, X_2, X_3, ..., X_n.

Further, the embodiment of the present invention may train a Naive Bayes classifier on the feature values of the 20,000 users under the strongly correlated features to obtain the first model. Specifically, for sample data x obtained by combining one feature value of each strongly correlated feature (for example, the combination x_1, x_2, x_3, ..., x_n, where x_i is a feature value of the strongly correlated feature X_i), the predicted value ŷ of the category feature of whether the sample data will answer the call can be:

$$\hat{y}=\arg\max_{y\in R(Y)}P(y\mid x_1,x_2,\ldots,x_n)$$

where P(y | x_1, x_2, ..., x_n) is the posterior probability of category y given the sample data whose feature value combination is x_1, x_2, x_3, ..., x_n.

Based on the Bayes formula and the Naive Bayes assumption that the features are conditionally independent given y, the posterior probability can be expressed as:

$$P(y\mid x_1,x_2,\ldots,x_n)=\frac{P(y)\prod_{i=1}^{n}P(x_i\mid y)}{P(x_1,x_2,\ldots,x_n)}$$

When the denominator, which is the same for every y, is not considered, the above formula can be simplified to:

$$\hat{y}=\arg\max_{y\in R(Y)}P(y)\prod_{i=1}^{n}P(x_i\mid y)$$

Since the number of samples corresponding to some feature values may be 0, in order to avoid a denominator of 0 (or a probability estimate of 0) during the calculation, P(y) and P(x_i|y) are rewritten as:

$$P(y)=\frac{N_y+1}{N+2}$$

$$P(x_i\mid y)=\frac{N_{y,x_i}+1}{N_y+L_{x_i}}$$

where N is the total number of users (20,000), N_y is the number of users whose category feature value is y, N_{y,x_i} is the number of users whose category feature value is y and whose feature value under feature X_i is x_i, and L_{x_i} is the size of the value range of feature X_i, that is, the number of possible feature values x_i.

In this way, the first model can be expressed by the above formulas. When predicting the value of any user under the behavior feature (that is, the category feature of whether the user answers the call), the following formula can be used to determine the probability that the user's feature value under the behavior feature is Yes and the probability that it is No:

$$P(y\mid x_1,\ldots,x_n)\propto P(y)\prod_{i=1}^{n}P(x_i\mid y),\qquad y\in\{\text{Yes},\text{No}\}$$

If the probability that the user's feature value under the behavior feature is Yes is greater than the probability that it is No, the user is determined to be a user who will answer the call, and the user category of the user is the first user category. If the probability that the user's feature value under the behavior feature is No is greater than the probability that it is Yes, the user is determined to be a user who will not answer the call, and the user category of the user is the second user category.
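The counting, smoothing, and comparison steps described above can be sketched as follows. This is an illustration under stated assumptions: the function names and the dictionary layout are hypothetical, features are identified by their position i in the sample tuple, and a tie between the two scores is assigned to the second user category because the text does not say how ties are resolved.

```python
from collections import Counter


def train_counts(samples, labels):
    """Collect N, N_y and N_{y,x_i} from discretized samples and Yes/No labels."""
    n_y = Counter(labels)
    n_yx = Counter()
    for x, y in zip(samples, labels):
        for i, xi in enumerate(x):
            n_yx[(y, i, xi)] += 1
    return {"N": len(labels), "N_y": n_y, "N_yx": n_yx}


def score(model, x, y, sizes):
    """Unnormalized posterior P(y) * prod_i P(x_i | y) using the smoothed estimates."""
    p = (model["N_y"][y] + 1) / (model["N"] + 2)
    for i, xi in enumerate(x):
        # sizes[i] plays the role of L_{x_i}: the number of feature values of X_i.
        p *= (model["N_yx"][(y, i, xi)] + 1) / (model["N_y"][y] + sizes[i])
    return p


def classify(model, x, sizes):
    """Return 'first' if the user is predicted to answer, otherwise 'second'."""
    yes = score(model, x, "Yes", sizes)
    no = score(model, x, "No", sizes)
    # Assumption: ties go to the second user category.
    return "first" if yes > no else "second"
```

With many features the product of small probabilities can underflow; summing logarithms instead of multiplying is a common alternative that gives the same comparison.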

In an example, when new data is used to update the first model, each continuous feature of all users in the new data (that is, age, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical loan overdues) can first be discretized. The number of users who answered the call and the number of users who did not answer the call in the new data are then counted (overall and per feature value), N_y and N_{y,x_i} in the formulas of the first model are updated accordingly, and P(y) and P(x_i|y) are recomputed from the updated N_y and N_{y,x_i} to complete the update of the first model.
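Because the first model is fully determined by counts, an update with new data can be sketched as adding to those counts and recomputing the probabilities on demand. The class and method names below are hypothetical, and incrementing the total N together with N_y is an assumption that follows from N being the total number of users.

```python
from collections import Counter


class CountModel:
    """Naive Bayes counts that can be extended with new, already discretized data."""

    def __init__(self):
        self.n = 0               # N: total number of users seen so far
        self.n_y = Counter()     # N_y: users per category value y
        self.n_yx = Counter()    # N_{y,x_i}: keyed by (y, feature index, value)

    def update(self, samples, labels):
        """Add new samples to the counts; no retraining from scratch is needed."""
        for x, y in zip(samples, labels):
            self.n += 1
            self.n_y[y] += 1
            for i, xi in enumerate(x):
                self.n_yx[(y, i, xi)] += 1

    def prior(self, y):
        """Smoothed P(y) = (N_y + 1) / (N + 2)."""
        return (self.n_y[y] + 1) / (self.n + 2)

    def likelihood(self, i, xi, y, size):
        """Smoothed P(x_i | y) = (N_{y,x_i} + 1) / (N_y + L_{x_i})."""
        return (self.n_yx[(y, i, xi)] + 1) / (self.n_y[y] + size)
```

Calling `update` twice with two batches leaves the model in the same state as calling it once with the combined data, which is what makes the update cheap.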

By setting the first model as a classification model and expressing it through the above formulas, the first model can be updated quickly and in real time, so its update efficiency is better. Moreover, by determining the degree of association between each feature and the behavior of answering the call, the first model can be trained on only the features with a higher degree of association. In this way, less data is involved in training, and the model is trained more efficiently. In addition, since the training data is concentrated on the features that are strongly related to the behavior of answering the call, the training of the first model is more focused and the model effect is better.

Second model

In the embodiment of the present invention, the second model may be a neural network model.

In a specific implementation, the feature values of multiple users under each feature are acquired, and each feature value of each feature is represented numerically. For any user, the feature vector of the user under each feature is constructed from the user's feature value under that feature and the set of feature values of that feature, and the per-feature vectors are concatenated to obtain the first feature vector corresponding to the user. The second feature vector corresponding to the user is obtained from the user's value under the category feature of whether the user repays. Further, the first feature vectors of the multiple users can be used as the model input to obtain predicted repayment vectors for the multiple users, and the model parameters are adjusted based on the second feature vectors and the predicted repayment vectors of the multiple users to obtain the optimized second model.

For ease of understanding, a specific example is given below to describe the training process of the second model. In this example, the second model can include an input layer, a hidden layer, and an output layer, which adopt a fully connected structure. The hidden layer can be set with 10 neuron nodes, and the output layer can be set with 2 neuron nodes. The activation function of the hidden layer is the ReLU function, and the activation function of the output layer is the Softmax function, whose output represents the probability that the user repays.
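The forward pass of such a network can be sketched in plain Python as follows. This is a minimal illustration only: the initialization scheme, the function names, and the fixed random seed are assumptions, and the input width of 400 is taken from the length of the first feature vector described in this example. Training (parameter adjustment) is not shown.

```python
import math
import random


def relu(values):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases):
    """Fully connected layer; weights holds one row per output neuron."""
    return [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (an assumed initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, params):
    """Input -> 10 ReLU hidden neurons -> 2 Softmax outputs."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)
params = [init_layer(400, 10, rng), init_layer(10, 2, rng)]
```

The two outputs sum to 1, so one of them can be read as the predicted probability of repayment and the other as the probability of no repayment.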

In a specific implementation, data can first be obtained for multiple (for example, 50,000) users to whom the collection robot 110 made collection calls in a historical period. The data of each user can include the user's values under various features, such as gender, age, education, occupation, marital status, resident city, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical loan overdues, and can also include the value of the category feature indicating whether the user repaid after the collection call was made. The users used to train the second model and the users used to train the first model may be partially the same or completely different, which is not specifically limited.

Further, each continuous feature can be discretized using the same discretization as when training the first model, and one-hot encoding is then used to convert each feature value of each feature into numerical form. For example, since the gender feature has 2 feature values (male and female), one-hot encoding converts them into a vector with 1 row and 2 columns; if a user is male, the feature vector of the user under the gender feature is (1, 0). Since the marital status feature has 4 feature values (unmarried, married, widowed, divorced), one-hot encoding converts them into a vector with 1 row and 4 columns; if a user is widowed, the feature vector of the user under the marital status feature is (0, 0, 1, 0). Correspondingly, one-hot encoding converts the 11 feature values of the education feature (primary school, junior high school, high school, technical secondary school, vocational school, technical school, junior college, undergraduate, master's, doctoral, postdoctoral) into a vector with 1 row and 11 columns; the 13 feature values of the occupation feature (agriculture, forestry, animal husbandry, fishery, and water conservancy; industry; geological survey and exploration; construction; transportation, post, and telecommunications; commerce, public catering, and material supply and storage; real estate management, public utilities, resident services, and consulting services; health, sports, and social welfare; education, culture and art, and radio and television; scientific research and comprehensive technical services; finance and insurance; state agencies, party and government agencies, and social organizations; other industries) into a vector with 1 row and 13 columns; the 338 feature values of the resident city feature (337 major cities and other cities) into a vector with 1 row and 338 columns; the 7 feature values of the age feature into a vector with 1 row and 7 columns; the 5 feature values of the loan amount feature into a vector with 1 row and 5 columns; the 5 feature values of the arrears amount feature into a vector with 1 row and 5 columns; the 5 feature values of the feature of the number of days this loan is overdue into a vector with 1 row and 5 columns; the 5 feature values of the historical loan count feature into a vector with 1 row and 5 columns; and the 5 feature values of the historical loan overdue count feature into a vector with 1 row and 5 columns.
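The one-hot encoding and the concatenation of the per-feature vectors can be sketched as below. The feature keys, the function names, and the concatenation order are assumptions made for illustration; only the number of feature values per feature comes from the text.

```python
# Number of feature values per feature, as listed in the text.
# Key names and their order are hypothetical choices for this sketch.
FEATURE_SIZES = {
    "gender": 2,
    "marital_status": 4,
    "education": 11,
    "occupation": 13,
    "resident_city": 338,
    "age": 7,
    "loan_amount": 5,
    "arrears_amount": 5,
    "overdue_days": 5,
    "historical_loans": 5,
    "historical_overdues": 5,
}

MARITAL_STATUS = ["unmarried", "married", "widowed", "divorced"]


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at the zero-based `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range for this feature")
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_user(indices):
    """Concatenate the one-hot vectors of all features into one flat vector."""
    vec = []
    for name, size in FEATURE_SIZES.items():
        vec.extend(one_hot(indices[name], size))
    return vec
```

The widths add up to 2 + 4 + 11 + 13 + 338 + 7 + 5 × 5 = 400, so `encode_user` always returns a vector of length 400 containing exactly eleven ones.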

In this way, for any user, the feature vector of the user under each feature can be determined according to the user's feature value under that feature, and the per-feature vectors can then be spliced head to tail to obtain the first feature vector corresponding to the user. According to the above analysis, the first feature vector corresponding to the user can be a one-dimensional vector with 1 row and 400 columns. Correspondingly, the second feature vector corresponding to the user is determined according to the user's value under the category feature of whether the user repays, and can be a one-dimensional vector with 1 row and 2 columns. For example, if the user has repaid, the second feature vector corresponding to the user may be [1, 0], and if the user has not repaid, it may be [0, 1]. Further, after the feature vectors corresponding to the 50,000 users (each including a first feature vector and a second feature vector) are obtained, the 50,000 feature vectors can be divided into training feature vectors, test feature vectors, and validation feature vectors. The division can follow a random ratio or a preset ratio, without limitation. Assuming that the 50,000 feature vectors are divided into 35,000 training feature vectors, 10,000 test feature vectors, and 5,000 validation feature vectors, the first feature vectors of the 35,000 training feature vectors can be input to the neural network model so that it outputs 35,000 predicted second feature vectors, and the parameters of the neural network model are then adjusted based on the 35,000 predicted second feature vectors and the second feature vectors of the 35,000 training feature vectors to obtain the second model. Correspondingly, the 10,000 test feature vectors can be used to test the model effect of the second model, and the 5,000 validation feature vectors can be used to verify whether the tested effect of the second model reaches the preset effect; the 10,000 test feature vectors and the 5,000 validation feature vectors can also be used to optimize the model parameters of the second model.
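The label encoding and the division of the 50,000 feature vectors can be sketched as follows. The function names, the shuffle before splitting, and the fixed seed are assumptions; the text only states that the division may follow a random or preset ratio.

```python
import random


def label_vector(repaid):
    """Second feature vector: [1, 0] if the user repaid, [0, 1] otherwise."""
    return [1, 0] if repaid else [0, 1]


def split_dataset(records, n_train, n_test, n_valid, seed=0):
    """Shuffle a copy of the records and cut it into three disjoint parts."""
    if n_train + n_test + n_valid != len(records):
        raise ValueError("split sizes must add up to the number of records")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid
```

For the example in the text, `split_dataset(records, 35000, 10000, 5000)` would return the training, test, and validation parts, where each record is assumed to pair a first feature vector with its second feature vector.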

In the embodiment of the present invention, the feature vector of the user under each feature is determined, and these per-feature vectors are concatenated to obtain the user's feature vector, so that the user's feature vector integrates the information of every feature; the information is more comprehensive and the representation is more concise. A model trained on inputs that are both information-rich and concise has a better effect and higher training efficiency.

In the above embodiment of the present invention, the first list is first obtained and the first model is used to determine the user category of each user in the first list; the number of users belonging to each user category in the first list is then counted, and the first duration is determined based on these numbers. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probabilities. Here, the first list includes multiple users who have not performed the preset behavior; the user category of each user includes the first user category, which indicates that the user will answer the call made by the collection system; the first duration represents the time required to make calls to all users in the first list; and the second list indicates the users who need to be called after the current moment. In the embodiment of the present invention, after the first list is received, the first model predicts whether each user in the first list will answer the call so that the time needed to complete the collection task can be determined; when it is determined that the collection task cannot be completed within the set duration, the second model identifies the users with a higher probability of successful collection, so that collection calls can be made to those users first, improving the collection effect.
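The overall flow summarized above can be sketched as a single function. Everything here is a hypothetical illustration: the callables `classify`, `first_duration`, and `success_probability` stand in for the first model, the duration calculation, and the second model, and sorting the answering users by descending probability is only one assumed way of determining the second list "according to the probability".

```python
def plan_calls(users, classify, first_duration, success_probability, set_duration):
    """Return the users to call, prioritized when the task cannot finish in time.

    users: list of hashable user identifiers (the first list).
    classify: user -> "first" (will answer) or "second" (will not answer).
    first_duration: (n_first, n_second) -> time needed to call everyone.
    success_probability: user -> probability of performing the preset behavior.
    """
    categories = {u: classify(u) for u in users}
    n_first = sum(1 for c in categories.values() if c == "first")
    n_second = len(users) - n_first

    if first_duration(n_first, n_second) <= set_duration:
        # Assumption: when everything fits in the set duration, call everyone.
        return list(users)

    # Otherwise keep only users predicted to answer and rank them by the
    # probability given by the second model, highest first.
    answering = [u for u in users if categories[u] == "first"]
    return sorted(answering, key=success_probability, reverse=True)
```

A cut-off (for example, keeping only as many users as fit in the set duration) could be applied to the sorted result; the text in this section does not fix such a rule, so none is shown.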

In view of the foregoing method flow, an embodiment of the present invention also provides a data processing device, and the specific content of the device can be implemented with reference to the foregoing method.

Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention, including:

The obtaining module 401 is configured to obtain a first list; the first list includes multiple users who have not performed a preset behavior;

The determining module 402 is configured to determine the user category of each user in the first list using a first model, wherein the user category of each user includes a first user category, and the first user category indicates that the user will answer the call made by the collection system;

The processing module 403 is configured to count the number of users belonging to each user category in the first list and determine a first duration based on the number, where the first duration represents the duration required to make calls to all users in the first list; and, if the first duration exceeds the set duration, to use the second model to determine the probability of each user belonging to the first user category performing the preset behavior and determine a second list according to the probability; the second list is used to indicate users who need to be called after the current moment.

Optionally, the user category of each user further includes a second user category, and the second user category indicates that the user will not answer calls made by the collection system. In this case, the obtaining module 401 may also acquire the first call duration corresponding to the first user category and the second call duration corresponding to the second user category. The first call duration is determined according to the call duration of calls made to each user who answered the call in the historical period, and the second call duration is determined according to the duration spent waiting to be answered after a call is made to the user. The determining module 402 may determine the total call duration for making calls to all users in the first list based on the number of users belonging to the first user category in the first list and the first call duration, and the number of users belonging to the second user category in the first list and the second call duration; the first duration is then determined based on the total call duration and the number of available phone numbers.
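The calculation of the first duration can be sketched as follows. The weighted sum for the total call duration follows the description above; dividing by the number of available phone numbers is an assumption (it treats the numbers as dialing in parallel), since this passage only says the first duration is determined "based on" the two quantities. The function name and parameter names are hypothetical.

```python
def first_duration(n_first, n_second, first_call_duration,
                   second_call_duration, n_available_numbers):
    """Estimate the time needed to call every user in the first list.

    n_first / n_second: users predicted to answer / not to answer.
    first_call_duration: typical duration of an answered call.
    second_call_duration: typical waiting time of an unanswered call.
    """
    if n_available_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    total_call_duration = (n_first * first_call_duration
                           + n_second * second_call_duration)
    # Assumption: the available numbers share the work evenly and in parallel.
    return total_call_duration / n_available_numbers
```

For example, 100 answering users at 60 seconds each and 50 non-answering users at 30 seconds each, spread over 10 numbers, give (6000 + 1500) / 10 = 750 seconds.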

Optionally, the determining module 402 determines the available phone numbers in the following manner: for a plurality of phone numbers applied for in advance from an operator, a predicted duration is obtained based on the total call duration and the number of the plurality of phone numbers; the probability that each of the plurality of phone numbers goes offline within the predicted duration is determined; and the phone numbers whose probability is not greater than a first preset threshold are used as the available phone numbers.
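The selection of available phone numbers can be sketched as below. The callable `offline_probability` is a hypothetical stand-in, because this passage does not say how the offline probability is estimated, and obtaining the predicted duration by dividing the total call duration by the number of phone numbers is likewise an assumption.

```python
def available_numbers(numbers, total_call_duration, offline_probability, threshold):
    """Keep the phone numbers unlikely to go offline within the predicted duration.

    numbers: phone numbers applied for in advance from the operator.
    offline_probability: (number, predicted_duration) -> probability in [0, 1].
    threshold: the first preset threshold.
    """
    if not numbers:
        raise ValueError("at least one phone number is required")
    # Assumption: the predicted duration is the total duration shared evenly.
    predicted_duration = total_call_duration / len(numbers)
    return [
        number
        for number in numbers
        if offline_probability(number, predicted_duration) <= threshold
    ]
```

The comparison uses `<=` because the text keeps numbers whose probability is "not greater than" the threshold.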

Optionally, the device further includes a dialing module 404. While the determining module 402 uses the first model to determine the user category of each user in the first list, the dialing module 404 can use the available phone numbers to make a call to each user according to the contact information of each user in the first list.

Optionally, when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, the processing module 403 may also determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module 403 may refuse to receive the third list.

Optionally, the first model is a classification model. In this case, the processing module 403 can also obtain the feature values of multiple users under each feature. For any feature, the degree of association between the feature and the behavior of whether the user answers the call is determined according to: the number of users who answered the call and the number who did not among the multiple users; the number of users corresponding to each feature value of the feature; and, among the users corresponding to each feature value, the number who answered the call and the number who did not. A feature whose degree of association with the behavior of whether the user answers the call is greater than or equal to a second preset threshold is used as a strongly correlated feature. The first model is then obtained by training based on: the number of users who answered the call and the number who did not among the multiple users; the number of users corresponding to each feature value of the strongly correlated features; and, among the users corresponding to each feature value of the strongly correlated features, the number who answered the call and the number who did not.

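Optionally, the degree of association between each feature and the behavior of whether the user answers the call satisfies the following condition:

$$I(X;Y)=\sum_{x\in R(X)}\sum_{y\in R(Y)}P(x,y)\log\frac{P(x,y)}{P(x)\,P(y)}$$

where X is the feature, Y is the category feature of whether the user answers the call, R(X) and R(Y) are their value ranges, and P(x), P(y), and P(x, y) are the corresponding proportions of users.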

Optionally, the second model is a neural network model. In this case, the processing module 403 can also obtain the feature values of multiple users under each feature; for any user, construct the feature vector of the user under each feature according to the user's feature value under that feature and the feature values of that feature, and concatenate the per-feature vectors to obtain the first feature vector corresponding to the user; obtain the second feature vector corresponding to the user according to whether the user performs the preset behavior; use the first feature vectors corresponding to the multiple users as the model input to obtain prediction results of the multiple users performing the preset behavior; and adjust the model parameters based on the second feature vectors of the multiple users and the prediction results to obtain the second model.

Optionally, the processing module 403 is further configured to obtain the feature values of each feature in the following manner: if the feature is a discrete feature, the values taken by the multiple users under the feature are counted and used as the feature values of the feature; if the feature is a continuous feature, the value range of the multiple users under the feature is determined, the value range is divided into multiple intervals, and a corresponding feature value is set for each interval to obtain the feature values of the feature.

It can be seen from the foregoing that, in the foregoing embodiment of the present invention, the first list is first obtained and the first model is used to determine the user category of each user in the first list; the number of users belonging to each user category in the first list is then counted, and the first duration is determined based on these numbers. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probabilities. Here, the first list includes multiple users who have not performed the preset behavior; the user category of each user includes the first user category, which indicates that the user will answer the call made by the collection system; the first duration represents the time required to make calls to all users in the first list; and the second list indicates the users who need to be called after the current moment. In the embodiment of the present invention, after the first list is received, the first model predicts whether each user in the first list will answer the call so that the time needed to complete the collection task can be determined; when it is determined that the collection task cannot be completed within the set duration, the second model identifies the users with a higher probability of successful collection, so that collection calls can be made to those users first, improving the collection effect.

Based on the same inventive concept, an embodiment of the present invention also provides a computing device. As shown in FIG. 5, it includes at least one processor 501 and a memory 502 connected to the at least one processor. The embodiment of the present invention does not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5, the processor 501 and the memory 502 being connected by a bus is taken as an example. The bus can be divided into an address bus, a data bus, a control bus, and so on.

In the embodiment of the present invention, the memory 502 stores instructions that can be executed by at least one processor 501, and the at least one processor 501 can execute the steps included in the aforementioned data processing method by executing the instructions stored in the memory 502.

The processor 501 is the control center of the computing device. It can use various interfaces and lines to connect the various parts of the computing device, and implements data processing by running or executing instructions stored in the memory 502 and calling data stored in the memory 502. Optionally, the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor. The application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles issuing instructions. It can be understood that the modem processor may also not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments, they may be implemented on separate chips.

The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in combination with the data processing embodiment may be directly executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.

As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (Random Access Memory, RAM), static random access memory (Static Random Access Memory, SRAM), programmable read-only memory (Programmable Read Only Memory, PROM), read-only memory (Read Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), magnetic memory, magnetic disk, optical disc, etc. The memory 502 may also be any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in the embodiment of the present invention may also be a circuit or any other device capable of realizing a storage function, for storing program instructions and/or data.

Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium that stores a computer program executable by a computing device; when the program runs on the computing device, the computing device executes any of the data processing methods described with reference to FIG. 2.

Those skilled in the art should understand that the embodiments of the present invention can be provided as methods or computer program products. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowchart and/or block diagram, and the combination of processes and/or blocks in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Although the preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims

A data processing method, characterized in that the data processing method is applied to a collection system, and the method includes:

Obtain a first list; the first list includes multiple users who have not performed a preset behavior;

Use the first model to determine the user category of each user in the first list, wherein the user category of each user includes a first user category, and the first user category represents that the user will answer the call made by the collection system;

Count the number of users belonging to each user category in the first list, and determine a first duration based on the number, where the first duration represents the duration required to make calls to all users in the first list;

If the first duration exceeds a set duration, use a second model to determine, for each user belonging to the first user category, the probability that the user performs the preset behavior, and determine a second list according to the probabilities; the second list indicates the users to be called after the current time.
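
The flow of the claim above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the two models and the duration estimate are passed in as hypothetical callables (`will_answer`, `success_probability`, `estimate_duration`), and the sketch assumes that the second list is the set of users predicted to answer, ordered by descending probability, and that all users are kept in their original order when the set duration is not exceeded.

```python
from typing import Callable, List, Sequence


def build_second_list(
    users: Sequence[str],
    will_answer: Callable[[str], bool],            # stands in for the first model
    success_probability: Callable[[str], float],   # stands in for the second model
    estimate_duration: Callable[[int, int], float],
    set_duration: float,
) -> List[str]:
    """Return the users to call next, in calling order (illustrative only)."""
    # First model: keep the users predicted to answer (first user category).
    answering = [u for u in users if will_answer(u)]
    n_not_answering = len(users) - len(answering)

    # First duration: time needed to call everyone in the first list.
    first_duration = estimate_duration(len(answering), n_not_answering)

    # Assumption: if everything fits in the set duration, call all users as listed.
    if first_duration <= set_duration:
        return list(users)

    # Otherwise prioritise answering users by the second model's probability.
    return sorted(answering, key=success_probability, reverse=True)
```

With three users of whom one is predicted not to answer, and a budget that is too small, only the two answering users are returned, highest probability first.
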
The method according to claim 1, wherein the user category of each user further includes a second user category, and the second user category indicates that the user will not answer the call made by the collection system;

The determining the first duration based on the number includes:

Acquire a first call duration corresponding to the first user category and a second call duration corresponding to the second user category; the first call duration is determined according to the durations of calls made, within a historical period, to users who answered the call; the second call duration is determined according to the duration spent waiting for an answer after a call is made to a user;

According to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration, determine the total call duration for making calls to all users in the first list;

The first duration is determined based on the total call duration and the number of available phone numbers.
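
As a worked illustration of this claim, the sketch below computes the total call duration from the two category counts and their per-call durations, then derives the first duration. The claim only says the first duration is "based on" the total call duration and the number of available phone numbers; dividing the total by the number of lines (assuming the numbers dial in parallel) is an assumption made here, and all parameter names are hypothetical.

```python
def first_duration(
    n_answer: int,          # users in the first user category (will answer)
    n_no_answer: int,       # users in the second user category (will not answer)
    answer_call_s: float,   # first call duration, in seconds
    ring_wait_s: float,     # second call duration (waiting for an answer), in seconds
    n_numbers: int,         # number of available phone numbers
) -> float:
    """Estimate the time needed to call every user in the first list."""
    if n_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    # Total call duration over both user categories.
    total_call_duration = n_answer * answer_call_s + n_no_answer * ring_wait_s
    # Assumption: available numbers place calls in parallel.
    return total_call_duration / n_numbers
```

For example, 100 answering users at 90 s each and 50 non-answering users at 30 s each give a total of 10,500 s, or 1,050 s across 10 numbers.
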
The method according to claim 2, wherein the available telephone number is determined in the following manner:

For multiple phone numbers applied for in advance from the operator, obtain a predicted duration based on the total call duration and the number of the multiple phone numbers, determine the probability of each of the multiple phone numbers going offline within the predicted duration, and use each phone number whose probability is not greater than a first preset threshold as an available phone number.
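
A minimal sketch of this selection step follows. It assumes that the predicted duration is the total call duration divided by the number of phone numbers, and it takes the offline-probability estimate as a hypothetical callable `offline_probability(number, duration)`; the claim does not say how that probability is computed.

```python
from typing import Callable, List, Sequence


def select_available_numbers(
    numbers: Sequence[str],
    total_call_duration: float,
    offline_probability: Callable[[str, float], float],
    threshold: float,
) -> List[str]:
    """Keep the numbers unlikely to go offline within the predicted duration."""
    if not numbers:
        return []
    # Assumption: the predicted duration spreads the total evenly over all numbers.
    predicted_duration = total_call_duration / len(numbers)
    return [
        n for n in numbers
        if offline_probability(n, predicted_duration) <= threshold
    ]
```
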
The method according to claim 3, characterized in that, while said using the first model to determine the user category of each user in the first list, the method further comprises:

According to the contact information of each user in the first list, use the available phone number to make a call to each user.
The method according to any one of claims 1 to 4, wherein the method further comprises:

When the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, determine, based on the first model, a second duration required to make calls to all users in the third list;

If the sum of the first duration and the second duration exceeds the set duration, refuse to receive the third list.
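
The admission check in this claim reduces to a single comparison, sketched below with hypothetical names. The second duration would be estimated for the third list in the same way the first duration is estimated for the first list.

```python
def accept_third_list(
    first_duration: float,
    second_duration: float,
    set_duration: float,
) -> bool:
    """Return True if the third list can still be handled within the set duration."""
    # The third list is refused only when the combined duration exceeds the limit.
    return first_duration + second_duration <= set_duration
```
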
The method according to claim 1, wherein the first model is a classification model, and the first model is obtained in the following manner:

Acquire the feature values of multiple users under each feature; for any feature, determine the degree of association between the feature and the behavior of whether the user answers the phone, according to the number of users among the multiple users who answered the phone, the number of users who did not answer the phone, the number of users corresponding to each feature value of the feature, the number of users who answered the phone among the users corresponding to each feature value, and the number of users who did not answer the phone among the users corresponding to each feature value;

Take each feature whose degree of association with the behavior of whether the user answers the phone is greater than or equal to a second preset threshold as a strong correlation feature, and train to obtain the first model based on the number of users who answered the phone, the number of users who did not answer the phone, the number of users corresponding to each feature value of the strong correlation feature, the number of users who answered the phone among the users corresponding to each feature value of the strong correlation feature, and the number of users who did not answer the phone among the users corresponding to each feature value of the strong correlation feature.
The method according to claim 6, wherein the degree of association between each feature and the behavior of whether the user answers the phone satisfies the following condition:

$$I(X, Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$

where X is any feature; R(X) is the set of feature values of feature X, including each feature value of feature X; x is any feature value of feature X; Y is the behavior of whether the user answers the phone; R(Y) is the set of behaviors of whether the user answers the phone, including the behavior of answering the phone and the behavior of not answering the phone; y is either the behavior of answering the phone or the behavior of not answering the phone; I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the phone; P(x, y) is the ratio of the number of users who correspond to feature value x and have performed behavior y to the total number of users; P(x) is the ratio of the number of users corresponding to feature value x to the total number of users; and P(y) is the ratio of the number of users who have performed behavior y to the total number of users.
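
The quantities P(x, y), P(x), and P(y) defined above are the ingredients of the mutual information between a feature and the answering behavior. Assuming that the degree of association I(X, Y) is this mutual information, it can be computed from per-user records as in the sketch below. The function name and the use of the natural logarithm are assumptions; pairs (x, y) that never occur are skipped, which matches the usual convention that they contribute zero.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def association_degree(
    feature_values: Sequence[Hashable],  # feature value x of each user
    answered: Sequence[bool],            # behavior y of each user
) -> float:
    """Mutual information between one feature and the answering behavior."""
    n = len(feature_values)
    if n == 0 or n != len(answered):
        raise ValueError("inputs must be non-empty and of equal length")

    joint = Counter(zip(feature_values, answered))  # counts for P(x, y)
    count_x = Counter(feature_values)               # counts for P(x)
    count_y = Counter(answered)                     # counts for P(y)

    degree = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        degree += p_xy * math.log(p_xy / (p_x * p_y))
    return degree
```

A feature that fully determines the behavior in a balanced sample yields log 2, while a feature independent of the behavior yields 0, so a threshold on this value separates strongly correlated features from uninformative ones.
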
The method according to claim 1, wherein the second model is a neural network model, and the second model is obtained in the following manner:

Obtain feature values of multiple users under each feature;

For any user, construct the feature vector of the user under each feature according to the feature value of the user under that feature and the feature values of that feature, and splice the feature vectors of the user under the respective features to obtain a first feature vector corresponding to the user; and obtain a second feature vector corresponding to the user according to whether the user has performed the preset behavior;

Use the first feature vectors corresponding to the multiple users as model input to obtain prediction results of the multiple users performing the preset behavior, and adjust model parameters based on the second feature vectors of the multiple users and the prediction results of the multiple users performing the preset behavior, to obtain the second model.
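
The vector construction in this claim can be illustrated as follows. The sketch assumes a one-hot encoding per feature and a two-element one-hot label, neither of which is fixed by the claim; the function names and the example features are hypothetical. Training the neural network itself is not shown.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, feature_values: Sequence[str]) -> List[float]:
    """Feature vector of one user under one feature (assumed one-hot)."""
    return [1.0 if value == v else 0.0 for v in feature_values]


def first_feature_vector(
    user: Dict[str, str],
    feature_values: Dict[str, Sequence[str]],
) -> List[float]:
    """Splice the per-feature vectors into the model input for one user."""
    vector: List[float] = []
    for feature, values in feature_values.items():
        vector.extend(one_hot(user[feature], values))
    return vector


def second_feature_vector(performed: bool) -> List[float]:
    """Label vector: [1, 0] if the preset behavior was performed, else [0, 1]."""
    return [1.0, 0.0] if performed else [0.0, 1.0]
```

For a user with `age_band` "31-50" and `gender` "M", given the value sets `["18-30", "31-50", "51+"]` and `["F", "M"]`, the spliced input is `[0, 1, 0, 0, 1]`.
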
The method according to any one of claims 6 to 8, wherein each characteristic value of each characteristic is obtained in the following manner:

If the feature is a discrete feature, collect the values of the multiple users under the feature, and use each distinct value as a feature value of the feature; if the feature is a continuous feature, determine the value range of the multiple users under the feature, divide the value range into multiple intervals, and set a corresponding feature value for each interval, thereby obtaining each feature value of the feature.
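
A small sketch of both cases follows. The claim does not say how the value range is divided; equal-width intervals are assumed here, with the interval index used as the feature value, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def discrete_feature_values(values: Sequence[Hashable]) -> List[Hashable]:
    """Each distinct observed value becomes one feature value."""
    return sorted(set(values))


def interval_feature_value(value: float, low: float, high: float, n_intervals: int) -> int:
    """Map a continuous value to the index of its equal-width interval."""
    if n_intervals <= 0:
        raise ValueError("n_intervals must be positive")
    if high <= low:
        return 0
    index = int((value - low) / (high - low) * n_intervals)
    # Clamp so that value == high falls into the last interval.
    return min(max(index, 0), n_intervals - 1)
```
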
A data processing device, characterized in that the device includes:

An obtaining module, configured to obtain a first list; the first list includes a plurality of users who have not performed a preset behavior;

The determining module is configured to determine the user category of each user in the first list using a first model, wherein the user category of each user includes a first user category, and the first user category indicates that the user will answer a call made by the collection system;

The processing module is configured to count the number of users belonging to each user category in the first list, and determine a first duration based on the number, where the first duration represents the duration required to make calls to all users in the first list; and, if the first duration exceeds a set duration, use a second model to determine, for each user belonging to the first user category, the probability that the user performs the preset behavior, and determine a second list according to the probabilities; the second list indicates the users to be called after the current time.
The device according to claim 10, wherein the user category of each user further includes a second user category, and the second user category indicates that the user will not answer the call made by the collection system;

The acquiring module is further configured to: acquire a first call duration corresponding to the first user category and a second call duration corresponding to the second user category; the first call duration is determined according to the durations of calls made, within a historical period, to users who answered the call; the second call duration is determined according to the duration spent waiting for an answer after a call is made to a user;

The determining module is specifically configured to: determine the total call duration for making calls to all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration; and determine the first duration based on the total call duration and the number of available phone numbers.
The device according to claim 11, wherein the determining module determines the available phone number in the following manner:

For multiple phone numbers applied for in advance from the operator, obtain a predicted duration based on the total call duration and the number of the multiple phone numbers, determine the probability of each of the multiple phone numbers going offline within the predicted duration, and use each phone number whose probability is not greater than a first preset threshold as an available phone number.
The device according to claim 12, wherein the device further comprises a dialing module, and while the determining module uses the first model to determine the user category of each user in the first list, the dialing module is configured to:

According to the contact information of each user in the first list, the available phone number is used to make a call to each user.
The device according to any one of claims 10 to 13, wherein the processing module is further configured to:

When the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, determine, based on the first model, a second duration required to make calls to all users in the third list;

If the sum of the first duration and the second duration exceeds the set duration, refuse to receive the third list.
The device according to claim 10, wherein the first model is a classification model; and the processing module is further configured to:

Obtain feature values of multiple users under each feature;

For any feature, determine the degree of association between the feature and the behavior of whether the user answers the phone, according to the number of users among the multiple users who answered the phone, the number of users who did not answer the phone, the number of users corresponding to each feature value of the feature, the number of users who answered the phone among the users corresponding to each feature value, and the number of users who did not answer the phone among the users corresponding to each feature value;

Take each feature whose degree of association with the behavior of whether the user answers the phone is greater than or equal to a second preset threshold as a strong correlation feature, and train to obtain the first model based on the number of users who answered the phone, the number of users who did not answer the phone, the number of users corresponding to each feature value of the strong correlation feature, the number of users who answered the phone among the users corresponding to each feature value of the strong correlation feature, and the number of users who did not answer the phone among the users corresponding to each feature value of the strong correlation feature.
The device according to claim 15, wherein the degree of association between each feature and the behavior of whether the user answers the phone satisfies the following condition:

$$I(X, Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$

where X is any feature; R(X) is the set of feature values of feature X, including each feature value of feature X; x is any feature value of feature X; Y is the behavior of whether the user answers the phone; R(Y) is the set of behaviors of whether the user answers the phone, including the behavior of answering the phone and the behavior of not answering the phone; y is either the behavior of answering the phone or the behavior of not answering the phone; I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the phone; P(x, y) is the ratio of the number of users who correspond to feature value x and have performed behavior y to the total number of users; P(x) is the ratio of the number of users corresponding to feature value x to the total number of users; and P(y) is the ratio of the number of users who have performed behavior y to the total number of users.
The device according to claim 10, wherein the second model is a neural network model, and the processing module is further configured to:

Obtain feature values of multiple users under each feature;

For any user, construct the feature vector of the user under each feature according to the feature value of the user under that feature and the feature values of that feature, and splice the feature vectors of the user under the respective features to obtain a first feature vector corresponding to the user; and obtain a second feature vector corresponding to the user according to whether the user has performed the preset behavior;

Use the first feature vectors corresponding to the multiple users as model input to obtain prediction results of the multiple users performing the preset behavior, and adjust model parameters based on the second feature vectors of the multiple users and the prediction results of the multiple users performing the preset behavior, to obtain the second model.
The device according to any one of claims 15 to 17, wherein the processing module is further configured to obtain each characteristic value of each characteristic in the following manner:

If the feature is a discrete feature, collect the values of the multiple users under the feature, and use each distinct value as a feature value of the feature; if the feature is a continuous feature, determine the value range of the multiple users under the feature, divide the value range into multiple intervals, and set a corresponding feature value for each interval, thereby obtaining each feature value of the feature.
A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor is caused to execute the method of any one of claims 1 to 9.
A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device is caused to execute the method of any one of claims 1 to 9.