WO2021098652A1 - Data processing method and device - Google Patents

Publication number
WO2021098652A1
WO2021098652A1 PCT/CN2020/129121 CN2020129121W WO2021098652A1 WO 2021098652 A1 WO2021098652 A1 WO 2021098652A1 CN 2020129121 W CN2020129121 W CN 2020129121W WO 2021098652 A1 WO2021098652 A1 WO 2021098652A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
user
users
call
duration
Prior art date
Application number
PCT/CN2020/129121
Other languages
French (fr)
Chinese (zh)
Inventor
蔡远航
郑少杰
易剑韬
彭明
杨波
范增虎
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021098652A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • the present invention relates to the technical field of financial technology (Fintech), in particular to a data processing method and device.
  • After a collection robot receives the collection lists sent by various online lending companies, it generally places collection calls to the users in those lists directly, in the order in which the lists were received.
  • The number of online lending companies from which the collection robot receives lists each day is not fixed, and the number of users to be collected supplied by each company is also not fixed. The total number of users the collection robot must call each day therefore cannot be determined in advance.
  • Because this approach places collection calls to each user in turn on a first-come, first-served basis, the day's collection task may not be completed, which reduces the collection effect.
  • the present invention provides a data processing method and device, which are used to solve the technical problem of poor collection effect caused by sequentially dialing collection calls in a first-come, first-served manner in the prior art.
  • the present invention provides a data processing method applied to a collection system.
  • The method includes: obtaining a first list; using a first model to determine the user category of each user in the first list; counting the number of users in the first list belonging to each user category, and determining a first duration based on those numbers; and, if the first duration exceeds a set duration, using a second model to determine the probability that each user belonging to the first user category performs the preset behavior, and determining a second list based on those probabilities.
  • the first list includes multiple users who have not performed the preset behavior
  • the user category of each user includes the first user category
  • the first user category indicates that the user will answer the call made by the collection system.
  • the first duration represents the duration required to make calls to all users in the first list
  • the second list is used to indicate users who need to make calls after the current time.
  • The first model predicts whether each user in the first list will answer the call (that is, the user category), and the time needed to complete the collection task is determined from that prediction.
  • When that time exceeds the set duration, the second model is used to identify the users with a higher probability of successful collection.
  • Users with a higher success rate can then be called first, which helps improve the collection effect.
  • the user category of each user further includes a second user category
  • the second user category indicates that the user will not answer the call made by the collection system.
  • Determining the first duration based on the number includes: first obtaining the first call duration corresponding to the first user category and the second call duration corresponding to the second user category; then determining the total call duration for calling all users in the first list from the number of users in the first list belonging to the first user category and the first call duration, together with the number of users in the first list belonging to the second user category and the second call duration; and finally determining the first duration based on the total call duration and the number of available phone numbers.
  • the first call duration is determined according to the call duration of each user who answered the call in the historical time period; the second call duration is determined according to the call duration waiting to be answered after the call is made to the user.
  • The first call duration is determined from the durations of the collection calls that users answered within the historical time period, so it reflects historical dialing information and can accurately characterize the call duration for each user who answers.
  • The second call duration is the time spent waiting for an answer, so it can accurately characterize the call duration for each user who does not answer.
  • From the first call duration and the number of answering users predicted by the first model, the total call duration required to call the users in the first list who answer can be determined; from the second call duration and the number of non-answering users predicted by the first model, the total call duration required to call the users in the first list who do not answer can be determined. Together these predict the total call duration for calling all users in the first list.
  • Because this method is based on analysis of historical data, it better matches the actual business situation and makes the predicted first duration more accurate.
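The duration estimate described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and dividing the total call duration evenly by the number of available phone numbers is an assumption (the text only says the first duration is determined "based on" those two quantities).

```python
def total_call_duration(n_answer, t_answer, n_no_answer, t_no_answer):
    """Total seconds needed to call every user in the first list.

    n_answer / n_no_answer: user counts predicted by the first model.
    t_answer: first call duration (per answered call, from history).
    t_no_answer: second call duration (time spent waiting for an answer).
    """
    return n_answer * t_answer + n_no_answer * t_no_answer


def first_duration(total_seconds, available_numbers):
    """Wall-clock seconds to finish, ASSUMING the available phone
    numbers dial in parallel and share the load evenly."""
    if available_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    return total_seconds / available_numbers


# Example: 600 predicted answerers at 120 s each, 400 non-answerers at 30 s each.
total = total_call_duration(600, 120.0, 400, 30.0)
needed = first_duration(total, available_numbers=10)
exceeds = needed > 8 * 3600  # compare against a set duration of 8 hours
```

If `exceeds` were true, the method would move on to ranking the first-category users with the second model.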
  • The available phone numbers can be determined as follows: for the multiple phone numbers previously applied for from the operator, first obtain a predicted duration based on the total call duration and the number of those phone numbers; then determine the probability that each phone number goes offline within the predicted duration; and finally take each phone number whose probability is not greater than a first preset threshold as an available phone number.
  • In this way, the phone numbers that may go offline while the collection task is being executed can be judged in advance. Using only the phone numbers that are not expected to go offline to determine the first duration anticipates the risk of numbers going offline and helps ensure that the predicted completion of the collection task is accurate.
  • The risk judgment serves as an auxiliary means supporting normal business execution without occupying the time the collection robot uses to place collection calls, thereby helping to reduce the impact of the risk judgment process on normal business.
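A minimal sketch of the phone-number filter described above. How the offline probability is estimated is not specified in this passage, so the probabilities are taken as given input; the names `predicted_duration` and `available_numbers`, and the even division used for the predicted duration, are assumptions for illustration.

```python
def predicted_duration(total_call_seconds, number_count):
    """Predicted task duration if all applied-for numbers were usable
    (ASSUMED to be an even split of the total call duration)."""
    if number_count <= 0:
        raise ValueError("number_count must be positive")
    return total_call_seconds / number_count


def available_numbers(offline_probability, threshold):
    """Keep numbers whose probability of going offline within the
    predicted duration is not greater than the first preset threshold.

    offline_probability: dict mapping phone number -> probability,
    produced by some estimator outside this sketch.
    """
    return [num for num, p in offline_probability.items() if p <= threshold]


horizon = predicted_duration(84000.0, 3)
usable = available_numbers({"A": 0.05, "B": 0.40, "C": 0.20}, threshold=0.20)
```

The length of `usable` is then the "number of available phone numbers" used when determining the first duration.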
  • If the first duration does not exceed the set duration and a request message for processing a third list is received within the first duration, the second duration required to call all users in the third list can also be determined based on the first model; if the sum of the first duration and the second duration exceeds the set duration, the third list is refused.
  • By judging in advance the total call duration for calling all users in the first list and the third list, and refusing the third list when that total exceeds the set duration, the system avoids accepting collection tasks that cannot be completed, which helps reduce customer losses.
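The admission check for a newly arriving list can be written as a one-line comparison. The function name is hypothetical; the rule itself (refuse when the sum of the two durations exceeds the set duration) is taken from the text.

```python
def accept_third_list(first_duration, second_duration, set_duration):
    """Return True if the third list can be accepted, i.e. calling all
    users in the first and third lists still fits in the set duration."""
    return first_duration + second_duration <= set_duration


# 5 h already committed, 2 h requested, 8 h window: accepted.
ok = accept_third_list(5 * 3600, 2 * 3600, 8 * 3600)
# 5 h already committed, 4 h requested, 8 h window: refused.
refused = not accept_third_list(5 * 3600, 4 * 3600, 8 * 3600)
```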
  • The first model can be a classification model and can be obtained as follows: first obtain the feature values of multiple users under each feature; then, for any feature, determine the degree of association between the feature and the behavior of whether the user answers the call according to the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not.
  • Each feature whose degree of association with the behavior of whether the user answers the call is greater than or equal to a second preset threshold is taken as a strongly correlated feature, and the first model is obtained by training based on the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each such feature value, the number who answered and the number who did not.
  • In this way, the first model can be trained using only the features with a higher degree of association.
  • Less data is involved in training, so training is more efficient; and because the training data is concentrated on features strongly related to the answering behavior, the training process of the first model is more focused and the model effect can also be better.
  • the degree of association between each feature and whether the user answers the call can satisfy the following conditions:
  • X is any feature;
  • R(X) is the set of feature values of feature X, including each feature value of X;
  • x is any feature value of feature X;
  • Y is the behavior of whether the user answers the call;
  • R(Y) is the behavior set for whether the user answers the call, including the behavior of answering and the behavior of not answering;
  • y is either the behavior of answering the call or the behavior of not answering;
  • I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the call;
  • P(x, y) is the ratio of the number of users corresponding to feature value x who performed behavior y to the total number of users;
  • P(x) is the ratio of the number of users corresponding to feature value x to the total number of users;
  • P(y) is the ratio of the number of users who performed behavior y to the total number of users.
  • In this way, the degree of association between each feature and the answering behavior integrates the information of every feature value of that feature; because richer information is used, the degree of association is more accurate.
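The symbols defined above match the standard mutual information between a feature and the answering behavior, I(X, Y) = sum over x in R(X) and y in R(Y) of P(x, y) * log(P(x, y) / (P(x) * P(y))). Assuming that is the intended measure, a minimal sketch of computing it from raw observations and selecting strongly correlated features could look like this. The function names and the natural logarithm are illustrative choices, not taken from the source.

```python
from collections import Counter
from math import log


def association_degree(feature_values, answered):
    """Mutual information I(X, Y) between one feature and the answer
    behavior, estimated from user counts.

    feature_values[i] is user i's value of feature X;
    answered[i] is True if user i answered the call.
    (x, y) pairs that never occur contribute 0 and are skipped.
    """
    n = len(feature_values)
    joint = Counter(zip(feature_values, answered))
    count_x = Counter(feature_values)
    count_y = Counter(answered)
    total = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        total += p_xy * log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return total


def strong_features(table, answered, threshold):
    """Names of features whose association degree reaches the
    second preset threshold."""
    return [name for name, values in table.items()
            if association_degree(values, answered) >= threshold]


answered = [True, True, False, False]
table = {
    "f1": ["a", "a", "b", "b"],  # perfectly aligned with answering
    "f2": ["a", "b", "a", "b"],  # independent of answering
}
selected = strong_features(table, answered, threshold=0.1)
```

Here `f1` scores log(2) and `f2` scores 0, so only `f1` would be kept for training the first model.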
  • The second model can be a neural network model and can be obtained as follows: first obtain the feature values of multiple users under each feature; then, for any user, construct the user's feature vector under each feature from the user's feature value under that feature and all feature values of that feature, and splice the user's feature vectors under all features to obtain the first feature vector corresponding to the user.
  • Next, the second feature vector corresponding to the user is obtained according to whether the user performed the preset behavior; the first feature vectors of the multiple users are used as model input to obtain prediction results for the multiple users performing the preset behavior; and finally the model parameters are adjusted based on the second feature vectors of the multiple users and those prediction results to obtain the second model.
  • In this way, the user's feature vector integrates the information of the feature value under every feature, so the information is more comprehensive and its representation more concise. A model trained on input that is rich in information and concise in form performs better and trains more efficiently.
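One common way to build a per-feature vector from "the user's feature value and all feature values of the feature" is one-hot encoding. The sketch below assumes that encoding, which this passage does not name explicitly; the two-element label layout in `second_feature_vector` is likewise a hypothetical choice.

```python
def one_hot(value, feature_values):
    """Vector with a 1 at the position of the user's value, 0 elsewhere."""
    return [1 if value == v else 0 for v in feature_values]


def first_feature_vector(user, feature_values_by_feature):
    """Splice the per-feature vectors in a fixed feature order."""
    vector = []
    for feature, values in feature_values_by_feature.items():
        vector.extend(one_hot(user[feature], values))
    return vector


def second_feature_vector(performed):
    """Label vector: [1, 0] if the preset behavior was performed,
    [0, 1] otherwise (ASSUMED layout)."""
    return [1, 0] if performed else [0, 1]


feature_values_by_feature = {
    "gender": ["F", "M"],
    "age_band": ["18-30", "31-45", "46+"],
}
user = {"gender": "M", "age_band": "31-45"}
x = first_feature_vector(user, feature_values_by_feature)
y = second_feature_vector(True)
```

The pairs `(x, y)` for many users would then serve as training input and target when fitting the second model.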
  • Each feature value of each feature can be obtained as follows. If the feature is a discrete feature, the distinct values taken by the multiple users under the feature are collected, and these values are used as the feature values of the feature. If the feature is a continuous feature, the range of values taken by the multiple users under the feature is determined, the range is divided into multiple intervals, and a corresponding feature value is set for each interval, giving the feature values of the feature.
  • In this way, every feature (continuous or discrete) has the same discrete form, so each discrete feature value can be used as training data when training the model without fitting a probability distribution function to continuous features, which improves data processing efficiency.
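The two cases above can be sketched as follows. Equal-width intervals are an assumption made for illustration; the text only says the value range is divided into multiple intervals. The function names are hypothetical.

```python
def discrete_feature_values(observed):
    """Distinct observed values of a discrete feature."""
    return sorted(set(observed))


def interval_edges(observed, n_intervals):
    """Edges of n equal-width intervals covering the observed range
    of a continuous feature (ASSUMED binning scheme)."""
    lo, hi = min(observed), max(observed)
    if hi == lo or n_intervals < 1:
        raise ValueError("need a non-empty range and at least one interval")
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(n_intervals + 1)]


def interval_index(value, edges):
    """Feature value (interval index) for a continuous observation.
    Each interval is closed on the left; the last also includes its right edge."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


ages = [20, 30, 40, 50, 60]
edges = interval_edges(ages, 4)
codes = [interval_index(a, edges) for a in ages]
```

After this step a continuous feature such as age is represented by a small set of interval indices, the same form as a discrete feature.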
  • The present invention provides a data processing device. The device includes: an acquisition module, configured to acquire a first list, the first list including a plurality of users who have not performed a preset behavior; a determining module, configured to use the first model to determine the user category of each user in the first list, where the user category of each user includes the first user category and the first user category represents that the user will answer the call made by the collection system; and a processing module, configured to count the number of users in the first list belonging to each user category and determine the first duration based on the number, the first duration representing the duration required to make calls to all users in the first list, and, if the first duration exceeds the set duration, to use the second model to determine the probability that each user belonging to the first user category performs the preset behavior and determine the second list according to the probability, the second list indicating the users who need to be called after the current moment.
  • the user category of each user further includes a second user category, and the second user category indicates that the user will not answer the call made by the collection system.
  • the acquiring module may also acquire the first call duration corresponding to the first user category and the second call duration corresponding to the second user category.
  • The determining module can determine the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, together with the number of users in the first list belonging to the second user category and the second call duration; the first duration is then determined based on the total call duration and the number of available phone numbers.
  • the first call duration is determined based on the call duration of each user who answered the call in the historical time period, and the second call duration is determined based on the call duration waiting to be answered after the call is made to the user.
  • The determining module can determine the available phone numbers as follows: for the multiple phone numbers previously applied for from the operator, obtain a predicted duration based on the total call duration and the number of those phone numbers, determine the probability that each phone number goes offline within the predicted duration, and take each phone number whose probability is not greater than the first preset threshold as an available phone number.
  • The device may further include a dialing module. While the determining module uses the first model to determine the user category of each user in the first list, the dialing module may use the available phone numbers to call each user according to the contact information of each user in the first list.
  • The processing module may also determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module may refuse to receive the third list.
  • the first model may be a classification model.
  • The processing module can also obtain the feature values of multiple users under each feature and, for any feature, determine the degree of association between the feature and the behavior of whether the user answers the call according to the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not.
  • Then, each feature whose degree of association with the behavior of whether the user answers the call is greater than or equal to the second preset threshold is taken as a strongly correlated feature, and the first model is obtained by training based on the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each such feature value, the number who answered and the number who did not.
  • the degree of association between each feature and whether the user answers the call satisfies the following conditions:
  • X is any feature;
  • R(X) is the set of feature values of feature X, including each feature value of X;
  • x is any feature value of feature X;
  • Y is the behavior of whether the user answers the call;
  • R(Y) is the behavior set for whether the user answers the call, including the behavior of answering and the behavior of not answering;
  • y is either the behavior of answering the call or the behavior of not answering;
  • I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the call;
  • P(x, y) is the ratio of the number of users corresponding to feature value x who performed behavior y to the total number of users;
  • P(x) is the ratio of the number of users corresponding to feature value x to the total number of users;
  • P(y) is the ratio of the number of users who performed behavior y to the total number of users.
  • The second model can be a neural network model.
  • The processing module can also obtain the feature values of multiple users under each feature; for any user, construct the user's feature vector under each feature from the user's feature value under that feature and all feature values of that feature; splice the user's feature vectors under all features to obtain the first feature vector corresponding to the user; obtain the second feature vector corresponding to the user according to whether the user performed the preset behavior; use the first feature vectors of the multiple users as model input to obtain prediction results for the multiple users performing the preset behavior; and adjust the model parameters based on the second feature vectors of the multiple users and those prediction results to obtain the second model.
  • the processing module can also obtain each feature value of each feature in the following manner: if the feature is a discrete feature, then the various values of multiple users under the feature can be counted, and each value is taken as Each feature value of the feature; if the feature is a continuous feature, you can count the value ranges of multiple users under the feature, divide the value range into multiple value range intervals, and set each value range interval A corresponding characteristic value, each characteristic value of the characteristic is obtained.
  • the present invention provides a computing device including at least one processor and at least one memory.
  • the memory stores a computer program, and when the computer program is executed by the processor, the processor can execute any data processing method of the first aspect described above.
  • the present invention provides a computer-readable storage medium that stores a computer program that can be executed by a computing device.
  • the computing device can execute any of the data processing methods of the first aspect described above.
  • FIG. 1 is a schematic structural diagram of a collection system provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a one-dimensional cell model provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention.
  • Fig. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • the preset behavior may refer to any behavior, such as shopping behavior in the advertising promotion field, card issuance behavior in the credit card promotion field, or repayment behavior in the collection field.
  • the following embodiments of the present invention take the field of collection as an example to describe the data processing method in the embodiments of the present invention.
  • FIG. 1 is a schematic diagram of the architecture of a collection system provided by an embodiment of the present invention.
  • The collection system may be provided with a collection robot 110 and at least one client, such as client 121, client 122, and client 123.
  • The client can be any online loan client that provides loans to users in the financial technology field, such as an online loan client installed at a commercial bank, a financial company, or a trust company, without limitation.
  • The collection system may also be provided with at least one user terminal, such as user terminal 131, user terminal 132, and user terminal 133.
  • The user terminal can be any terminal device with a call function, such as a phone for the elderly, a smart phone, or a slide phone, without limitation.
  • The collection robot 110 may be connected to the at least one client and to the at least one user terminal respectively, for example in a wired manner or in a wireless manner, which is not specifically limited.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention.
  • the method is applied to a collection robot, such as the collection robot 110 shown in FIG. 1.
  • the method includes:
  • Step 201 Obtain a first list.
  • The first list includes a plurality of users who have not performed a preset behavior.
  • The first list may include the contact information of each user who has not performed the preset behavior.
  • In the collection field, a user who has not performed the preset behavior is a user whose loan from an online lending institution has become overdue.
  • The collection system may also be provided with a preprocessing device (not shown in FIG. 1), which can be located between the at least one client and the collection robot 110 or inside the collection robot 110.
  • the preprocessing device may receive the collection list sent by each client, and sort the users to be collected in each collection list according to the set dial strategy to obtain the first list.
  • The set dialing strategy can be a dialing strategy set according to business needs. For example, the users to be collected in each collection list can be sorted by the chronological order in which the collection lists were received, by the priority of the client corresponding to each collection list, by the priority of the online loan product to which each collection list belongs, or by the priority of the city where the client corresponding to each collection list is located; a combination of several of these dialing strategies can also be used. This is not specifically limited.
  • the preprocessing device may be a web server based on a worldwide web (web) technology
  • the client may be a client provided with a web browser.
  • the online lending institution can access the web service interface provided by the preprocessing device through the web browser of its client.
  • The online lending institution may have a collection demand for multiple online loan products.
  • The online lending institution can pack the user information corresponding to each online loan product to be collected (including the user's age, gender, education information, marriage information, occupation information, current loan information, historical loan information, and so on) into a collection list and upload it.
  • The online lending institution can also select the collection termination time on the web service interface, so that the collection robot 110 feeds back the collection result before that time.
  • The preprocessing device can first sort the collection lists of the clients according to the priority of each online loan product, and then sort the initially sorted collection lists according to the priority of each client, to obtain the first list.
  • Alternatively, the collection lists of the clients can first be sorted according to the priority of each client and then sorted according to the priority of each online loan product to obtain the first list, which is not limited.
  • For example, suppose the priority of client 121 > the priority of client 123 > the priority of client 122, and the priority of online loan product 2 > the priority of online loan product 1. Suppose further that the collection list of client 121 includes users to be collected 1 and 2, corresponding to online loan product 1; the collection list of client 122 includes user to be collected 3, corresponding to online loan product 1, and users to be collected 4 and 5, corresponding to online loan product 2; and the collection list of client 123 includes user to be collected 6, corresponding to online loan product 2.
  • Sorting by product priority first and then by client priority, the first list can be: user to be collected 1, user to be collected 2, user to be collected 6, user to be collected 4, user to be collected 5, user to be collected 3.
  • Sorting by client priority first and then by product priority, the first list may instead be: user to be collected 6, user to be collected 4, user to be collected 5, user to be collected 1, user to be collected 2, user to be collected 3.
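The two orderings in this example can be reproduced with an ordinary keyed sort. The sketch below uses hypothetical names (`Entry`, `build_first_list`) and encodes each priority as a rank where a smaller number means higher priority. Applying two stable sorts in sequence is equivalent to one sort on a tuple key whose first element is the key of the later sort.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    user: int
    client: int
    product: int


# Smaller rank = higher priority, per the example in the text.
CLIENT_RANK = {121: 0, 123: 1, 122: 2}
PRODUCT_RANK = {2: 0, 1: 1}

ENTRIES = [
    Entry(1, 121, 1), Entry(2, 121, 1),
    Entry(3, 122, 1), Entry(4, 122, 2), Entry(5, 122, 2),
    Entry(6, 123, 2),
]


def build_first_list(entries, last_sort_by):
    """Order users as if sorted twice; last_sort_by names the second
    (dominant) sort: 'client' or 'product'."""
    if last_sort_by == "client":
        key = lambda e: (CLIENT_RANK[e.client], PRODUCT_RANK[e.product])
    else:
        key = lambda e: (PRODUCT_RANK[e.product], CLIENT_RANK[e.client])
    return [e.user for e in sorted(entries, key=key)]


product_then_client = build_first_list(ENTRIES, last_sort_by="client")
client_then_product = build_first_list(ENTRIES, last_sort_by="product")
```

With the priorities above, `product_then_client` is `[1, 2, 6, 4, 5, 3]` and `client_then_product` is `[6, 4, 5, 1, 2, 3]`, matching the two first lists given in the example.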
  • If the preprocessing device is deployed outside the collection robot 110, the preprocessing device can send the first list to the collection robot 110, or the collection robot 110 can fetch the first list from the preprocessing device using a file transfer protocol. If the preprocessing device is a component within the collection robot 110 (such as a preprocessing process), it can store the first list directly in the memory of the collection robot 110, so that the collection robot 110 invokes its processing process to place collection calls to each user in the first list.
  • each client may send the collection list to the preprocessing device the day before the collection is executed, or send the collection list to the preprocessing device on the day when the collection is executed.
  • The embodiment of the present invention does not limit the device to which the client sends the collection list.
  • the client may directly send the collection list to the preprocessing device, or send the collection list to the collection robot 110, and then the collection robot 110 forwards it to the preprocessing device.
  • Step 202 Use the first model to determine the user category of each user in the first list.
  • the user category of each user includes a first user category, and the first user category represents that the user will answer the call made by the collection system.
  • After the collection robot 110 obtains the first list, it can first determine the time difference between the current time and the time at which the collection robot 110 starts collection. If the time difference is greater than or equal to a first preset time difference (which is itself greater than or equal to the time required to determine the collection strategy), the collection robot 110 can first analyze whether the collection tasks in the first list can be completed before the collection termination time set by each client and set a corresponding collection strategy according to the analysis result; then, at the time at which the collection robot 110 starts collection, it begins collecting from each user in the first list according to that strategy.
  • If the time difference is less than or equal to a second preset time difference (any value less than or equal to 0), a parallel processing thread analyzes whether the collection tasks in the first list can be completed before the collection termination time set by each client; after the corresponding collection strategy is set according to that result, the dialing thread is controlled to collect from each user in the first list according to the strategy.
  • The collection robot 110 may also first call the processing process to analyze whether the collection tasks in the first list can be completed before the collection termination time set by each client, and set the corresponding collection strategy according to the analysis result.
  • Meanwhile, a parallel dialing process is called to call each user in the first list in list order; once the corresponding collection strategy is obtained, the dialing process is controlled to call each user in the first list according to that strategy.
  • The first preset time difference can be set by those skilled in the art based on experience, or can be determined from the time taken to determine the collection strategy for each collection task in the historical period. For example, it can be the average of these durations, the median of these durations, or a weighted average of these durations in which a historical collection task closer in time to the current collection task is given a greater weight, and so on.
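  • The three options above (average, median, recency-weighted average) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `first_preset_time_difference` and the linear recency weights 1..n are assumptions, since the text only states that more recent tasks receive greater weight.

```python
from statistics import mean, median


def first_preset_time_difference(durations, method="mean"):
    """Estimate the first preset time difference.

    durations: time (in seconds) taken to determine the collection strategy
    for each historical collection task, ordered from oldest to most recent.
    """
    if not durations:
        raise ValueError("at least one historical duration is required")
    if method == "mean":
        return mean(durations)
    if method == "median":
        return median(durations)
    if method == "weighted":
        # Assumed weighting scheme: linear weights 1..n, so the most recent
        # task (last element) has the greatest weight.
        weights = range(1, len(durations) + 1)
        return sum(w * d for w, d in zip(weights, durations)) / sum(weights)
    raise ValueError(f"unknown method: {method}")
```

  • With the weighted option, a recent slow strategy computation pulls the estimate up more strongly than an old one, which is the behavior the text asks for.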
  • the collection robot 110 can be equipped with two environments, an on-line production environment and a simulation environment.
  • The collection robot 110 can push the first list to the online production environment and the simulation environment at the same time.
  • The online production environment is used to perform the normal dialing process. For example, when the time at which the collection robot 110 starts the collection (such as 8:00) is reached, it makes collection calls to each user in turn in the order of the users in the first list (or according to the collection strategy sent by the simulation environment), records the call information and the user's willingness to repay (such as the collection stage at which the user ended the call), and sends each user's call result to the client of the corresponding online lending institution, so that the online lending institution can follow up the user's subsequent repayment.
  • the collection stage can include the five stages of asking if the other party is the person, explaining the overdue situation, asking when the payment can be repaid, confirming the repayment date, and ending.
  • The simulation environment is used to analyze the collection tasks corresponding to the first list, determine the corresponding collection strategy, and send the corresponding collection strategy to the online production environment, so that the online production environment executes the collection task according to the corresponding collection strategy.
  • The online production environment can also send the collection results of each user obtained by executing the collection task to the simulation environment, so that the simulation environment can update its internal parameters, such as the first call duration, the parameters of the first model, the parameters of the second model, the average number of phone numbers going offline per hour in the historical period, and so on.
  • In this way, the risk judgment serves as an auxiliary to the normal collection task and does not take up the time the collection robot uses to make collection calls, thereby reducing the impact of the risk judgment on normal collection tasks.
  • the collection robot 110 can use the first model to predict each user in the first list, thereby determining the user category of each user.
  • the user category of the user may include only the first user category, or may include both the first user category and the second user category. If the user category of a user is the first user category, it means that the user will answer the collection call made by the collection robot. If the user category of a user is the second user category, it means that the user will not answer the collection call made by the collection robot.
  • Step 203 Count the number of users belonging to each user category in the first list, and determine a first duration based on the number.
  • The first duration represents the time needed to make calls to all users in the first list.
  • The collection robot 110 can count the number of users belonging to the first user category and the number of users belonging to the second user category in the prediction result, and then determine the first duration required to make calls to all users in the first list according to the number of users belonging to the first user category and the first call duration corresponding to the first user category, together with the number of users belonging to the second user category and the second call duration corresponding to the second user category.
  • The first call duration represents the call duration likely to be consumed by each user who answers the call, and the second call duration represents the call duration likely to be consumed by each user who does not answer the call. The first call duration and the second call duration can be set by those skilled in the art based on experience, or can be set according to business needs, and are not specifically limited.
  • the first call duration may be determined according to the duration required to make a call to each user who answered the call in the historical period
  • the second call duration may be determined according to the duration of waiting to be answered after the call was made to the user.
  • The collection robot 110 may first obtain from the statistical database the call durations of all users who answered a collection call made by the collection robot 110 in the last 2 weeks (the call duration of each user refers to the total duration from the start of dialing to the end of the call), and then take the median of these call durations as the first call duration, or take the average of these call durations as the first call duration, and so on.
  • The second call duration refers to the time the collection robot 110 waits for the other party to answer, and can be determined according to the set number of rings. For example, if the call is hung up when the other party has not answered after 8 rings, the second call duration can be the total duration of the 8 rings. Since the waiting time is the same for every user who does not answer the collection call, the collection robot 110 may set the second call duration to the waiting time of any user who did not answer the collection call in the historical period.
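  • As a minimal sketch of the two estimates just described, the first call duration can be taken as the median (or mean) of historical answered-call durations, and the second call duration as the ring count multiplied by the length of one ring. The function names and the `seconds_per_ring` parameter are assumptions introduced for illustration; the text does not state how long a ring lasts.

```python
from statistics import mean, median


def first_call_duration(answered_durations, use_median=True):
    """Call duration (seconds) attributed to each user who answers.

    answered_durations: dial-to-hang-up durations of answered collection
    calls in the historical period (e.g. the last 2 weeks).
    """
    if not answered_durations:
        raise ValueError("no historical answered calls available")
    return median(answered_durations) if use_median else mean(answered_durations)


def second_call_duration(max_rings, seconds_per_ring):
    """Waiting time (seconds) attributed to each user who does not answer.

    seconds_per_ring is a hypothetical parameter, not given in the text.
    """
    return max_rings * seconds_per_ring
```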
  • The first call duration of users who answer the call is determined from the call durations of the collection calls answered within the historical period, so that the first call duration reflects the historical dialing information and accurately represents the call duration of each user who answers. Correspondingly, the second call duration is the time spent waiting for an answer, so it accurately represents the call duration of each user who does not answer.
  • From the first call duration and the number of users predicted by the first model to answer the call, the total call duration required to make collection calls to the users in the first list who answer can be determined. From the second call duration and the number of users predicted by the first model not to answer the call, the total call duration required to make collection calls to the users in the first list who do not answer can be determined. Together these predict the total call duration for making calls to all users in the first list.
  • Because this method is based on the analysis of historical data, it better matches the actual business situation and makes the predicted first duration more accurate.
  • the collection robot 110 may apply for multiple phone numbers in the operator in advance, and use the multiple phone numbers to jointly make a collection call to each user in the first list.
  • The collection robot 110 can first determine the total call duration for making calls to all users in the first list based on the number of users belonging to the first user category, the first call duration, the number of users belonging to the second user category, and the second call duration, and then determine the first duration according to the number of pre-applied phone numbers and the total call duration.
  • the collection robot 110 may directly use the ratio of the total call duration to the number of multiple phone numbers as the first duration.
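  • The arithmetic described above can be sketched as follows, assuming all durations are in seconds and that the phone numbers dial in parallel with an even share of the work. The function names are illustrative only.

```python
def total_call_duration(n_answer, d_answer, n_no_answer, d_no_answer):
    """Total dialing time for the first list.

    n_answer / n_no_answer: users predicted by the first model to answer
    or not answer; d_answer / d_no_answer: first and second call durations.
    """
    return n_answer * d_answer + n_no_answer * d_no_answer


def first_duration(total_duration, phone_numbers):
    """First duration as the ratio of total call time to phone numbers."""
    if phone_numbers <= 0:
        raise ValueError("at least one phone number is required")
    return total_duration / phone_numbers
```

  • For example, 600 users predicted to answer at 90 s each and 400 predicted not to answer at 40 s each give 70,000 s of dialing in total, or 7,000 s when spread across 10 phone numbers.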
  • However, a phone number may go offline as the dialing time increases. Therefore, if the ratio of the total call duration to the number of phone numbers is directly used as the first duration, the first duration may be inaccurate because some phone numbers go offline.
  • the collection robot 110 may determine the first duration in the following manner:
  • The collection robot 110 may first determine the predicted duration required to make calls to all users in the first list based on the total call duration and the number of phone numbers, and analyze the probability of each phone number going offline within the predicted duration. The probability of each phone number going offline can be determined based on probability theory. Since the time interval t from the start of the collection calls to a phone number going offline obeys an exponential distribution F(t) with parameter λ, the probability density function f(t) corresponding to the time interval t is:
    $$ f(t) = \lambda e^{-\lambda t}, \quad t \ge 0 $$
  • Correspondingly, the exponential distribution F(t) corresponding to the time interval t can be:
    $$ F(t) = 1 - e^{-\lambda t}, \quad t \ge 0 $$
  • λ can be set as the average number of phone numbers going offline per hour in the historical period.
  • The historical period can be set by those skilled in the art based on experience; for example, it can be the last 2 weeks. In this way, the value of λ can be continuously updated over time.
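  • Under the exponential model, the probability that a phone number goes offline within a predicted duration T is the cumulative distribution evaluated at T, that is, 1 - exp(-λT). The sketch below evaluates this and counts the numbers whose offline probability exceeds a threshold. The per-number rate list and the function names are assumptions for illustration; the text only says λ is estimated from the historical hourly offline count.

```python
import math


def offline_probability(rate_per_hour, hours):
    """P(offline within `hours`) for an exponential lifetime with rate λ."""
    if rate_per_hour < 0 or hours < 0:
        raise ValueError("rate and duration must be non-negative")
    return 1.0 - math.exp(-rate_per_hour * hours)


def count_likely_offline(rates_per_hour, hours, threshold):
    """Number of phone numbers whose offline probability exceeds threshold.

    rates_per_hour: one (hypothetical) λ estimate per phone number.
    """
    return sum(
        1 for rate in rates_per_hour
        if offline_probability(rate, hours) > threshold
    )
```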
  • The collection robot 110 then determines the first duration required to make calls to all users in the first list according to the total call duration and the number of available phone numbers. If the first duration is less than or equal to the set duration, it means that even if some phone numbers go offline during the dialing process, the collection robot 110 can complete the collection task corresponding to the first list, so the collection robot 110 can continue to make collection calls to the users in the first list in order.
  • If the first duration is greater than the set duration, the collection robot 110 can then determine whether the predicted duration is greater than the set duration. If the predicted duration is less than or equal to the set duration, it means that the collection robot 110 can complete the collection task corresponding to the first list provided that no phone number goes offline during the dialing process. At this time, if the operator supports the collection robot 110 applying for backup phone numbers, the collection robot 110 can apply for backup phone numbers from the operator, and the number of backup phone numbers can be greater than or equal to the number of phone numbers whose offline probability is greater than the first preset threshold.
  • Alternatively, the collection robot 110 may select some users with a higher collection success rate from the first list to form the second list.
  • If the predicted duration is greater than the set duration, it means that even if no phone number goes offline during the dialing process, the collection robot 110 cannot complete the collection task corresponding to the first list.
  • the collection robot 110 may also determine to apply for a standby phone number or determine the second list according to the support of the operator.
  • When the operator supports the collection robot applying for backup phone numbers, the collection robot can apply for backup phone numbers and also select some users with a higher collection success rate from the first list to form the second list. Or, even when the operator supports applying for backup phone numbers, the collection robot may choose not to apply for them and instead select some users with a higher collection success rate from the first list to form the second list.
  • the collection robot may not apply for a backup phone number. Instead, it obtains some users with a higher success rate of collection from the first list to form the second list.
  • The number of phone numbers that may go offline during the execution of the collection task can be determined in advance. By re-determining the first duration using the number of phone numbers that will not go offline, the risk of phone numbers going offline can be anticipated, which makes the judgment of whether the collection task can be completed more accurate.
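  • The branching described in the preceding paragraphs can be summarized in a small decision function. This is only one possible reading of the text: the action labels and the choice to both apply for backup numbers and build a second list when the task cannot finish even with all numbers online are assumptions, since the text allows several combinations.

```python
def plan_collection(first_duration, predicted_duration, set_duration,
                    operator_allows_backup):
    """Return the list of actions for the collection robot.

    first_duration: estimate that accounts for numbers going offline.
    predicted_duration: estimate assuming no number goes offline.
    """
    if first_duration <= set_duration:
        # Task finishes even if some numbers go offline.
        return ["dial_first_list_in_order"]
    if predicted_duration <= set_duration:
        # Task finishes only if no number goes offline.
        if operator_allows_backup:
            return ["apply_backup_numbers"]
        return ["build_second_list"]
    # Task cannot finish even with every number online.
    if operator_allows_backup:
        return ["apply_backup_numbers", "build_second_list"]
    return ["build_second_list"]
```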
  • the simulation environment can determine the first duration based on the cell model.
  • a one-dimensional cell model can be set in the simulation environment, and the one-dimensional cell model is used to store all users in the first list.
  • Figure 3 is a schematic structural diagram of a one-dimensional cell model provided by an embodiment of the present invention. As shown in Figure 3, each cell can be used to identify a user, and each cell has a left neighbor cell and/or a right neighbor cell. For example, cell A is the left neighbor of cell B, and cell C is the right neighbor of cell B.
  • Each cell can have three different states, and the state can be identified by color: for example, white identifies the state of not yet dialed, gray identifies the state of dialed but not answered, and black identifies the state of dialed and answered.
  • When the color of a cell changes from white to gray or black, it means that the collection robot 110 has dialed the user corresponding to the cell. Therefore, when a cell is to change from white to black, the simulation can first wait for the first call duration (to save time, a proportionally reduced value smaller than the first call duration can be used) and then update the color of the cell from white to black.
  • When a cell is to change from white to gray, the simulation can first wait for the second call duration (to save time, a proportionally reduced value smaller than the second call duration can be used, with the same ratio as that used for the first call duration) and then update the color of the cell from white to gray.
  • This process can be executed in parallel according to the number of available phone numbers. When the color of every cell in the one-dimensional cell model has changed, the execution time is counted, and the first duration is determined from the execution time according to the ratio.
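  • A minimal sketch of the cell model follows. Instead of actually waiting for scaled-down call durations as the text describes, it advances a virtual clock, which is a substitution made here for brevity. Each phone line takes the next undialed (white) cell as soon as it becomes free; the function and constant names are illustrative assumptions.

```python
import heapq

WHITE, GRAY, BLACK = "white", "gray", "black"


def simulate_cells(will_answer, d_answer, d_no_answer, lines):
    """Simulate dialing every cell in order with `lines` parallel numbers.

    will_answer: per-user prediction from the first model (True/False).
    Returns (first_duration, final_cell_states).
    """
    if lines < 1:
        raise ValueError("at least one phone line is required")
    cells = [WHITE] * len(will_answer)
    free_at = [0.0] * lines          # time at which each line becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for i, answers in enumerate(will_answer):
        start = heapq.heappop(free_at)            # earliest free line
        end = start + (d_answer if answers else d_no_answer)
        cells[i] = BLACK if answers else GRAY     # white -> black / gray
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish, cells
```

  • With two lines, durations of 90 s and 40 s, and predictions `[True, False, True, False]`, both lines finish at 130 s, so the estimated first duration is 130 s rather than the 260 s a single line would need.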
  • Step 204 If the first duration exceeds the set duration, use a second model to determine the probability of each user belonging to the first user category performing the preset behavior, and determine a second list according to the probability; The second list is used to indicate users who need to make a call after the current time.
  • The collection robot 110 can use the second model to determine the repayment probability of at least each user predicted to answer the call, and sort the users according to their repayment probabilities to obtain the second list, so that the collection robot 110 makes collection calls according to the second list.
  • In one approach, the collection robot 110 uses the second model only to determine the repayment probability of each user predicted to answer the call, and then sorts these users by repayment probability from large to small (or from small to large) to obtain the second list.
  • In this way, the collection robot 110 makes collection calls only to users who will answer the call and have a high probability of repayment, rather than to users who will not answer the call or who will answer the call but have a low probability of repayment. This improves the effect of the collection calls, reduces the data processing volume of the collection robot 110, and improves collection efficiency.
  • In another approach, the collection robot 110 can use the second model to determine the repayment probability of every user in the first list, and sort all users in the first list by repayment probability from large to small (or from small to large) to obtain the second list. In this way, when it is determined that not all users can be called, the collection robot 110 can make collection calls to the users in the first list in descending order of repayment probability, so as to call as many users as possible and avoid missing users who were predicted not to answer but would actually answer, thereby improving the accuracy of the collection calls.
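  • Both ways of building the second list can be sketched with a single sort. The dictionary keys `will_answer` and `repay_prob` are hypothetical field names standing in for the outputs of the first and second models.

```python
def build_second_list(users, answer_only=True):
    """Order users for dialing by predicted repayment probability.

    users: list of dicts with hypothetical keys "id", "will_answer"
    (first-model prediction) and "repay_prob" (second-model output).
    answer_only=True keeps only users predicted to answer;
    answer_only=False ranks every user in the first list.
    """
    candidates = [u for u in users if u["will_answer"]] if answer_only else list(users)
    return sorted(candidates, key=lambda u: u["repay_prob"], reverse=True)
```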
  • If the first duration does not exceed the set duration, the collection robot 110 can complete the collection task for the first list within the set duration. In this case, the collection robot 110 can continue to make collection calls to each user in the order of the users in the first list.
  • While making collection calls within the first duration, the collection robot 110 may receive a collection task request that carries a third list. In this case, the collection robot 110 may further determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, it means that the collection robot 110 cannot complete the collection tasks for all users in the first list and the third list within the set duration. Therefore, the collection robot 110 may refuse to accept the third list.
  • By determining in advance the total call duration for making collection calls to all users in the first list and the third list, and refusing to accept the third list when this total exceeds the set duration, the collection robot 110 can avoid accepting collection tasks that cannot be completed and reduce customer losses.
  • Some phone numbers of the collection robot 110 may suddenly go offline during the first duration. In this case, the collection robot 110 may determine a new first duration based on the total call duration and the number of phone numbers that are not offline. Alternatively, if the collection robot 110 has already made collection calls to some users in the first list, it may determine, based on the first model, the total call duration for making collection calls to the remaining users in the first list, and then determine a new first duration based on this total call duration and the number of phone numbers that are not offline.
  • The collection robot 110 may also report the result to the operation and maintenance personnel, so that the operation and maintenance personnel can determine whether to apply for backup phone numbers from the operator.
  • The first model is used to predict which users in the first list will answer the phone and which will not, from which the time needed to complete the collection task is determined.
  • The second model is used to determine the users with a higher probability of successful collection, so that when it is determined that the collection task cannot be completed, the users with a higher collection success rate can be called first to improve the collection effect.
  • the above process describes the process of using the first model and the second model to determine the collection strategy.
  • the following describes the process of training to obtain the first model and the second model, respectively.
  • Since the first model is used to predict whether each user will answer the phone so as to determine the user category of each user, the first model can be set as a classification model.
  • The collection robot 110 can first obtain the feature values of multiple users under each feature, and then, for any feature, determine the degree of association between the feature and the behavior of answering the call according to the number of users who answered the call, the number of users who did not answer the call, and the number of users corresponding to each feature value of the feature among the multiple users.
  • The collection robot 110 may regard a feature whose degree of association with the behavior of answering the call is greater than or equal to the second preset threshold as a strong correlation feature, and then train the first model according to the number of users who answered the call and the number of users who did not answer the call among the multiple users, the number of users corresponding to each feature value of the strong correlation feature, and the numbers of users who answered and who did not answer the call among the users corresponding to each feature value of the strong correlation feature.
  • the first model is trained based on the Naive Bayes algorithm. Since the naive Bayes algorithm can update the model parameters based on incremental data in real time, training the first model based on the naive Bayes algorithm can improve the efficiency of training and update.
  • the data of multiple (for example, 20000) users who have made a collection call by the collection robot 110 in the historical period can be obtained first.
  • The data of each user can include the user's values under various features, such as the user's gender, age, education, occupation, marital status, resident city, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical overdue loans, and also includes the value of the category feature of whether the user answered the collection call.
  • Since the above features include both continuous features and discrete features, they cannot be evaluated under a unified standard. Therefore, in an example, for any feature, if the feature is a discrete feature, the collection robot 110 can collect the values taken by the multiple users under the feature and use each value as a feature value of the feature. If the feature is a continuous feature, the collection robot 110 can determine the value range of the multiple users under the feature, divide the value range into multiple intervals, and assign a feature value to each interval, thereby obtaining the feature values of the feature.
  • In this way, every feature (continuous or discrete) has the same discrete form, so that the discrete feature values can be used directly as training data when training the model without fitting a probability distribution function to the continuous features, which improves the efficiency of data processing.
  • For example, gender, education, occupation, marital status, and resident city each take a limited number of values, so these features are discrete features, and the values of users under these features are the feature values of these features.
  • Age, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical overdue loans can each take infinitely many values, so these features are continuous features.
  • The continuous values of these continuous features are therefore converted to discrete values.
  • The age feature is discretized into feature value 1, feature value 2, ..., feature value 7, which in turn represent the following 7 age ranges (in years): [0, 15), [15, 25), [25, 35), [35, 45), [45, 55), [55, 65), [65, +∞).
  • The amount of this loan is discretized into feature value 1, feature value 2, ..., feature value 5, which in turn represent the following 5 loan amount ranges (unit: 10,000 yuan): [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, +∞).
  • The amount of arrears is discretized into feature value 1, feature value 2, ..., feature value 5, which in turn represent the following 5 arrears ranges (unit: 10,000 yuan): [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, +∞).
  • The number of days this loan is overdue is discretized into feature value 1, feature value 2, ..., feature value 5, which in turn represent the following 5 overdue ranges (in days): [0, 1), [1, 3), [3, 5), [5, 7), [7, +∞).
  • The number of historical loans is discretized into feature value 1, feature value 2, ..., feature value 5, which in turn represent the following 5 ranges (unit: times): [0, 1), [1, 2), [2, 3), [3, 5), [5, +∞).
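  • The interval mapping above can be sketched with a binary search over the interior interval boundaries. The constant and function names are illustrative; the boundaries themselves are taken from the ranges listed in the text.

```python
from bisect import bisect_right

# Interior boundaries of the half-open intervals [a, b) listed above.
AGE_EDGES = [15, 25, 35, 45, 55, 65]        # years, 7 intervals
AMOUNT_EDGES = [0.5, 1.5, 3.5, 5]           # 10,000 yuan, 5 intervals
OVERDUE_DAY_EDGES = [1, 3, 5, 7]            # days, 5 intervals
LOAN_COUNT_EDGES = [1, 2, 3, 5]             # times, 5 intervals


def feature_value(x, edges):
    """Map a non-negative continuous value to its 1-based feature value."""
    if x < 0:
        raise ValueError("value must be non-negative")
    # bisect_right puts a value equal to a boundary into the upper
    # interval, matching the left-closed, right-open ranges.
    return bisect_right(edges, x) + 1
```

  • For instance, a 15-year-old falls in [15, 25) and receives feature value 2, while a loan amount of 0.49 (in units of 10,000 yuan) receives feature value 1.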
  • the degree of association between the feature and the category feature can be calculated.
  • the degree of association can be represented by mutual information.
  • Mutual information measures the amount of information that one random variable contains about another random variable. The greater the value of the mutual information, the stronger the coupling between the two random variables and the greater the degree of association.
  • The mutual information between each feature and the category feature of whether the user answers the phone can satisfy the following formula:
    $$ I(X, Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)} $$
  • X is any feature;
  • R(X) is the feature value set of feature X, including each feature value of feature X;
  • x is any feature value of feature X;
  • Y is the behavior of whether the user answers the phone;
  • R(Y) is the behavior set of whether the user answers the phone, including the behavior of answering the phone and the behavior of not answering the phone;
  • y is the behavior of answering the phone or the behavior of not answering the phone;
  • I(X, Y) is the degree of association between feature X and the behavior of answering the phone;
  • P(x, y) is the ratio of the number of users who performed behavior y among the users corresponding to feature value x to the total number of users;
  • P(x) is the ratio of the number of users corresponding to feature value x to the total number of users;
  • P(y) is the ratio of the number of users who performed behavior y to the total number of users.
  • the random variable X represents the age feature
  • the random variable Y represents the categorical feature of whether the phone is answered
  • P(x) represents the proportion of the number of users whose age feature is x to the number of 20,000 users.
  • The degree of association between each feature and the behavior of answering the phone obtained in this way integrates the information of every feature value of the feature. Because richer information is used, the degree of association is more accurate.
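  • The mutual information between one discretized feature and the answered/not-answered label can be computed directly from counts, as in the sketch below. The natural logarithm is an assumption, since the text does not state the base; the function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X, Y) of two discrete sequences.

    xs: feature value of each user; ys: whether each user answered.
    Uses the natural logarithm (an assumption; base is not specified).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts behind P(x, y)
    px = Counter(xs)               # counts behind P(x)
    py = Counter(ys)               # counts behind P(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

  • Pairs (x, y) that never occur contribute nothing to the sum, so iterating over observed pairs only is sufficient. A feature would then be kept as a strong correlation feature when its score exceeds the chosen threshold.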
  • the feature whose mutual information is greater than the third preset threshold may be used as a strong correlation feature.
  • the third preset threshold can be set by those skilled in the art based on experience, for example, it can be 0.5 or 0.8, which is not specifically limited.
  • The embodiment of the present invention may train the first model based on naive Bayes using the feature values of the 20,000 users under the strong correlation features. Specifically, for the sample data obtained by combining one feature value of each strong correlation feature (for example, the combination x_1, x_2, x_3, ..., x_n, which are respectively a feature value of strong correlation feature X_1, strong correlation feature X_2, strong correlation feature X_3, ..., strong correlation feature X_n), the value of the category of whether the sample will answer the call can be:
    $$ \hat{y} = \arg\max_{y \in R(Y)} P(y \mid x_1, \ldots, x_n) = \arg\max_{y \in R(Y)} P(y) \prod_{i=1}^{n} P(x_i \mid y) $$
  • P(y | x_1, ..., x_n) is the posterior probability, and (x_1, x_2, x_3, ..., x_n) is the sample data obtained by combining the feature values x_1, x_2, x_3, ..., x_n.
  • The probabilities are estimated from the following counts: N is the total number of users (20,000); N_y is the number of users whose category feature value is y; N_{y,x_i} is the number of users whose category feature value is y and whose value of feature X_i is x_i; L_{x_i} is the size of the value range of feature X_i, that is, the number of possible feature values of X_i.
  • the first model can be identified by the above formulas.
  • For any user to be predicted, the probability that the user's value under the behavior feature is "yes" and the probability that it is "no" can be compared through the following quantities:
    $$ P(Y = \text{yes}) \prod_{i=1}^{n} P(x_i \mid Y = \text{yes}), \qquad P(Y = \text{no}) \prod_{i=1}^{n} P(x_i \mid Y = \text{no}) $$
  • If the probability that the user's value under the behavior feature is "yes" is greater than the probability that it is "no", the user is determined to be a user who will answer the phone, and the user category of the user is the first user category. If the probability that the user's value under the behavior feature is "no" is greater than the probability that it is "yes", the user is determined to be a user who will not answer the phone, and the user category of the user is the second user category.
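  • A count-based naive Bayes classifier of this kind can be sketched as below. The probability estimates use add-one (Laplace) smoothing, (N_y + 1)/(N + 2) for the class prior and (N_{y,x_i} + 1)/(N_y + L_{x_i}) for the per-feature terms. This smoothing is an assumption made here because the counts N, N_y, N_{y,x_i} and L_{x_i} suggest it; the original formulas are not recoverable from the text. The class and method names are illustrative.

```python
import math

CLASSES = ("yes", "no")  # answers the call / does not answer the call


class CountingNaiveBayes:
    """Naive Bayes over discrete features, updatable one sample at a time."""

    def __init__(self, feature_sizes):
        self.feature_sizes = feature_sizes  # L_{x_i} for each feature i
        self.n = 0                          # N
        self.class_counts = {}              # N_y
        self.value_counts = {}              # N_{y, x_i}, keyed by (y, i, x_i)

    def update(self, x, y):
        """Incrementally add one labelled sample (supports real-time updates)."""
        self.n += 1
        self.class_counts[y] = self.class_counts.get(y, 0) + 1
        for i, v in enumerate(x):
            key = (y, i, v)
            self.value_counts[key] = self.value_counts.get(key, 0) + 1

    def log_score(self, x, y):
        """log of P(y) * prod_i P(x_i | y) with assumed add-one smoothing."""
        n_y = self.class_counts.get(y, 0)
        score = math.log((n_y + 1) / (self.n + len(CLASSES)))
        for i, v in enumerate(x):
            n_yx = self.value_counts.get((y, i, v), 0)
            score += math.log((n_yx + 1) / (n_y + self.feature_sizes[i]))
        return score

    def predict(self, x):
        """Return the class with the larger score."""
        return max(CLASSES, key=lambda y: self.log_score(x, y))
```

  • Because training only increments counters, new call results can be folded in with `update` without retraining from scratch, which matches the incremental-update property the text attributes to naive Bayes.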
  • the first model can be updated quickly and in real time, so that the update efficiency of the first model is better.
  • the first model can be trained based on only the features with a higher degree of association. In this way, the amount of data involved in training is less, and the efficiency of training the model is higher.
  • the training data used is more concentrated on the feature data that is strongly related to the behavior of answering the phone, the training process of the first model is more aggregated, and the model effect is better.
  • the second model may be a neural network model.
  • The feature values of multiple users under each feature are acquired, and each feature value of each feature is represented by a numerical value.
  • For each user, a first feature vector corresponding to the user is obtained according to the numerical values of the user's feature values under each feature, and a second feature vector corresponding to the user is obtained according to the user's value under the category feature of whether to repay.
  • The first feature vectors corresponding to the multiple users can be used as the model input to obtain the prediction vectors of repayment for the multiple users, and the model parameters of the second model are adjusted based on the second feature vectors of the multiple users and the prediction vectors to obtain the optimized second model.
  • the second model can include an input layer, a hidden layer, and an output layer.
  • the input layer, hidden layer, and output layer adopt a fully connected structure.
  • The hidden layer can be set with 10 neuron nodes, and the output layer can be set with 2 neuron nodes. The activation function of the hidden layer uses the ReLU function, and the activation function of the output layer uses the Softmax function, whose output represents the probability of the user's repayment.
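  • The forward pass of such a network (fully connected, 10 ReLU hidden nodes, 2 Softmax output nodes) can be sketched in plain Python as below. The weight initialization range and the 6-dimensional input in the usage note are assumptions for illustration, and training (for example by gradient descent on a cross-entropy loss) is omitted.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)                       # subtract max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, params):
    """Input -> hidden (ReLU) -> output (Softmax) probabilities."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))
```

  • For a 6-dimensional encoded input, `params = [init_layer(6, 10, rng), init_layer(10, 2, rng)]` with `rng = random.Random(0)` builds the two layers, and `forward(x, params)` returns two probabilities that sum to 1. Which of the two outputs denotes repayment is a convention to be fixed during training.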
  • the data of multiple (for example, 50,000) users who have made a collection call by the collection robot 110 in the historical period can be obtained first.
  • The data of each user can include the user's values under various features, such as the user's gender, age, education, occupation, marital status, resident city, the amount of this loan, the amount of arrears, the number of days this loan is overdue, the number of historical loans, and the number of historical overdue loans, and can also include the value of the category feature of whether the user repaid after the collection call was made to the user.
  • the user used to train the second model and the user used to train the first model may be partially the same or completely different, which is not specifically limited.
  • each continuous feature can be discretized using the discretization method applied when training the first model, and one-hot encoding is then used to convert each feature value of each feature into numerical form. For example, since the gender feature has 2 feature values (male and female), one-hot encoding can convert them into a vector with 1 row and 2 columns; if a user is male, the user's feature vector under the gender feature is (1, 0). Since the marital status feature has 4 feature values (unmarried, married, widowed, divorced), one-hot encoding can convert them into a vector with 1 row and 4 columns.
  • similarly, one-hot encoding can transform the 11 feature values of the education feature (primary school, junior high school, high school, technical secondary school, vocational school, technical school, junior college, undergraduate, master's student, doctoral student, postdoctoral) into a vector with 1 row and 11 columns, the 13 feature values of the occupation feature (agriculture, forestry, animal husbandry, fishery and water conservancy; industry; geological survey and exploration; construction; transportation, post and telecommunications; commerce, public catering, material supply and storage; real estate management, public utilities, resident services and consulting services; health, sports and social welfare; education, culture and art, radio and television; scientific research and comprehensive technical services; finance and insurance; state agencies, party and government agencies, and social organizations; other industries) into a vector with 1 row and 13 columns, and the 338 feature values of the city of residence feature (337 major cities, other cities) into a vector with 1 row and 338 columns.
  • the feature vector of the user under each feature can be determined according to the user's feature value under that feature, and the user's feature vectors under all features can then be spliced head to tail to obtain the first feature vector corresponding to the user.
  • the first feature vector corresponding to the user can be a one-dimensional vector with 1 row and 400 columns.
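The one-hot encoding and head-to-tail splicing can be sketched with the two smallest features named in the text. Only gender and marital status are included to keep the example short, so the result has 6 columns rather than 400; the dictionary layout and function names are assumptions for illustration.

```python
# Feature values in a fixed order; the order determines the column positions.
FEATURE_VALUES = {
    "gender": ["male", "female"],
    "marital_status": ["unmarried", "married", "widowed", "divorced"],
}

def one_hot(value, feature_values):
    """Return a 1-row vector with a single 1 at the position of `value`."""
    vector = [0] * len(feature_values)
    vector[feature_values.index(value)] = 1
    return vector

def first_feature_vector(user):
    """Splice the per-feature one-hot vectors head to tail."""
    spliced = []
    for name, values in FEATURE_VALUES.items():
        spliced.extend(one_hot(user[name], values))
    return spliced

user = {"gender": "male", "marital_status": "married"}
vector = first_feature_vector(user)  # [1, 0] followed by [0, 1, 0, 0]
```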
  • the second feature vector corresponding to the user is determined according to the feature value of the user under the category feature of whether to repay, and the second feature vector corresponding to the user may be a one-dimensional vector with 1 row and 2 columns.
  • the second feature vector corresponding to the user may be [1, 0]
  • the second feature vector corresponding to the user may be [0, 1].
  • the 50,000 feature vectors can be divided into training feature vectors, test feature vectors, and verification feature vectors. The division can follow a random ratio or a preset ratio, without limitation.
  • the first feature vectors of the 35,000 training feature vectors can be input to the neural network model so that the model outputs 35,000 predicted second feature vectors, and the parameters of the neural network model are then adjusted based on the 35,000 predicted second feature vectors and the second feature vectors of the 35,000 training feature vectors to obtain the second model.
  • the 10,000 test feature vectors can be used to test the model effect of the second model.
  • the 5,000 verification feature vectors can be used to verify whether the test effect of the second model reaches the preset effect.
  • the 10,000 test feature vectors and the 5,000 verification feature vectors can also be used to optimize the model parameters of the second model.
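A preset-ratio division into the 35,000 / 10,000 / 5,000 subsets mentioned above might look like the following sketch. Shuffling before slicing and the fixed seed are assumptions; the text only says the ratio may be random or preset.

```python
import random

def split_samples(samples, n_train, n_test, seed=0):
    """Shuffle the samples, then slice them into training, test and verification sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    verification = shuffled[n_train + n_test:]  # whatever remains
    return train, test, verification

# Integers stand in for the 50,000 (first vector, second vector) pairs.
train, test, verification = split_samples(range(50_000), 35_000, 10_000)
```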
  • the user's feature vector is obtained by determining the user's feature vector under each feature and splicing those per-feature vectors together, so that the user's feature vector integrates the feature information of every feature value; the information is more comprehensive and the form of expression is more concise. A model trained on model input that is rich in information and concise in form therefore has a better effect and higher training efficiency.
  • the first list is first obtained, the user category of each user in the first list is determined using the first model, the number of users belonging to each user category in the first list is then counted, and the first duration is determined based on the number. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probability.
  • the first list includes multiple users who have not performed the preset behavior, and the user category of each user includes the first user category.
  • the first user category indicates that the user will answer the call made by the collection system, the first duration represents the time required to make calls to all users in the first list, and the second list is used to indicate the users who need to be called after the current time.
  • the first model is used to predict whether each user in the first list will answer the call, and the time needed to complete the collection task is determined; when it is determined that the collection task cannot be completed, the second model is used to identify users with a higher probability of successful collection, so that collection calls can be placed first to those users, which helps improve the collection effect.
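The overall flow summarized above can be sketched as follows. The two models and the duration estimate are passed in as plain callables because their internals are described elsewhere; all names are hypothetical. Two choices are assumptions rather than statements from the text: the second list is built by sorting in descending order of probability, and the first list is returned unchanged when the set duration is not exceeded.

```python
from collections import Counter

ANSWERS, NO_ANSWER = "answers", "does_not_answer"  # first / second user category

def plan_calls(first_list, classify, estimate_duration, probability, set_duration):
    """Return the users to call, prioritised when the task cannot finish in time."""
    categories = {user: classify(user) for user in first_list}   # first model
    counts = Counter(categories.values())                        # users per category
    first_duration = estimate_duration(counts)
    if first_duration <= set_duration:
        return list(first_list)          # assumed: keep the original order
    likely = [u for u in first_list if categories[u] == ANSWERS]
    # second model: rank users expected to answer by probability of the preset behavior
    return sorted(likely, key=probability, reverse=True)

# Made-up stand-ins for the two models and the duration estimate.
predicted = {"u1": ANSWERS, "u2": ANSWERS, "u3": ANSWERS, "u4": NO_ANSWER}
repay_prob = {"u1": 0.2, "u2": 0.9, "u3": 0.5}
duration = lambda c: c[ANSWERS] * 120 + c[NO_ANSWER] * 30   # seconds, illustrative

second_list = plan_calls(list(predicted), predicted.get, duration, repay_prob.get, 300)
```

With these inputs the estimated duration is 390 seconds, which exceeds the 300-second limit, so `second_list` contains only the three users expected to answer, ordered from most to least likely to repay.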
  • an embodiment of the present invention also provides a data processing device, and the specific content of the device can be implemented with reference to the foregoing method.
  • Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention, including:
  • the obtaining module 401 is configured to obtain a first list; the first list includes multiple users who have not performed a preset behavior;
  • the determining module 402 is configured to determine the user category of each user in the first list using a first model, wherein the user category of each user includes a first user category, and the first user category indicates that the user will answer the phone calls made by the collection system;
  • the processing module 403 is configured to count the number of users belonging to each user category in the first list and determine a first duration based on the number, the first duration representing the time required to make calls to all users in the first list; if the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probability; the second list is used to indicate the users who need to be called after the current moment.
  • the user category of each user further includes a second user category
  • the second user category represents that the user will not answer calls made by the collection system.
  • the acquiring module 401 may also acquire the first call duration corresponding to the first user category and the second call duration corresponding to the second user category.
  • the first call duration is determined according to the call duration of each user who answered the call in the historical time period
  • the second call duration is determined according to the call duration waiting to be answered after the call is made to the user.
  • the determining module 402 may determine the total call duration for making calls to all users in the first list based on the number of users belonging to the first user category in the first list and the first call duration, and the number of users belonging to the second user category in the first list and the second call duration; the first duration is determined based on the total call duration and the number of available phone numbers.
  • the determining module 402 determines the available phone numbers in the following manner: for multiple phone numbers applied for in advance from an operator, a predicted duration is obtained based on the total call duration and the number of the multiple phone numbers, the probability of each of the multiple phone numbers going offline within the predicted duration is determined, and the phone numbers whose probability is not greater than a first preset threshold are used as the available phone numbers.
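The duration estimate and the selection of available phone numbers can be sketched together. The text only says that the predicted duration and the first duration are obtained "based on" the total call duration and a count of phone numbers; dividing the total evenly across numbers is an assumption, as are the function names and the per-number offline-probability callables, which stand in for whatever estimator the system actually uses.

```python
def total_call_duration(n_answer, first_call_secs, n_no_answer, second_call_secs):
    """Users expected to answer cost the first call duration, the rest the second."""
    return n_answer * first_call_secs + n_no_answer * second_call_secs

def available_numbers(offline_probability, total_secs, threshold):
    """Keep numbers whose probability of going offline is not greater than the threshold."""
    predicted_secs = total_secs / len(offline_probability)  # assumed even split
    return [number for number, prob in offline_probability.items()
            if prob(predicted_secs) <= threshold]

def first_duration(total_secs, numbers):
    """Assumed: the calls are spread evenly over the available numbers."""
    if not numbers:
        raise ValueError("no available phone numbers")
    return total_secs / len(numbers)

# Made-up figures: 600 users expected to answer (120 s each), 400 not (30 s each).
total = total_call_duration(600, 120, 400, 30)
offline = {
    "number_a": lambda secs: 0.1,
    "number_b": lambda secs: 0.9,
    "number_c": lambda secs: min(1.0, secs / 100_000),
}
usable = available_numbers(offline, total, threshold=0.5)
needed = first_duration(total, usable)
```

Here the total is 84,000 seconds, the predicted duration per number is 28,000 seconds, `number_b` is excluded, and the first duration over the two remaining numbers is 42,000 seconds.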
  • the device further includes a dialing module 404. While the determining module 402 uses the first model to determine the user category of each user in the first list, the dialing module 404 can use the available phone numbers to call each user according to the contact information of each user in the first list.
  • when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, the processing module 403 may also determine, based on the first model, the second duration required to make calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module 403 may also refuse to receive the third list.
  • the first model is a classification model.
  • the processing module 403 can also obtain the feature values of multiple users under each feature and, for any feature, determine the degree of association between the feature and the behavior of whether the user answers the phone according to the number of users who answered the call among the multiple users, the number of users who did not answer the call, the number of users corresponding to each feature value of the feature, the number of users who answered the call among the users corresponding to each feature value, and the number of users who did not answer the call among the users corresponding to each feature value.
  • a feature whose degree of association with the behavior of whether the user answers the phone is greater than or equal to the second preset threshold is taken as a strongly correlated feature, and the first model is obtained by training on the number of users who answered the call among the multiple users, the number of users who did not answer the call, the number of users corresponding to each feature value of the strongly correlated feature, the number of users who answered the call among the users corresponding to each feature value of the strongly correlated feature, and the number of users who did not answer the call among the users corresponding to each feature value of the strongly correlated feature.
  • the degree of association between each feature and the behavior of whether the user answers the call satisfies the following conditions:
  • X is any feature
  • R(X) is the feature value set of feature X, including each feature value of feature X
  • x is any feature value of feature X
  • Y is the behavior of whether the user answers the phone
  • R(Y) is the behavior set of whether the user answers the phone, including the behavior of the user answering the phone and the behavior of the user not answering the phone
  • y is the behavior of the user answering the phone or the behavior of the user not answering the phone
  • I(X, Y) is the degree of association between feature X and the behavior of whether the user answers the phone
  • P(x, y) is the ratio of the number of users who performed behavior y among the users corresponding to feature value x to the total number of users
  • P(x) is the ratio of the number of users corresponding to feature value x to the total number of users
  • P(y) is the ratio of the number of users who performed behavior y to the total number of users.
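The symbols defined above are the ingredients of mutual information, so the sketch below assumes I(X, Y) = sum over x in R(X) and y in R(Y) of P(x, y) * log(P(x, y) / (P(x) * P(y))). The exact expression and the logarithm base (natural log here) are assumptions, since the formula itself is not reproduced in this text; the function name and the count layout are also illustrative.

```python
import math

def mutual_information(counts):
    """counts maps (feature_value, behavior) to the number of users in that cell."""
    total = sum(counts.values())
    p_x, p_y = {}, {}
    for (x, y), n in counts.items():
        p_x[x] = p_x.get(x, 0.0) + n / total   # P(x): share of users with value x
        p_y[y] = p_y.get(y, 0.0) + n / total   # P(y): share of users with behavior y
    score = 0.0
    for (x, y), n in counts.items():
        if n == 0:
            continue  # a zero joint probability contributes nothing to the sum
        p_xy = n / total
        score += p_xy * math.log(p_xy / (p_x[x] * p_y[y]))
    return score

# Made-up counts: the feature tells nothing about answering ...
unrelated = {("male", "answered"): 25, ("male", "missed"): 25,
             ("female", "answered"): 25, ("female", "missed"): 25}
# ... versus a feature that fully determines it.
decisive = {("a", "answered"): 50, ("b", "missed"): 50}

low = mutual_information(unrelated)
high = mutual_information(decisive)
```

Features would then be kept as strongly correlated when their score is greater than or equal to the second preset threshold.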
  • the second model is a neural network model.
  • the processing module 403 can also obtain the feature values of multiple users under each feature and, for any user, construct the feature vector of the user under each feature according to the user's feature value under that feature and all feature values of that feature, and splice the user's feature vectors under the features to obtain the first feature vector corresponding to the user; the second feature vector corresponding to the user is obtained according to whether the user performs the preset behavior.
  • the first feature vectors corresponding to the multiple users are used as model input to obtain prediction results of the multiple users performing the preset behavior, and the model parameters are adjusted based on the second feature vectors of the multiple users and the prediction results to obtain the second model.
  • the processing module 403 is further configured to obtain each feature value of each feature in the following manner: if the feature is a discrete feature, the values taken by the multiple users under the feature are counted and used as the feature values of the feature; if the feature is a continuous feature, the value range of the multiple users under the feature is determined and divided into multiple value intervals, and a corresponding feature value is set for each value interval to obtain the feature values of the feature.
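For a continuous feature, the interval-based feature values can be sketched as below. Equal-width intervals and using the interval index as the feature value are assumptions; the text does not say how the value range is divided. The ages are made up, and the sketch assumes the observed minimum and maximum differ.

```python
def interval_index(value, low, high, n_intervals):
    """Map a continuous value to the index of its equal-width interval."""
    width = (high - low) / n_intervals
    index = int((value - low) / width)
    # Clamp so that the maximum (and any out-of-range value) stays in a valid interval
    return min(max(index, 0), n_intervals - 1)

ages = [20, 35, 50, 60]                  # values of the users under the age feature
low, high = min(ages), max(ages)         # value range observed across the users
discretized = [interval_index(age, low, high, 4) for age in ages]
```

With four intervals over the range 20 to 60, each interval is 10 wide, so the ages 20, 35, 50 and 60 map to indices 0, 1, 3 and 3.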
  • the first list is first obtained, the first model is used to determine the user category of each user in the first list, the number of users belonging to each user category in the first list is then counted, and the first duration is determined based on the number. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category performs the preset behavior, and the second list is determined according to the probability.
  • the first list includes multiple users who have not performed the preset behavior, and the user category of each user includes the first user category. The first user category indicates that the user will answer the call made by the collection system.
  • the first duration represents the time required to make calls to all users in the first list, and the second list is used to indicate the users who need to be called after the current time.
  • the first model is used to predict whether each user in the first list will answer the call, and the time needed to complete the collection task is determined; when it is determined that the collection task cannot be completed, the second model is used to identify users with a higher probability of successful collection, so that collection calls can be placed first to those users, which helps improve the collection effect.
  • an embodiment of the present invention also provides a computing device. As shown in FIG. 5, it includes at least one processor 501 and a memory 502 connected to the at least one processor.
  • the embodiment of the present invention does not limit the processor.
  • the connection between the processor 501 and the memory 502 in FIG. 5 is taken as an example.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the memory 502 stores instructions that can be executed by at least one processor 501, and the at least one processor 501 can execute the steps included in the aforementioned data processing method by executing the instructions stored in the memory 502.
  • the processor 501 is the control center of the computing device; it can use various interfaces and lines to connect the parts of the computing device, and realizes data processing by running or executing instructions stored in the memory 502 and calling data stored in the memory 502.
  • the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor.
  • the application processor mainly processes the operating system, user interface, and application programs.
  • the modem processor mainly handles issuing instructions. It can be understood that the foregoing modem processor may also not be integrated into the processor 501.
  • the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments, they may also be implemented on separate chips.
  • the processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in combination with the data processing embodiment may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules.
  • the memory 502 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disc, etc.
  • the memory 502 is any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 502 in the embodiment of the present invention may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • embodiments of the present invention also provide a computer-readable storage medium that stores a computer program executable by a computing device; when the program runs on the computing device, the computing device executes any of the data processing methods described with reference to FIG. 2.
  • the embodiments of the present invention can be provided as methods or computer program products. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Abstract

A data processing method and device, relating to the technical field of financial technology (Fintech). The method comprises: upon receiving a first list, using a first model to determine user categories for each user in the first list; determining, according to the user categories of the users, a period of time for completing debt collection tasks in the first list; and if it is determined that the debt collection tasks in the first list cannot be completed within a preset period of time, using a second model to determine the debt collection success rate for each user. In this way, upon determining that debt collection tasks cannot be completed, collection calls can be performed with priority given to users having higher debt collection success rates, thus facilitating debt collection.

Description

Data processing method and device
Cross-references to related applications
This application claims priority to Chinese patent application No. 201911155084.5, entitled "Data processing method and device", filed with the Chinese Patent Office on November 22, 2019, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the technical field of financial technology (Fintech), and in particular to a data processing method and device.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on technology. As voice dialogue technology matures, the Fintech field has begun to apply intelligent robots in collection scenarios; such robots are called collection robots. A collection robot can automatically place collection calls to remind customers to repay and can record each customer's willingness to repay, which makes it easier to follow up on repayment progress. Compared with manual collection, a collection robot not only greatly reduces collection costs but also completes collection tasks efficiently. Moreover, a collection robot shows no emotional fluctuation when talking with customers, which can also improve the customer experience.
At present, after receiving the collection lists sent by various online lending companies, a collection robot generally collects from the users in the lists directly in the order in which the lists were received. However, in actual business scenarios, the number of online lending companies a collection robot serves each day is not fixed, and neither is the number of users to be collected that each company submits. The total number of users to be collected each day therefore cannot be determined in advance. On a day when that total is large, calling each user in turn on a first-come, first-served basis may leave the day's collection task unfinished and reduce the collection effect.
In summary, a data processing method is urgently needed to solve the technical problem of poor collection effect caused by placing collection calls sequentially on a first-come, first-served basis in the prior art.
Summary of the invention
The present invention provides a data processing method and device to solve the technical problem of poor collection effect caused by placing collection calls sequentially on a first-come, first-served basis in the prior art.
In a first aspect, the present invention provides a data processing method applied to a collection system. The method includes: obtaining a first list; using a first model to determine the user category of each user in the first list; counting the number of users belonging to each user category in the first list and determining a first duration based on the number; and, if the first duration exceeds a set duration, using a second model to determine the probability that each user belonging to the first user category performs a preset behavior and determining a second list according to the probability. The first list includes multiple users who have not performed the preset behavior; the user category of each user includes the first user category, which indicates that the user will answer the call made by the collection system. The first duration represents the time required to make calls to all users in the first list, and the second list indicates the users who need to be called after the current time. In the above implementation, after the first list is received, the first model is first used to predict whether each user in the first list will answer the call (that is, the user category), and the time needed to complete the collection task is determined; when it is determined that the collection task cannot be completed, the second model is used to identify users with a higher probability of successful collection, so that collection calls can be placed first to those users, which helps improve the collection effect.
In a possible implementation, the user category of each user further includes a second user category, which indicates that the user will not answer the call made by the collection system. In this case, determining the first duration based on the number includes: first obtaining a first call duration corresponding to the first user category and a second call duration corresponding to the second user category; then determining the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration; and finally determining the first duration based on the total call duration and the number of available phone numbers. The first call duration is determined from the call durations of calls to users who answered within a historical period; the second call duration is determined from the time spent waiting for an answer after dialing a user. In the above implementation, the first call duration is determined from the durations of collection calls placed to users within the historical period, so it incorporates historical dialing information and can accurately represent the call duration of each user who answers; correspondingly, the second call duration is the time spent waiting for an answer, so it can accurately represent the call duration of each user who does not answer. In this way, the total call duration needed to call the users in the first list who answer can be determined from the first call duration and the number of answering users predicted by the first model, and the total call duration needed to call the users who do not answer can be determined from the second call duration and the number of non-answering users predicted by the first model, which yields a prediction of the total call duration for calling all users in the first list. Because this approach is based on an analysis of historical data, it better reflects actual business conditions and makes the predicted first duration more accurate.
In a possible implementation, the available phone numbers can be determined as follows: for multiple phone numbers applied for in advance from an operator, a predicted duration is first obtained based on the total call duration and the number of the multiple phone numbers, the probability of each phone number going offline within the predicted duration is then determined, and the phone numbers whose probability is not greater than a first preset threshold are used as the available phone numbers. In the above implementation, after the predicted duration required to complete the collection task is determined, judging the probability of each phone number going offline within the predicted duration makes it possible to estimate in advance how many phone numbers may go offline while the collection task is being executed. Determining the first duration from the number of phone numbers that are not expected to go offline therefore anticipates the risk of numbers going offline and helps ensure that completion of the collection task is predicted accurately.
In a possible implementation, while the first model is used to determine the user category of each user in the first list, calls can also be placed to the users from the available phone numbers according to each user's contact information in the first list. In this implementation, the risk assessment of the collection tasks in the first list runs in parallel with the actual dialing, so the risk assessment serves as an auxiliary aid to normal business execution and does not take up the time the collection robot would otherwise spend placing collection calls, which helps reduce the impact of the risk assessment on normal business.
In a possible implementation, when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, a second duration required to call all users in the third list can also be determined based on the first model. If the sum of the first duration and the second duration exceeds the set duration, the third list can be rejected. In this implementation, when a new third list arrives, the total time needed to place collection calls to all users in the first list and the third list is estimated in advance, and the third list is rejected when that total exceeds the set duration. This avoids accepting collection tasks that cannot be completed and helps reduce customers' losses.
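The admission rule for a third list reduces to a single comparison, sketched here with a hypothetical function name and durations in seconds:

```python
def accept_third_list(first_duration, second_duration, set_duration):
    """Accept the new list only if both lists fit within the set duration."""
    return first_duration + second_duration <= set_duration


fits = accept_third_list(first_duration=620.0, second_duration=300.0,
                         set_duration=1000.0)
too_long = accept_third_list(first_duration=620.0, second_duration=500.0,
                             set_duration=1000.0)
```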
In a possible implementation, the first model can be a classification model and can be obtained as follows. First, obtain the feature values of multiple users under each feature. Then, for any feature, determine the degree of association between the feature and the behavior of whether a user answers the call, according to the number of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not answer. Next, take the features whose degree of association with the behavior of whether a user answers the call is greater than or equal to a second preset threshold as strongly correlated features, and train the first model according to the number of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each of those feature values, the number who answered and the number who did not answer.
In this implementation, determining the degree of association between each feature and the answering behavior allows the first model to be trained only on the features with a high degree of association. Less data takes part in training, so training is more efficient; and because the training data is concentrated on feature data strongly related to the answering behavior, the training of the first model is more focused and the model can perform better.
In a possible implementation, the degree of association between each feature and the behavior of whether a user answers the call can satisfy the following condition:
Figure PCTCN2020129121-appb-000001
Here, X is any feature; R(X) is the set of feature values of X, containing each feature value of X; and x is any feature value of X. Y is the behavior of whether a user answers the call; R(Y) is the set of such behaviors, comprising the behavior of answering and the behavior of not answering; and y is either of these two behaviors. I(X,Y) is the degree of association between feature X and the behavior of whether a user answers the call; P(x,y) is the proportion of all users who correspond to feature value x and performed behavior y; P(x) is the proportion of all users who correspond to feature value x; and P(y) is the proportion of all users who performed behavior y.
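The symbols defined above (a joint proportion P(x,y), the marginal proportions P(x) and P(y), and the value sets R(X) and R(Y)) match the standard definition of mutual information. Assuming the formula image takes that standard form, it would read as follows; this is a reconstruction from the definitions rather than a transcription of the figure, and the logarithm base is an assumption.

```latex
I(X,Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x,y) \, \log \frac{P(x,y)}{P(x)\,P(y)}
```

Under this form, I(X,Y) is zero when the feature and the answering behavior are independent, and grows as the feature becomes more informative about whether a user answers.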
In this implementation, the degree of association between a feature and the answering behavior is obtained from the probabilities relating each feature value of that feature to the answering behavior, so it combines information from all of the feature's values. Because more information is used, the degree of association is more accurate.
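The following sketch computes a degree of association from user counts and selects strongly correlated features. It assumes the degree of association is the standard mutual information (natural logarithm), which is inferred from the symbol definitions and not confirmed by the text; the function and variable names are hypothetical.

```python
import math
from collections import Counter


def association_degree(feature_values, answered):
    """Mutual information between one feature and the answer/no-answer label.

    feature_values[i] is user i's value for the feature; answered[i] is True
    if user i answered. All proportions are taken over the total user count.
    """
    n = len(feature_values)
    joint = Counter(zip(feature_values, answered))
    count_x = Counter(feature_values)
    count_y = Counter(answered)
    degree = 0.0
    for (x, y), c in joint.items():  # pairs with zero count contribute nothing
        p_xy = c / n
        degree += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return degree


def strong_features(features, answered, second_threshold):
    """Names of features whose association degree reaches the threshold."""
    return [name for name, values in features.items()
            if association_degree(values, answered) >= second_threshold]


answered = [True, True, False, False]
features = {
    "informative": ["a", "a", "b", "b"],    # value fully determines the label
    "uninformative": ["a", "b", "a", "b"],  # value independent of the label
}
selected = strong_features(features, answered, second_threshold=0.1)
```

Only the features in `selected` would then be used to train the first model.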
In a possible implementation, the second model can be a neural network model and can be obtained as follows. First, obtain the feature values of multiple users under each feature. Then, for any user, construct the user's feature vector under each feature according to the user's feature value under that feature and the set of feature values of that feature, and concatenate the user's feature vectors under all features to obtain the first feature vector corresponding to the user. Next, obtain the second feature vector corresponding to the user according to whether the user performed the preset behavior. Then use the first feature vectors of the multiple users as model input to obtain predictions of whether those users perform the preset behavior, and finally adjust the model parameters based on the users' second feature vectors and the predictions to obtain the second model. In this implementation, determining the user's feature vector under each feature and concatenating those vectors yields a user feature vector that combines the information of every feature value of every feature, so the information is more comprehensive and the representation more concise. A model trained on input that is both information-rich and concise performs better and trains more efficiently.
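One plausible reading of "constructing the feature vector from the user's feature value and the feature's set of values" is one-hot encoding, sketched below. That reading is an assumption, as are all names here; training the neural network itself is omitted.

```python
def one_hot(value, feature_values):
    """Vector with 1.0 at the position of `value` among the feature's values."""
    return [1.0 if candidate == value else 0.0 for candidate in feature_values]


def first_feature_vector(user, vocabularies):
    """Concatenate the user's per-feature vectors in a fixed feature order."""
    vector = []
    for feature, feature_values in vocabularies.items():
        vector.extend(one_hot(user[feature], feature_values))
    return vector


def second_feature_vector(performed_preset_behavior):
    """Training label: whether the user performed the preset behavior."""
    return [1.0 if performed_preset_behavior else 0.0]


vocabularies = {"gender": ["F", "M"], "age_band": ["<30", "30-50", ">50"]}
user = {"gender": "M", "age_band": "30-50"}
x = first_feature_vector(user, vocabularies)
y = second_feature_vector(True)
```

Pairs such as `(x, y)` would form the training set whose predictions are compared against the labels to adjust the model parameters.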
In a possible implementation, the feature values of each feature can be obtained as follows. If the feature is discrete, collect the values that the multiple users take under that feature and use them as the feature's feature values. If the feature is continuous, determine the range of values that the multiple users take under that feature, divide the range into multiple intervals, and assign a corresponding feature value to each interval to obtain the feature's feature values. In this implementation, discretizing the values of continuous features gives all features (continuous and discrete alike) the same discrete representation, so discrete feature values can be used as training data without fitting a probability distribution function to the continuous features, which improves data-processing efficiency.
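Discretizing a continuous feature can be sketched as below. Equal-width intervals are an assumption made for illustration; the text only requires that the observed range be divided into intervals, each with its own feature value. Here the interval index serves as that feature value.

```python
def discretize(values, interval_count):
    """Map each continuous value to the index of its equal-width interval."""
    low, high = min(values), max(values)
    width = (high - low) / interval_count
    if width == 0:
        return [0 for _ in values]  # all users share one value
    # The maximum value is clamped into the last interval.
    return [min(int((v - low) / width), interval_count - 1) for v in values]


ages = [18, 25, 40, 60, 78]
age_bands = discretize(ages, interval_count=3)
```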
In a second aspect, the present invention provides a data processing apparatus comprising: an acquisition module, configured to acquire a first list that includes multiple users who have not performed a preset behavior; a determining module, configured to use a first model to determine the user category of each user in the first list, where the user categories include a first user category indicating that the user will answer a call placed by the collection system; and a processing module, configured to count the number of users in the first list belonging to each user category and determine a first duration based on those numbers, the first duration representing the time required to call all users in the first list, and, if the first duration exceeds a set duration, to use a second model to determine the probability that each user belonging to the first user category performs the preset behavior and to determine a second list according to the probabilities, the second list indicating the users to be called after the current moment.
In a possible implementation, the user categories further include a second user category indicating that the user will not answer a call placed by the collection system. In this case, the acquisition module can also acquire a first call duration corresponding to the first user category and a second call duration corresponding to the second user category. The determining module can determine the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, together with the number of users in the first list belonging to the second user category and the second call duration, and then determine the first duration based on the total call duration and the number of available phone numbers. The first call duration is determined from the durations of calls placed during a historical period to users who answered, and the second call duration is determined from the time spent waiting for an answer after a call is placed to a user.
In a possible implementation, the determining module can determine the available phone numbers as follows: for multiple phone numbers applied for in advance from the carrier, obtain a predicted duration based on the total call duration and the number of those phone numbers, determine the probability that each phone number goes offline within the predicted duration, and treat the phone numbers whose probability is not greater than a first preset threshold as available phone numbers.
In a possible implementation, the apparatus can further include a dialing module. While the determining module uses the first model to determine the user category of each user in the first list, the dialing module can place calls to the users from the available phone numbers according to each user's contact information in the first list.
In a possible implementation, when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, the processing module can also determine, based on the first model, a second duration required to call all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module can reject the third list.
In a possible implementation, the first model can be a classification model. In this case, the processing module can also obtain the feature values of multiple users under each feature and, for any feature, determine the degree of association between the feature and the behavior of whether a user answers the call, according to the number of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not answer. The processing module can then take the features whose degree of association with the behavior of whether a user answers the call is greater than or equal to a second preset threshold as strongly correlated features, and train the first model according to the number of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each of those feature values, the number who answered and the number who did not answer.
In a possible implementation, the degree of association between each feature and the behavior of whether a user answers the call satisfies the following condition:
Figure PCTCN2020129121-appb-000002
Here, X is any feature; R(X) is the set of feature values of X, containing each feature value of X; and x is any feature value of X. Y is the behavior of whether a user answers the call; R(Y) is the set of such behaviors, comprising the behavior of answering and the behavior of not answering; and y is either of these two behaviors. I(X,Y) is the degree of association between feature X and the behavior of whether a user answers the call; P(x,y) is the proportion of all users who correspond to feature value x and performed behavior y; P(x) is the proportion of all users who correspond to feature value x; and P(y) is the proportion of all users who performed behavior y.
In a possible implementation, the second model can be a neural network model. The processing module can also obtain the feature values of multiple users under each feature and, for any user, construct the user's feature vector under each feature according to the user's feature value under that feature and the set of feature values of that feature, and concatenate the user's feature vectors under all features to obtain the first feature vector corresponding to the user. The processing module can then obtain the second feature vector corresponding to the user according to whether the user performed the preset behavior, use the first feature vectors of the multiple users as model input to obtain predictions of whether those users perform the preset behavior, and adjust the model parameters based on the users' second feature vectors and the predictions to obtain the second model.
In a possible implementation, the processing module can also obtain the feature values of each feature as follows: if the feature is discrete, collect the values that the multiple users take under that feature and use them as the feature's feature values; if the feature is continuous, determine the range of values that the multiple users take under that feature, divide the range into multiple intervals, and assign a corresponding feature value to each interval to obtain the feature's feature values.
In a third aspect, the present invention provides a computing device comprising at least one processor and at least one memory. The memory stores a computer program that, when executed by the processor, causes the processor to perform any of the data processing methods of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program executable by a computing device. When the computer program runs on the computing device, the computing device performs any of the data processing methods of the first aspect.
These and other aspects of the present invention will be more readily understood from the following description of the embodiments.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the architecture of a collection system provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a one-dimensional cellular model provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the embodiments of the present invention, the preset behavior can be any behavior, such as a purchase in the advertising field, opening a card in the credit-card promotion field, or a repayment in the collection field. For ease of understanding, the following embodiments describe the data processing method using the collection field as an example.
Fig. 1 is a schematic diagram of the architecture of a collection system provided by an embodiment of the present invention. As shown in Fig. 1, the collection system can include a collection robot 110 and at least one client, such as client 121, client 122, and client 123. A client can be any online-lending client in the financial technology field that provides loans to users, such as an online-lending client deployed at a commercial bank, a finance company, or a trust company; this is not limited.
As shown in Fig. 1, the collection system can also include at least one user terminal, such as user terminal 131, user terminal 132, and user terminal 133. A user terminal can be any terminal device with a calling function, such as a senior-friendly phone, a smartphone, or a slider phone; this is not limited.
In the embodiments of the present invention, the collection robot 110 can be connected to the at least one client and to the at least one user terminal, for example by a wired connection or by a wireless connection; this is not specifically limited.
Based on the system architecture shown in Fig. 1, Fig. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present invention. The method is applied to a collection robot, for example the collection robot 110 shown in Fig. 1. As shown in Fig. 2, the method includes:
Step 201: acquire a first list, where the first list includes multiple users who have not performed a preset behavior.
In one example, the first list can include the contact information of each user who has not performed the preset behavior. In the collection field, the users who have not performed the preset behavior are the users who borrowed from an online lending institution and are overdue on repayment.
In a possible implementation, the collection system can also include a preprocessing device (not shown in Fig. 1), which can be placed between the at least one client and the collection robot 110 or inside the collection robot 110. In a specific implementation, the preprocessing device can receive the collection list sent by each client and sort the users to be collected from across the collection lists according to a set dialing strategy to obtain the first list. The set dialing strategy can be configured according to business needs. For example, it can sort the users to be collected by the time at which each collection list was received, by the priority of the client corresponding to each collection list, by the priority of the online loan product to which each collection list belongs, by the priority of the city in which the client corresponding to each collection list is located, or by a combination of these strategies; this is not specifically limited.
For example, the preprocessing device can be a web server based on World Wide Web (web) technology, and the client can be a client equipped with a web browser. When an online lending institution needs collection, it can use the web browser on its client to access the web service interface provided by the preprocessing device. Because the institution may need collection for several online loan products, it can package the information of the users to be collected for each online loan product (including each user's age, gender, education, marital status, occupation, current loan information, and historical loan information) into a collection list and upload it. The online lending institution can also select a collection end time on the web service interface, so that the collection robot 110 reports the collection result before that end time.
Correspondingly, after receiving the collection lists for the online loan products sent by each client, the preprocessing device can first sort each client's collection lists by the priority of the online loan products, and then sort the resulting per-client collection lists by the priority of the clients to obtain the first list. Alternatively, it can first sort the clients' collection lists by the priority of the clients, and then sort the collection lists by the priority of the online loan products to obtain the first list; this is not limited. For example, suppose the priority of client 121 > the priority of client 123 > the priority of client 122, and the priority of online loan product 2 > the priority of online loan product 1. If the collection list of client 121 includes user 1 and user 2 to be collected for online loan product 1, the collection list of client 122 includes user 3 to be collected for online loan product 1 and user 4 and user 5 to be collected for online loan product 2, and the collection list of client 123 includes user 6 to be collected for online loan product 2, then the first list can be: user 1, user 2, user 6, user 4, user 5, user 3. Alternatively, the first list can be: user 6, user 4, user 5, user 1, user 2, user 3.
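The two orderings in this example can be reproduced with a two-key sort, as sketched below. The identifiers and the tuple layout are hypothetical; only the priorities and user assignments come from the example above.

```python
# Lower rank means higher priority.
CLIENT_RANK = {"client_121": 0, "client_123": 1, "client_122": 2}
PRODUCT_RANK = {"product_2": 0, "product_1": 1}

# (client, online loan product, user to be collected), in upload order.
ENTRIES = [
    ("client_121", "product_1", "user_1"),
    ("client_121", "product_1", "user_2"),
    ("client_122", "product_1", "user_3"),
    ("client_122", "product_2", "user_4"),
    ("client_122", "product_2", "user_5"),
    ("client_123", "product_2", "user_6"),
]


def client_major_order(entries):
    """Client priority decides first; product priority orders within a client."""
    ranked = sorted(entries,
                    key=lambda e: (CLIENT_RANK[e[0]], PRODUCT_RANK[e[1]]))
    return [user for _, _, user in ranked]


def product_major_order(entries):
    """Product priority decides first; client priority orders within a product."""
    ranked = sorted(entries,
                    key=lambda e: (PRODUCT_RANK[e[1]], CLIENT_RANK[e[0]]))
    return [user for _, _, user in ranked]


first_variant = client_major_order(ENTRIES)
second_variant = product_major_order(ENTRIES)
```

Because Python's `sorted` is stable, users with the same client and product keep their upload order, which yields user 1, 2, 6, 4, 5, 3 for the first variant and user 6, 4, 5, 1, 2, 3 for the second.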
In the embodiments of the present invention, if the preprocessing device is not part of the collection robot 110, the preprocessing device can send the first list to the collection robot 110, or the collection robot 110 can retrieve the first list from the preprocessing device via a file transfer protocol. If the preprocessing device is part of the collection robot 110 (for example, a preprocessing process), it can store the first list directly in the memory of the collection robot 110, so that the collection robot 110 can invoke a processing process to place collection calls to each user in the first list.
It should be noted that the embodiments of the present invention do not limit when each client sends its collection list. For example, a client can send the collection list to the preprocessing device the day before collection is performed, or on the day collection is performed. Likewise, the embodiments do not limit the device to which a client sends its collection list. For example, a client can send the collection list directly to the preprocessing device, or send it to the collection robot 110, which then forwards it to the preprocessing device.
Step 202: Use the first model to determine the user category of each user in the first list. The user categories include a first user category, which indicates that the user will answer a call placed by the collection system.
In a possible implementation, after obtaining the first list, the collection robot 110 may first determine the time difference between the current time and the time at which the collection robot 110 is due to start collection. If the time difference is greater than or equal to a first preset time difference (which is itself greater than or equal to the time needed to determine a collection strategy), the collection robot 110 may first analyze whether the collection task in the first list can be completed before the collection end time set by each client, set a corresponding collection strategy according to the result, and then, at the start time, begin collecting from each user in the first list according to that strategy. If the time difference is less than or equal to a second preset time difference (any value less than or equal to 0), the collection robot 110 may directly invoke a dialing thread to place collection calls to the users in the order of the first list and, while dialing, invoke a parallel processing thread to analyze whether the collection task in the first list can be completed before the collection end time set by each client. Once a corresponding collection strategy has been set according to the result, the dialing thread is directed to continue collecting from the users in the first list according to that strategy.
Correspondingly, if the time difference is less than the first preset time difference and greater than the second preset time difference, the collection robot 110 may first invoke a processing process to analyze whether the collection task in the first list can be completed before the collection end time set by each client, and set a corresponding collection strategy according to the result. If, during the analysis, the start time of the collection robot 110 is reached, a parallel dialing process is invoked to place collection calls to the users in the order of the first list; once the corresponding collection strategy is available, the parallel dialing process is directed to place collection calls to the users in the first list according to that strategy.
The first preset time difference may be set by those skilled in the art based on experience, or may be determined from the time taken to determine the collection strategy for each collection task in a historical period: for example, the average of those times, their median, or a weighted average in which a collection task closer in time to the current task receives a larger weight, and so on.
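The three timing cases and the choice of the first preset time difference can be sketched as follows. This is only an illustrative sketch: the function names, the returned labels, and the linear weights 1..n used for the weighted average are assumptions, since the text only states that more recent tasks receive larger weights.

```python
from statistics import mean, median


def first_preset_time_difference(durations, method="mean"):
    """Estimate the first preset time difference from past strategy-determination times.

    `durations` is ordered from oldest to newest task. The "weighted" method
    uses linear weights 1..n (an assumption) so that newer tasks count more.
    """
    if not durations:
        raise ValueError("at least one historical duration is required")
    if method == "mean":
        return mean(durations)
    if method == "median":
        return median(durations)
    if method == "weighted":
        weights = range(1, len(durations) + 1)
        return sum(w * d for w, d in zip(weights, durations)) / sum(weights)
    raise ValueError(f"unknown method: {method}")


def choose_schedule(time_diff, first_preset, second_preset=0.0):
    """Map the time left before the start time to one of the three cases in the text."""
    if time_diff >= first_preset:
        # Enough time: finish the analysis, then dial with the strategy.
        return "analyze_then_dial"
    if time_diff <= second_preset:
        # Start time already reached: dial now, analyze in parallel.
        return "dial_and_analyze_in_parallel"
    # In between: start analyzing, begin dialing when the start time arrives.
    return "analyze_now_dial_at_start_time"
```

For example, with historical times of 60, 90 and 120 seconds, the weighted variant yields 100 seconds, slightly above the plain mean of 90, because the most recent task dominates.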
In terms of hardware implementation, the collection robot 110 may contain two environments: an online production environment and a simulation environment. After obtaining the first list, the collection robot 110 may push it to both environments at the same time. The online production environment executes the normal dialing flow. For example, when the start time of the collection robot 110 (such as 8:00) is reached, it places collection calls to the users one by one in the order given by the first list (or by the collection strategy sent from the simulation environment), records the call information and each user's willingness to repay (for example, the collection stage at which the user ended the call), and sends each user's call result to the client of the corresponding online lending institution so that the institution can follow up on the user's subsequent repayment. The collection stages may include five stages: asking whether the other party is the user in person, explaining the overdue situation, asking when repayment can be made, confirming the repayment date, and ending the call. Correspondingly, the simulation environment analyzes the collection task corresponding to the first list, determines the corresponding collection strategy, and sends that strategy to the online production environment so that the online production environment executes the collection task accordingly. In addition, the online production environment may send the per-user collection results obtained from executing the collection task to the simulation environment, so that the simulation environment can update its internal parameters, such as the first call duration, the first model parameters, the second model parameters, and the average number of phone numbers going offline per hour in the historical period.
In the above implementation, the risk assessment runs in parallel with the actual collection dialing, so it serves as an aid to the normal collection task rather than taking up time the collection robot would otherwise spend placing collection calls, which reduces the impact of the risk assessment on normal collection tasks.
The following describes how the collection strategy is derived by analyzing the users in the first list.
In a specific implementation, after obtaining the first list, the collection robot 110 may use the first model to make a prediction for each user in the first list and thereby determine each user's user category. The user categories may include only the first user category, or both the first user category and a second user category. A user in the first user category is predicted to answer the collection call placed by the collection robot; a user in the second user category is predicted not to answer it.
Step 203: Count the number of users in the first list that belong to each user category, and determine a first duration based on those numbers. The first duration represents the time required to place calls to all users in the first list.
In a possible implementation, after the first model has made predictions for all users in the first list, the collection robot 110 may count the users predicted to belong to the first user category and to the second user category. It then determines the first duration required to place calls to all users in the first list from the number of users in the first user category and the first call duration corresponding to that category, together with the number of users in the second user category and the second call duration corresponding to that category. The first call duration represents the call time that each user who answers is expected to consume, and the second call duration represents the call time that each user who does not answer is expected to consume. Both may be set by those skilled in the art based on experience or according to business needs, and are not specifically limited.
In one example, the first call duration may be determined from the time taken by calls to users who answered during a historical period, and the second call duration may be determined from the time spent waiting for an answer after dialing. For instance, if the historical period is the last 2 weeks, the collection robot 110 may first obtain from the statistics database the recorded call durations of all users who answered its collection calls in the last 2 weeks (each user's call duration being the total time from the start of dialing to the end of the call), and then take the median of these durations as the first call duration, or their average, and so on. The second call duration is the time the collection robot 110 waits for the other party to answer, which depends on the configured number of rings. For example, if the call is hung up when the other party has not answered after 8 rings, the second call duration may be the total time taken by those 8 rings. Since the waiting time is the same for every user who does not answer, the collection robot 110 may set the second call duration to the waiting time of any user who did not answer a collection call in the historical period.
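As a small illustration of this example, the two call durations could be derived as shown below. The function names and the `seconds_per_ring` parameter are hypothetical; the text only says that the waiting time depends on the configured number of rings.

```python
from statistics import mean, median


def first_call_duration(answered_call_seconds, use_median=True):
    """Typical duration of an answered call, from dialing start to hang-up."""
    if not answered_call_seconds:
        raise ValueError("no answered calls in the historical period")
    if use_median:
        return median(answered_call_seconds)
    return mean(answered_call_seconds)


def second_call_duration(max_rings, seconds_per_ring):
    """Waiting time before hanging up an unanswered call.

    `seconds_per_ring` is an assumed parameter, not a value given in the text.
    """
    return max_rings * seconds_per_ring
```

The median is less sensitive than the mean to a few unusually long calls, which may be why the text lists it first.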
In the above example, the first call duration is derived from the durations of past collection calls that were answered, so it reflects historical dialing behavior and represents the call time of each answering user accurately; likewise, the second call duration is the time spent waiting for an answer, so it represents the call time of each non-answering user accurately. The total call time for the users in the first list who will answer can then be determined from the first call duration and the number of answering users predicted by the first model, and the total call time for the users who will not answer can be determined from the second call duration and the predicted number of non-answering users, which together give the predicted total call time for all users in the first list. Because this approach is based on historical data, it better matches actual business conditions and makes the predicted first duration more accurate.
In this embodiment of the present invention, the collection robot 110 may apply to the operator for multiple phone numbers in advance and use them together to place collection calls to the users in the first list. After obtaining the first model's predictions for all users in the first list, the collection robot 110 may first determine the total call duration for calling all users in the first list from the number of users in the first user category and the first call duration, and the number of users in the second user category and the second call duration, and then determine the first duration from the pre-applied phone numbers and the total call duration.
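The arithmetic described here can be sketched as follows. The function names are illustrative, and spreading the total evenly over the phone numbers is only the simplest way of combining the two quantities.

```python
def total_call_duration(n_answer, n_no_answer, first_call, second_call):
    """Total dialing time for the whole list if calls were placed one after another."""
    return n_answer * first_call + n_no_answer * second_call


def evenly_split_duration(total_duration, n_phone_numbers):
    """Duration when the total is spread evenly over all phone numbers."""
    if n_phone_numbers <= 0:
        raise ValueError("at least one phone number is required")
    return total_duration / n_phone_numbers
```

For instance, 100 predicted answerers at 60 seconds each plus 50 predicted non-answerers at 40 seconds each give 8000 seconds in total, or 800 seconds when shared by 10 phone numbers.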
In an optional implementation, the collection robot 110 may directly use the ratio of the total call duration to the number of phone numbers as the first duration. However, during actual dialing a phone number may go offline as dialing time accumulates, so the ratio alone may give an inaccurate first duration when some phone numbers go offline. For this reason, as one possible approach, the collection robot 110 may determine the first duration as follows:
The collection robot 110 may first determine, from the total call duration and the number of phone numbers, the predicted duration required to place calls to all users in the first list, and then analyze the probability that each phone number goes offline within the predicted duration. This probability can be determined from probability theory. Since the time interval t from when a phone number starts placing collection calls until it goes offline follows an exponential distribution with parameter λ, the probability density function f(t) of the time interval t is:
f(t) = λe^(-λt), t ≥ 0
Correspondingly, the cumulative distribution function F(t) of the time interval t is:
F(t) = 1 - e^(-λt), t ≥ 0
Here, λ may be set to the average number of times per hour that a phone number went offline during a historical period. The historical period may be set by those skilled in the art based on experience, for example the last 2 weeks, so that the value of λ is continually updated over time.
Thus, from the distribution function F(t) of the time interval t, if the predicted duration is Δt, the probability that each phone number goes offline within the predicted duration is 1-e^(-λΔt).
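The offline probability 1-e^(-λΔt) is easy to evaluate numerically. The sketch below assumes λ is expressed per hour and Δt in hours; the function name is illustrative.

```python
import math


def offline_probability(rate_per_hour, predicted_hours):
    """P(number goes offline within the predicted duration) = 1 - exp(-lambda * dt)."""
    if rate_per_hour < 0 or predicted_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    return 1.0 - math.exp(-rate_per_hour * predicted_hours)
```

With λ = 0.1 per hour and a predicted duration of 2 hours, the probability is about 0.18; it grows toward 1 as the predicted duration increases.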
Further, after the offline probability of each phone number is determined, the phone numbers whose offline probability is not greater than a first preset threshold may be treated as available phone numbers, and the collection robot 110 then determines the first duration required to place calls to all users in the first list from the total call duration and the number of available phone numbers. If the first duration is less than or equal to the set duration, the collection robot 110 can complete the collection task corresponding to the first list even if some phone numbers go offline during dialing, so it may continue placing collection calls in the order of the users in the first list. Correspondingly, if the first duration is greater than the set duration, the collection robot 110 cannot complete the collection task corresponding to the first list if some phone numbers go offline during dialing. The collection robot 110 may therefore further determine whether the predicted duration is greater than the set duration. If the predicted duration is less than or equal to the set duration, the collection robot 110 can complete the collection task as long as no phone number goes offline during dialing. In this case, if the operator allows the collection robot 110 to apply for backup phone numbers, the collection robot 110 may apply to the operator for backup phone numbers, and the number of backup phone numbers may be greater than or equal to the number of phone numbers whose offline probability is greater than the first preset threshold. If the operator does not allow this, the collection robot 110 may select from the first list a subset of users with a higher collection success rate to form a second list. Correspondingly, if the predicted duration is greater than the set duration, the collection robot 110 cannot complete the collection task corresponding to the first list even if no phone number goes offline during dialing. In this case, the collection robot 110 may likewise apply for backup phone numbers or determine the second list, depending on what the operator supports.
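The decision flow in this paragraph can be condensed into a short sketch. All names and the returned action labels are hypothetical, and the per-number offline probabilities are taken as given inputs. Because the text prescribes the same pair of remedies whether or not the predicted duration fits in the set duration, the sketch returns the predicted duration for information only.

```python
def plan_collection(total_call_hours, offline_probs, threshold, set_hours, backup_supported):
    """Return (action, first_duration, predicted_duration) for the first list.

    offline_probs: one offline probability per pre-applied phone number.
    threshold: the first preset threshold; numbers at or below it count as available.
    """
    if not offline_probs:
        raise ValueError("at least one phone number is required")
    # Predicted duration: every pre-applied number stays online.
    predicted = total_call_hours / len(offline_probs)
    # First duration: only numbers unlikely to go offline are counted.
    available = sum(1 for p in offline_probs if p <= threshold)
    first = total_call_hours / available if available else float("inf")
    if first <= set_hours:
        action = "continue_first_list"
    elif backup_supported:
        action = "apply_for_backup_numbers"
    else:
        action = "build_second_list"
    return action, first, predicted
```

When backup numbers are requested, the text suggests asking for at least as many as there are numbers above the threshold, which here would be `len(offline_probs) - available`.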
It should be noted that the above is only an illustrative description and does not limit the solution. In a specific implementation, when the operator allows the collection robot to apply for backup phone numbers, the collection robot may apply for backup phone numbers and, at the same time, select from the first list a subset of users with a higher collection success rate to form the second list. Alternatively, even when the operator allows it, the collection robot may choose not to apply for backup phone numbers and instead only form the second list from the users with a higher collection success rate. There are many possible implementations, which may be set by those skilled in the art as needed and are not specifically limited.
In the above approach, estimating the probability that each phone number goes offline makes it possible to determine in advance how many phone numbers may go offline while the collection task is being executed. Re-determining the first duration using only the phone numbers that are not expected to go offline anticipates the risk of phone numbers going offline and makes the estimate of whether the collection task can be completed more reliable.
In terms of hardware implementation, the simulation environment may determine the first duration using a cellular model. A one-dimensional cellular model may be set up in the simulation environment to store all users in the first list. FIG. 3 is a schematic structural diagram of a one-dimensional cellular model provided by an embodiment of the present invention. As shown in FIG. 3, each cell identifies one user, and each cell has a left neighbor cell and/or a right neighbor cell; for example, cell A is the left neighbor of cell B, and cell C is the right neighbor of cell B. Each cell can be in one of three states, which may be indicated by color: for example, white indicates not yet dialed, gray indicates dialed but not answered, and black indicates dialed and answered. When a cell changes from white to gray or black, the collection robot 110 has placed a collection call to the corresponding user. A cell therefore dwells for the first call duration when moving from white to black, and for the second call duration when moving from white to gray. Accordingly, each time the first model predicts the user category of the user corresponding to a cell, if the user belongs to the first user category, the cell's color is updated from white to black after waiting for the first call duration (to save time, this may be scaled down proportionally to a value smaller than the first call duration). If the user belongs to the second user category, the cell's color is updated from white to gray after waiting for the second call duration (which, to save time, may likewise be scaled down to a smaller value using the same ratio as for the first call duration). This process may run in parallel according to the number of available phone numbers. Once the color of every cell in the one-dimensional cellular model has changed, the elapsed time is measured and the first duration is determined from it according to the ratio.
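A rough sketch of this cellular simulation is given below. Instead of actually waiting, it advances a simulated clock, which is an assumption made to keep the example fast and deterministic; the names, the greedy assignment of the next cell to the earliest free phone number, and the `scale` parameter are all illustrative.

```python
import heapq

WHITE, GRAY, BLACK = "white", "gray", "black"


def simulate_first_duration(will_answer, first_call, second_call, n_numbers, scale=1.0):
    """Walk the 1-D cell array in order and return (first_duration, final_cell_colors).

    will_answer: list of booleans, the first model's prediction for each cell.
    scale: shrink factor applied to both dwell times (same ratio for both).
    """
    if n_numbers <= 0 or scale <= 0:
        raise ValueError("n_numbers and scale must be positive")
    cells = [WHITE] * len(will_answer)
    free_at = [0.0] * n_numbers          # simulated time at which each number is free
    heapq.heapify(free_at)
    finished = 0.0
    for i, answers in enumerate(will_answer):
        start = heapq.heappop(free_at)   # next cell goes to the earliest free number
        dwell = (first_call if answers else second_call) * scale
        end = start + dwell
        cells[i] = BLACK if answers else GRAY
        heapq.heappush(free_at, end)
        finished = max(finished, end)
    # Undo the scaling to express the result in real call time.
    return finished / scale, cells
```

For four users predicted as answer, no answer, answer, no answer, with call durations of 60 and 40 seconds and two phone numbers, the simulated first duration is 100 seconds, regardless of the scale used.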
Step 204: If the first duration exceeds the set duration, use a second model to determine the probability that each user belonging to the first user category performs the preset behavior, and determine a second list according to the probabilities. The second list indicates the users to be called after the current time.
In this embodiment of the present invention, if the first duration exceeds the set duration, the collection robot 110 cannot complete the collection task for the first list within the set duration. The collection robot 110 may then use the second model to determine the repayment probability of at least each user predicted to answer the call, and sort the users by repayment probability to obtain the second list, so that the collection robot 110 places collection calls according to the second list.
In one example, the collection robot 110 may use the second model only for the users predicted to answer the call, determine each such user's repayment probability, and sort these users by repayment probability in descending (or ascending) order to obtain the second list. When it is determined that not all users can be called, the collection robot 110 then calls only users who will answer and have a high repayment probability, and need not call users who will not answer or who will answer but have a low repayment probability. This improves the effect of the collection calls, reduces the amount of data the collection robot 110 has to process, and improves collection efficiency.
In another example, the collection robot 110 may use the second model to determine the repayment probability of every user in the first list, and sort the users of the first list by repayment probability in descending (or ascending) order to obtain the second list. When it is determined that not all users can be called, the collection robot 110 then calls the users in the first list in descending order of repayment probability, so that it reaches as many users as possible and avoids missing users who were predicted not to answer but actually would, which improves the accuracy of the collection calls.
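Both ways of building the second list reduce to a filter followed by a sort. In the sketch below, the function name and the dictionary-based inputs are assumptions; the repayment probabilities stand in for the output of the second model.

```python
def build_second_list(users, repay_prob, will_answer=None):
    """Rank users by repayment probability, highest first.

    users: user identifiers in first-list order.
    repay_prob: mapping user -> repayment probability from the second model.
    will_answer: optional mapping user -> bool from the first model; when given,
        only users predicted to answer are kept (the first example), otherwise
        every user in the first list is ranked (the second example).
    """
    if will_answer is None:
        candidates = list(users)
    else:
        candidates = [u for u in users if will_answer[u]]
    # sorted() is stable, so ties keep their first-list order.
    return sorted(candidates, key=lambda u: repay_prob[u], reverse=True)
```

Filtering first shortens the list and the scoring work; ranking everyone keeps users who were wrongly predicted not to answer within reach.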
Correspondingly, if the first duration does not exceed the set duration, the collection robot 110 can complete the collection task for the first list within the set duration. The collection robot 110 may then continue placing collection calls to the users in the order of the first list.
In one possible risk scenario, although the collection robot 110 can complete the collection task for the first list within the set duration, it may receive a request to process the collection task of a third list while placing collection calls during the first duration. In this case, the collection robot 110 may use the first model to determine a second duration required to place calls to all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the collection robot 110 cannot complete the collection tasks for all users in the first list and the third list within the set duration, and may therefore refuse to accept the third list.
In the above example, when a request to process the collection task of the third list is received, the total call time for placing collection calls to all users in the first list and the third list is estimated in advance, and the third list is refused when that total exceeds the set duration. This avoids accepting collection tasks that cannot be completed and reduces the customer's losses.
In another possible risk scenario, although the collection robot 110 can complete the collection task for the first list within the set duration, some phone numbers may suddenly go offline during the first duration. In this case, the collection robot 110 may determine a new first duration from the total call duration and the number of phone numbers that are still online. Alternatively, if the collection robot 110 has already placed collection calls to some users in the first list, it may use the first model to determine the total call duration for the remaining users who have not yet been called, and then determine the new first duration from that total and the number of phone numbers still online. Further, if the new first duration is greater than the set duration, the collection robot 110 cannot complete the collection task for all users in the first list within the set duration using only the phone numbers still online. The collection robot 110 may therefore send first indication information to the operation and maintenance personnel so that they can decide whether to apply to the operator for backup phone numbers.
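The two risk checks described above amount to simple comparisons, sketched below. The function names are hypothetical, and the time budget passed in is assumed to be whatever portion of the set duration is still relevant when the check is made.

```python
def accept_third_list(first_duration, second_duration, set_duration):
    """Accept a newly submitted list only if both lists fit in the set duration."""
    return first_duration + second_duration <= set_duration


def new_first_duration(n_answer_left, n_no_answer_left, first_call, second_call, n_online):
    """Recompute the first duration for users not yet called, using numbers still online."""
    if n_online <= 0:
        raise ValueError("no phone number is online")
    remaining = n_answer_left * first_call + n_no_answer_left * second_call
    return remaining / n_online


def should_notify_operations(new_duration, time_budget):
    """True when the remaining work no longer fits, so staff should consider backup numbers."""
    return new_duration > time_budget
```

For example, 50 remaining answerers at 60 seconds and 20 remaining non-answerers at 40 seconds need 3800 seconds in total, or 475 seconds on 8 online numbers; with a budget of 400 seconds the staff would be notified.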
In this embodiment of the present invention, after the first list is received, the first model is first used to predict which users in the first list will and will not answer the call, and the time needed to complete the collection task is determined. When it is determined that the collection task cannot be completed, the second model is used to identify the users with a higher probability of successful collection, so that those users are called first and the collection effect is improved.
The above describes how the first model and the second model are used to determine the collection strategy. The following describes how the first model and the second model are each obtained by training.
First model
Since the first model is used to predict whether each user will answer the call, and thereby determine the user category of each user, the first model can be set as a classification model.
In a specific implementation, the collection robot 110 may first obtain the feature values of multiple users under each feature. Then, for any feature, it determines the degree of association between that feature and the behavior of whether a user answers the call, according to the number of users who answered the call, the number of users who did not answer, the number of users corresponding to each feature value of the feature, and, among the users corresponding to each feature value, the number who answered and the number who did not. Further, the collection robot 110 may take each feature whose degree of association with the call-answering behavior is greater than or equal to a second preset threshold as a strongly correlated feature, and then train the first model according to the number of users who answered the call, the number of users who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among the users corresponding to each of those feature values, the number who answered and the number who did not.
For ease of understanding, a specific example of the training process of the first model is described below. In this example, the first model is trained with the naive Bayes algorithm. Because naive Bayes can update its model parameters from incremental data in real time, training the first model with naive Bayes improves the efficiency of both training and updating.
In a specific implementation, data may first be obtained for multiple (for example, 20,000) users to whom the collection robot 110 placed collection calls during a historical period. Each user's data may include the user's values under various features, such as gender, age, education, occupation, marital status, city of residence, amount of the current loan, amount currently owed, number of days the current loan is overdue, number of historical loans, and number of historical overdue loans. It also includes the category feature value indicating whether the user answered the collection call. Because these features include both continuous and discrete features, they cannot be brought into a uniform representation by a single standard. Therefore, in one example, for any of the features: if the feature is discrete, the collection robot 110 may collect the values that the multiple users take under that feature and use each value as a feature value of the feature. If the feature is continuous, the collection robot 110 may determine the range of values that the multiple users take under that feature, divide the range into multiple intervals, and assign one feature value to each interval, thereby obtaining the feature values of the feature. By discretizing the values of continuous features in this way, all features (continuous and discrete) share the same discrete representation, so the discrete feature values can be used directly as training data without fitting a probability distribution function to the continuous features, which improves data-processing efficiency.
For example, gender, education, occupation, marital status, and city of residence each take one of a fixed set of values, so these are discrete features, and the values users take under them are their feature values. In contrast, age, amount of the current loan, amount currently owed, number of days the current loan is overdue, number of historical loans, and number of historical overdue loans can take an unlimited number of values, so these are continuous features, and their continuous values can be converted to discrete values as follows. The age feature is discretized into feature values 1 to 7, which in order represent an age (in years) in the intervals [0, 15), [15, 25), [25, 35), [35, 45), [45, 55), [55, 65), [65, ∞). The current loan amount feature is discretized into feature values 1 to 5, which in order represent a loan amount (in units of 10,000 yuan) in the intervals [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, ∞). The amount currently owed feature is discretized into feature values 1 to 5, which in order represent an amount owed (in units of 10,000 yuan) in the intervals [0, 0.5), [0.5, 1.5), [1.5, 3.5), [3.5, 5), [5, ∞). The overdue days feature of the current loan is discretized into feature values 1 to 5, which in order represent a number of overdue days in the intervals [0, 1), [1, 3), [3, 5), [5, 7), [7, ∞). The historical loan count feature is discretized into feature values 1 to 5, which in order represent a number of historical loans in the intervals [0, 1), [1, 2), [2, 3), [3, 5), [5, ∞). The historical overdue count feature is discretized into feature values 1 to 5, which in order represent a number of historical overdue loans in the intervals [0, 1), [1, 2), [2, 3), [3, 5), [5, ∞).
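The interval mapping above can be sketched in Python as follows. The interval boundaries come from the text; the constant names and the `discretize` helper are hypothetical, and the use of `bisect_right` is just one convenient way to implement left-closed, right-open intervals.

```python
from bisect import bisect_right

# Interior boundaries of the left-closed, right-open intervals given in the text.
AGE_EDGES = [15, 25, 35, 45, 55, 65]      # years, 7 intervals
AMOUNT_EDGES = [0.5, 1.5, 3.5, 5]         # units of 10,000 yuan, 5 intervals
OVERDUE_DAYS_EDGES = [1, 3, 5, 7]         # days, 5 intervals
HISTORY_COUNT_EDGES = [1, 2, 3, 5]        # counts, 5 intervals


def discretize(value, edges):
    """Map a non-negative continuous value to a 1-based feature value.

    A value equal to a boundary falls into the interval that starts at
    that boundary, matching intervals of the form [a, b).
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return bisect_right(edges, value) + 1
```

Under this sketch, an age of 15 maps to feature value 2 because the second interval is [15, 25), and a loan amount of 5 (that is, 50,000 yuan) maps to feature value 5.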
Further, for any of the features, the degree of association between that feature and the category feature can be calculated. The degree of association can be expressed as mutual information. Mutual information measures how much information one random variable contains about another; the larger the mutual information, the stronger the coupling between the two random variables and the greater their degree of association. The mutual information between each feature and the category feature of whether the user answers the call may satisfy the following condition:
$$I(X,Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$
Here, $X$ is any feature, $R(X)$ is the set of feature values of feature $X$, and $x$ is any feature value of feature $X$. $Y$ is the behavior of whether the user answers the call, $R(Y)$ is the set of such behaviors (answering the call and not answering the call), and $y$ is either the behavior of answering or the behavior of not answering. $I(X,Y)$ is the degree of association between feature $X$ and the call-answering behavior. $P(x,y)$ is the proportion of all users who both have feature value $x$ and performed behavior $y$, $P(x)$ is the proportion of all users who have feature value $x$, and $P(y)$ is the proportion of all users who performed behavior $y$.
Taking the age feature as an example, the random variable $X$ represents the age feature and the random variable $Y$ represents the category feature of whether the call was answered. $R(X)$ is the range of $X$; since the feature values of the age feature are feature values 1 to 7, $R(X)=\{1,2,3,4,5,6,7\}$. $R(Y)$ is the range of $Y$; since the feature values of the category feature are yes or no, $R(Y)=\{\text{yes},\text{no}\}$. For any feature value $x$ in $R(X)$, $P(x)$ is the proportion of the 20,000 users whose age feature value is $x$. For any feature value $y$ in $R(Y)$, $P(y)$ is the proportion of the 20,000 users whose category feature value is $y$. $P(x,y)$ is the proportion of the 20,000 users whose age feature value is $x$ and whose category feature value is $y$.
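A small Python sketch of this calculation is given below. It estimates $P(x)$, $P(y)$ and $P(x,y)$ as user proportions, as described above. The base-2 logarithm is an assumption (the text does not state the base), and the function name is hypothetical. Pairs $(x, y)$ that never occur are skipped, since they contribute zero to the sum.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in bits) between a discrete feature and a label.

    feature_values[i] and labels[i] belong to the same user.
    """
    n = len(feature_values)
    joint_counts = Counter(zip(feature_values, labels))
    x_counts = Counter(feature_values)
    y_counts = Counter(labels)

    mi = 0.0
    for (x, y), count in joint_counts.items():
        p_xy = count / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi
```

With a binary label and base-2 logarithms the result lies between 0 (the feature carries no information about answering) and 1 bit (the feature fully determines it).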
In the above implementation, the degree of association between each feature and the call-answering behavior is obtained from the probabilities that relate each feature value of that feature to the behavior, so the degree of association aggregates information from all of the feature values. Because richer information is used, the degree of association is more accurate.
Once the mutual information between each feature and the category feature of whether the user answers the call has been determined, the features whose mutual information is greater than a third preset threshold can be taken as strongly correlated features. The third preset threshold can be set by a person skilled in the art based on experience, for example 0.5 or 0.8, and is not specifically limited.
For ease of understanding, assume that the strongly correlated features are $X_1, X_2, X_3, \ldots, X_n$.
Further, in this embodiment of the present invention, the first model may be trained with naive Bayes using the feature values of the 20,000 users under the strongly correlated features. Specifically, for sample data obtained by combining one feature value of each strongly correlated feature (for example the combination $x_1, x_2, x_3, \ldots, x_n$, where $x_i$ is a feature value of the strongly correlated feature $X_i$), the value of the category $\hat{y}$ indicating whether that sample will answer the call may be:

$$\hat{y} = \arg\max_{y \in R(Y)} P(y \mid x_1, x_2, \ldots, x_n)$$

Here, $P(y \mid x_1, \ldots, x_n)$ is the posterior probability, and $x_1, x_2, \ldots, x_n$ is the sample data obtained from the feature-value combination.

Based on the probability formula (Bayes' theorem, together with the naive Bayes assumption that the features are conditionally independent given the category), the posterior probability can be expressed as:

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}$$

When the denominator, which does not depend on $y$, is not considered, the above formula can be simplified to:

$$\hat{y} = \arg\max_{y \in R(Y)} P(y) \prod_{i=1}^{n} P(x_i \mid y)$$
Because the number of samples corresponding to some feature values may be 0, and in order to avoid a zero denominator during the calculation, $P(y)$ and $P(x_i \mid y)$ in the above formula can be rewritten using Laplace smoothing as:

$$P(y) = \frac{N_y + 1}{N + 2}$$

$$P(x_i \mid y) = \frac{N_{y,x_i} + 1}{N_y + L_{x_i}}$$

Here, $N$ is the number of users (20,000), $N_y$ is the number of users whose category feature value is $y$, $N_{y,x_i}$ is the number of users whose category feature value is $y$ and whose value of feature $X_i$ is $x_i$, and $L_{x_i}$ is the size of the range of feature $X_i$, that is, the number of values that $x_i$ can take.
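The two smoothed estimates can be written directly from the counts. The following Python is a minimal sketch; the function and parameter names are hypothetical and simply mirror $N$, $N_y$, $N_{y,x_i}$ and $L_{x_i}$.

```python
def smoothed_prior(n_y, n):
    """P(y) = (N_y + 1) / (N + 2), with two classes (answered / not answered)."""
    return (n_y + 1) / (n + 2)


def smoothed_likelihood(n_y_xi, n_y, l_xi):
    """P(x_i | y) = (N_{y,x_i} + 1) / (N_y + L_{x_i})."""
    return (n_y_xi + 1) / (n_y + l_xi)
```

One property worth noting is that a feature value never observed together with class $y$ still receives a small positive probability, and the smoothed probabilities over all $L_{x_i}$ values of a feature still sum to 1.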
In this way, the first model can be represented by the above formulas. When predicting the value of any user under the behavior feature, the following formula can be used to determine the probability that the user's feature value under the behavior feature is yes and the probability that it is no:

$$P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y), \quad y \in \{\text{yes}, \text{no}\}$$
If the probability that the user's feature value under the behavior feature is yes is greater than the probability that it is no, the user is determined to be a user who will answer the call, and the user's user category is the first user category. If the probability that the user's feature value under the behavior feature is no is greater than the probability that it is yes, the user is determined to be a user who will not answer the call, and the user's user category is the second user category.
In one example, when the first model is updated with new data, the continuous features (age, amount of the current loan, amount currently owed, number of days the current loan is overdue, number of historical loans, and number of historical overdue loans) may first be discretized according to the values of all users in the new data. The number of users in the new data who answered the call and the number who did not are then counted and used to update $N_y$ and $N_{y,x_i}$ in the formulas of the first model, and $P(y)$ and $P(x_i \mid y)$ are updated from the updated $N_y$ and $N_{y,x_i}$, which completes the update of the first model.
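Because the model is fully described by its counts, both prediction and incremental updating can be sketched with a few dictionaries. The Python below is an illustrative sketch, not the patent's implementation: the class name, the dictionary layout, the `"yes"`/`"no"` labels, and the choice to return `"no"` when the two scores are equal are all assumptions.

```python
from collections import defaultdict


class CountingNaiveBayes:
    """Naive Bayes over discretized features, stored as raw counts."""

    def __init__(self, domain_sizes):
        # domain_sizes maps feature name -> L_xi (number of possible values).
        self.domain_sizes = dict(domain_sizes)
        self.n = 0                         # N
        self.n_y = defaultdict(int)        # N_y
        self.n_y_x = defaultdict(int)      # N_{y,x_i}, keyed by (y, feature, value)

    def update(self, rows, labels):
        """Fold new users into the counts; earlier data need not be revisited."""
        for row, y in zip(rows, labels):
            self.n += 1
            self.n_y[y] += 1
            for feature, value in row.items():
                self.n_y_x[(y, feature, value)] += 1

    def score(self, row, y):
        """Unnormalized posterior P(y) * prod_i P(x_i | y) with Laplace smoothing."""
        n_y = self.n_y.get(y, 0)
        result = (n_y + 1) / (self.n + 2)
        for feature, value in row.items():
            n_y_x = self.n_y_x.get((y, feature, value), 0)
            result *= (n_y_x + 1) / (n_y + self.domain_sizes[feature])
        return result

    def predict(self, row):
        """Return "yes" (first user category) or "no" (second user category)."""
        return "yes" if self.score(row, "yes") > self.score(row, "no") else "no"
```

Calling `update` again with a later batch of users only increments the counts, which is why the update can be done in real time. For models with many features, summing logarithms instead of multiplying probabilities would avoid numerical underflow.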
Evidently, by setting the first model as a classification model expressed by the formulas above, the first model can be updated quickly and in real time, so its update efficiency is good. Moreover, by determining the degree of association between each feature and the call-answering behavior, the first model can be trained only on the features with a high degree of association. Less data participates in training, so training is more efficient. In addition, because the training data is concentrated on feature data that is strongly correlated with the call-answering behavior, the training process of the first model is more focused and the model performs better.
Second model
In this embodiment of the present invention, the second model may be a neural network model.
In a specific implementation, the feature values of multiple users under each feature are obtained, with each feature value of each feature identified by a numerical value. For any user, a feature vector of the user under each feature is constructed from the user's feature value under that feature and the set of feature values of that feature, and the user's feature vectors under all features are concatenated to obtain the first feature vector corresponding to the user. A second feature vector corresponding to the user is obtained from the user's feature value under the category feature of whether the user repaid. Further, the first feature vectors of the multiple users may be used as model input to obtain predicted repayment vectors for the users, and the model parameters of the second model are adjusted based on the users' second feature vectors and the predicted repayment vectors, yielding the optimized second model.
For ease of understanding, a specific example of the training process of the second model is described below. In this example, the second model may include an input layer, a hidden layer, and an output layer, which are fully connected. The hidden layer may have 10 neuron nodes and the output layer may have 2 neuron nodes. The hidden layer uses the ReLU activation function, and the output layer uses the Softmax activation function, whose output represents the probability that the user repays.
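The forward pass of such a network can be sketched in plain Python as below. The layer sizes of 10 and 2 and the ReLU and Softmax activations follow the text; the input width is left as a parameter, and the random initialization range, the seed, and all function names are assumptions made for illustration. Training (adjusting the weights) is not shown.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    # Subtract the maximum for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, params):
    """Input -> hidden (10, ReLU) -> output (2, Softmax)."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(inputs, w1, b1))
    return softmax(dense(hidden, w2, b2))


def build_params(n_inputs, seed=0):
    rng = random.Random(seed)
    return [init_layer(n_inputs, 10, rng), init_layer(10, 2, rng)]
```

The two outputs always sum to 1, so they can be read as a probability pair for the two repayment outcomes.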
In a specific implementation, data may first be obtained for multiple (for example, 50,000) users to whom the collection robot 110 placed collection calls during a historical period. Each user's data may include the user's values under various features, such as gender, age, education, occupation, marital status, city of residence, amount of the current loan, amount currently owed, number of days the current loan is overdue, number of historical loans, and number of historical overdue loans, and may also include the value under the category feature of whether the user repaid after the collection call was placed. The users used to train the second model may partly overlap with the users used to train the first model, or may be entirely different, which is not specifically limited.
Further, each continuous feature may be discretized using the same discretization method as when training the first model, and one-hot encoding is then used to convert each feature value of each feature into numerical form. For example, the gender feature has 2 feature values (male, female), so one-hot encoding converts them into vectors with 1 row and 2 columns; if a user's gender is male, the user's feature vector under the gender feature is (1, 0). The marital status feature has 4 feature values (unmarried, married, widowed, divorced), so one-hot encoding converts them into vectors with 1 row and 4 columns; if a user's marital status is widowed, the user's feature vector under the marital status feature is (0, 0, 1, 0). Correspondingly, one-hot encoding converts the 11 feature values of the education feature (primary school, junior high school, senior high school, technical secondary school, vocational school, technical school, junior college, bachelor's degree, master's degree, doctoral degree, postdoctoral) into vectors with 1 row and 11 columns. It converts the 13 feature values of the occupation feature (agriculture, forestry, animal husbandry, fishery and water conservancy; industry; geological survey and exploration; construction; transportation, post and telecommunications; commerce, public catering, material supply and warehousing; real estate management, public utilities, resident services and consulting services; health, sports and social welfare; education, culture and arts, radio and television; scientific research and comprehensive technical services; finance and insurance; state organs, party and government organs and social organizations; other industries) into vectors with 1 row and 13 columns. It converts the 338 feature values of the city of residence feature (337 major cities plus "other cities") into vectors with 1 row and 338 columns, the 7 feature values of the age feature into vectors with 1 row and 7 columns, and the 5 feature values of each of the current loan amount, amount currently owed, overdue days of the current loan, historical loan count, and historical overdue count features into vectors with 1 row and 5 columns.
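A minimal Python sketch of this encoding follows. The per-feature vector widths are taken from the text; the feature key names, the 0-based index convention, and the concatenation order are assumptions made for illustration.

```python
# Width of each one-hot block, in an assumed concatenation order.
FEATURE_SIZES = {
    "gender": 2,
    "marital_status": 4,
    "education": 11,
    "occupation": 13,
    "resident_city": 338,
    "age": 7,
    "loan_amount": 5,
    "arrears_amount": 5,
    "overdue_days": 5,
    "historical_loans": 5,
    "historical_overdues": 5,
}


def one_hot(index, size):
    """Vector of length `size` with a single 1 at the 0-based `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range for this feature")
    vector = [0] * size
    vector[index] = 1
    return vector


def encode_user(user):
    """Concatenate the one-hot blocks of all features for one user.

    `user` maps each feature name to the 0-based index of its feature value.
    """
    vector = []
    for name, size in FEATURE_SIZES.items():
        vector.extend(one_hot(user[name], size))
    return vector
```

With these widths the concatenated vector has 2 + 4 + 11 + 13 + 338 + 7 + 5 × 5 = 400 entries, exactly one of which is 1 inside each feature's block.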
In this way, for any user, the user's feature vector under each feature can first be determined from the user's feature value under that feature, and the user's feature vectors under all features are then concatenated end to end to obtain the first feature vector corresponding to the user. From the above analysis, the first feature vector corresponding to a user can be a one-dimensional vector with 1 row and 400 columns. Correspondingly, the second feature vector corresponding to the user is determined from the user's feature value under the category feature of whether the user repaid, and can be a one-dimensional vector with 1 row and 2 columns. For example, if the user has repaid, the second feature vector may be [1, 0]; if the user has not repaid, it may be [0, 1]. Further, after the feature vectors corresponding to the 50,000 users (each including a first feature vector and a second feature vector) are obtained, the 50,000 feature vectors may be divided into training feature vectors, test feature vectors, and validation feature vectors. The division may follow a random ratio or a preset ratio, which is not limited. Suppose the 50,000 feature vectors are divided into 35,000 training feature vectors, 10,000 test feature vectors, and 5,000 validation feature vectors. The first feature vectors of the 35,000 training feature vectors can then be input into the neural network model so that it outputs 35,000 predicted second feature vectors, and the parameters of the neural network model are adjusted based on these 35,000 predicted second feature vectors and the second feature vectors of the 35,000 training feature vectors to obtain the second model. Correspondingly, the 10,000 test feature vectors can be used to test the model effect of the second model, and the 5,000 validation feature vectors can be used to verify whether the test effect of the second model reaches the preset effect. The 10,000 test feature vectors and the 5,000 validation feature vectors can also be used to optimize the model parameters of the second model.
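The split into training, test, and validation sets can be sketched as follows. Shuffling before slicing and the fixed seed are assumptions made for reproducibility; the text only requires that the division follow some random or preset ratio. The function name is hypothetical.

```python
import random


def split_dataset(samples, n_train, n_test, n_validation, seed=0):
    """Shuffle a copy of `samples` and cut it into three disjoint parts."""
    if n_train + n_test + n_validation != len(samples):
        raise ValueError("split sizes must add up to the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```

In practice each sample would be a pair of a user's first feature vector and second feature vector, so that inputs and targets stay aligned after shuffling.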
In this embodiment of the present invention, the user's feature vector under each feature is determined, and these per-feature vectors are concatenated to obtain the user's feature vector. The user's feature vector therefore combines the information of every feature value of every feature, so the information is more comprehensive and the representation is more concise. A model trained on input that is both information-rich and concise in form performs better and trains more efficiently.
In the above embodiments of the present invention, the first list is first obtained, the first model is used to determine the user category of each user in the first list, the number of users in the first list belonging to each user category is counted, and the first duration is determined based on those numbers. If the first duration exceeds the set duration, the second model is used to determine the probability that each user belonging to the first user category will perform the preset behavior, and the second list is determined according to that probability. The first list includes multiple users who have not performed the preset behavior. The user categories include the first user category, which indicates that a user will answer calls placed by the collection system. The first duration is the time required to place calls to all users in the first list, and the second list indicates the users who need to be called after the current moment. In this embodiment of the present invention, after the first list is received, the first model is first used to predict whether each user in the first list will answer the call, and the time needed to complete the collection task is determined. When it is determined that the collection task cannot be completed, the second model is used to identify the users with a higher probability of successful collection, so that collection calls can be placed to those users first, which improves the collection effect.
Corresponding to the above method flow, an embodiment of the present invention further provides a data processing apparatus. For the specific content of the apparatus, refer to the implementation of the above method.
FIG. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention, which includes:
an obtaining module 401, configured to obtain a first list, the first list including multiple users who have not performed a preset behavior;
a determining module 402, configured to determine the user category of each user in the first list using a first model, wherein the user categories include a first user category, and the first user category indicates that a user will answer calls placed by the collection system;
处理模块403,用于统计所述第一名单中属于各个用户类别的用户的数量,并基于所述数量确定第一时长,所述第一时长表征向所述第一名单中的全部用户拨打电话所需的时长;若所述第一时长超过设定时长,则使用第二模型确定属于所述第一用户类别的每个用户执行所述预设行为的概率,并根据所述概率确定第二名单;所述第二名单用于指示在当前时刻之后需拨打电话的用户。The processing module 403 is configured to count the number of users belonging to each user category in the first list, and determine a first duration based on the number, and the first duration represents making calls to all users in the first list The required duration; if the first duration exceeds the set duration, the second model is used to determine the probability of each user belonging to the first user category performing the preset behavior, and the second is determined according to the probability List; the second list is used to indicate users who need to make a call after the current moment.
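The cooperation of the three modules can be sketched in a few lines of Python. This is an illustrative sketch only: the function and parameter names are hypothetical, the two models are passed in as plain callables, and two behaviors are assumptions not stated in the text (the second list is formed by sorting answering users by descending probability, and the whole first list is kept when the first duration fits within the set duration).

```python
from typing import Callable, List


def plan_calls(
    first_list: List[str],
    will_answer: Callable[[str], bool],          # stand-in for the first model
    behavior_probability: Callable[[str], float],  # stand-in for the second model
    estimate_duration: Callable[[int, int], float],
    set_duration: float,
) -> List[str]:
    """Return the users to call after the current moment (the second list)."""
    # Classify each user once with the first model.
    labels = {user: will_answer(user) for user in first_list}
    answering = [u for u in first_list if labels[u]]
    not_answering = [u for u in first_list if not labels[u]]

    # First duration: time needed to call everyone in the first list.
    first_duration = estimate_duration(len(answering), len(not_answering))

    if first_duration <= set_duration:
        # Assumption: the task fits, so every user stays in the call plan.
        return list(first_list)

    # Assumption: rank answering users by descending probability.
    return sorted(answering, key=behavior_probability, reverse=True)
```

Passing the models as callables keeps the sketch independent of any particular classifier or neural network library.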
Optionally, the user categories further include a second user category, which indicates that the user will not answer calls placed by the collection system. In this case, the obtaining module 401 may further obtain a first call duration corresponding to the first user category and a second call duration corresponding to the second user category. The first call duration is determined from the call durations of calls placed, within a historical period, to users who answered; the second call duration is determined from the time spent waiting for an answer after a call is placed to a user. The determining module 402 may determine the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration, and then determine the first duration based on the total call duration and the number of available phone numbers.
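A minimal numeric sketch of this estimate follows. The text only says that the first duration is "based on" the total call duration and the number of available phone numbers; dividing the total evenly across numbers that dial in parallel is an assumption, and all names are hypothetical.

```python
def estimate_first_duration(
    n_answer: int,
    n_no_answer: int,
    first_call_duration: float,   # typical length of an answered call
    second_call_duration: float,  # typical wait before an unanswered call ends
    available_numbers: int,
) -> float:
    """Estimate the time needed to call every user in the first list."""
    if available_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    # Total call duration: answered calls plus waiting time for unanswered calls.
    total = n_answer * first_call_duration + n_no_answer * second_call_duration
    # Assumption: the available numbers share the workload evenly in parallel.
    return total / available_numbers
```

For example, 100 users expected to answer at 120 seconds each and 50 expected not to answer at 30 seconds each give a total of 13,500 seconds, or 2,700 seconds when spread over five numbers.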
Optionally, the determining module 402 determines the available phone numbers as follows: for multiple phone numbers applied for in advance from an operator, a predicted duration is obtained based on the total call duration and the number of those phone numbers; the probability that each of the phone numbers goes offline within the predicted duration is determined; and the phone numbers whose probability is not greater than a first preset threshold are taken as the available phone numbers.
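The selection step can be sketched as a simple filter. How the offline probability is estimated is not described, so the sketch takes it as a caller-supplied function; that function, the even division used for the predicted duration, and all names are assumptions for illustration.

```python
from typing import Callable, List


def select_available_numbers(
    numbers: List[str],
    total_call_duration: float,
    offline_probability: Callable[[str, float], float],  # hypothetical estimator
    first_threshold: float,
) -> List[str]:
    """Keep the numbers unlikely to go offline within the predicted duration."""
    if not numbers:
        raise ValueError("no phone numbers were provided")
    # Assumption: predicted duration is the total spread over all numbers.
    predicted_duration = total_call_duration / len(numbers)
    return [
        n for n in numbers
        if offline_probability(n, predicted_duration) <= first_threshold
    ]
```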
Optionally, the device further includes a dialing module 404. While the determining module 402 uses the first model to determine the user category of each user in the first list, the dialing module 404 may use the available phone numbers to call each user according to the contact information of each user in the first list.
Optionally, when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, the processing module 403 may further determine, based on the first model, a second duration required to call all users in the third list. If the sum of the first duration and the second duration exceeds the set duration, the processing module 403 may refuse to accept the third list.
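The admission check for a third list reduces to one comparison, sketched below with hypothetical names. A sum exactly equal to the set duration is accepted, since the text rejects only when the sum exceeds it.

```python
def accept_third_list(
    first_duration: float, second_duration: float, set_duration: float
) -> bool:
    """Accept the third list only if both lists fit within the set duration."""
    return first_duration + second_duration <= set_duration
```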
Optionally, the first model is a classification model. In this case, the processing module 403 may further obtain the feature values of multiple users under each feature. For any feature, it determines the degree of association between the feature and the behavior of whether a user answers the call according to: the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the feature, the number of users corresponding to each feature value who answered the call, and the number of users corresponding to each feature value who did not answer. Features whose degree of association is greater than or equal to a second preset threshold are taken as strongly correlated features. The first model is then trained according to the number of users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among those users, the number who answered and the number who did not answer.
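The counts listed above are the ones needed to estimate an empirical mutual information between a feature and the answer behavior. Assuming that measure is the intended degree of association (an assumption, since the formula is given only as a figure), feature selection can be sketched as follows; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def association(values: Sequence[Hashable], answered: Sequence[bool]) -> float:
    """Empirical mutual information (natural log) between a feature and answering."""
    n = len(values)
    if n == 0 or n != len(answered):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(values, answered))  # users per (feature value, behavior)
    count_x = Counter(values)               # users per feature value
    count_y = Counter(answered)             # users who answered / did not answer
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


def strongly_correlated(
    features: Dict[str, Sequence[Hashable]],
    answered: Sequence[bool],
    second_threshold: float,
) -> List[str]:
    """Names of features whose association meets the second preset threshold."""
    return [
        name for name, values in features.items()
        if association(values, answered) >= second_threshold
    ]
```

Only pairs that actually occur in the data contribute to the sum, which avoids taking the logarithm of zero.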
Optionally, the degree of association between each feature and the behavior of whether a user answers the call satisfies the following condition:
Figure PCTCN2020129121-appb-000009
where X is any feature; R(X) is the set of feature values of feature X; x is any feature value of feature X; Y is the behavior of whether a user answers the call; R(Y) is the set of such behaviors, consisting of the behavior of answering the call and the behavior of not answering the call; y is either the behavior of answering or the behavior of not answering; I(X, Y) is the degree of association between feature X and the behavior of whether a user answers the call; P(x, y) is the proportion of all users who both have feature value x and performed behavior y; P(x) is the proportion of all users who have feature value x; and P(y) is the proportion of all users who performed behavior y.
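The formula itself survives only as a figure reference. The symbols defined here match the standard definition of mutual information, so, as an assumption rather than a transcription of the figure, the condition would read:

```latex
I(X,Y) = \sum_{x \in R(X)} \sum_{y \in R(Y)} P(x,y) \,\log \frac{P(x,y)}{P(x)\,P(y)}
```

Under this reading, I(X, Y) is zero when the feature and the answer behavior are independent and grows as the feature becomes more informative about whether a user answers.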
Optionally, the second model is a neural network model. In this case, the processing module 403 may further obtain the feature values of multiple users under each feature. For any user, it constructs the user's feature vector under each feature according to the user's feature value under that feature and the set of feature values of that feature, and concatenates the user's feature vectors under all features to obtain a first feature vector corresponding to the user; it obtains a second feature vector corresponding to the user according to whether the user performed the preset behavior. The first feature vectors of the multiple users are used as model input to obtain prediction results of whether the multiple users perform the preset behavior, and the model parameters are adjusted based on the second feature vectors and the prediction results of the multiple users to obtain the second model.
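Building a per-feature vector from a user's value and the feature's full set of values reads like one-hot encoding. The sketch below assumes that interpretation, and also assumes a two-element encoding for the second feature vector; neither encoding is spelled out in the text, and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def one_hot(value: Hashable, feature_values: Sequence[Hashable]) -> List[int]:
    """Vector with a 1 at the position of `value` among the feature's values."""
    if value not in feature_values:
        raise ValueError(f"unknown feature value: {value!r}")
    return [1 if value == v else 0 for v in feature_values]


def first_feature_vector(
    user: Dict[str, Hashable], vocabulary: Dict[str, Sequence[Hashable]]
) -> List[int]:
    """Concatenate the user's one-hot vectors over all features, in dict order."""
    vector: List[int] = []
    for name, values in vocabulary.items():
        vector.extend(one_hot(user[name], values))
    return vector


def second_feature_vector(performed: bool) -> List[int]:
    """Assumed label encoding: [1, 0] if the behavior was performed, else [0, 1]."""
    return [1, 0] if performed else [0, 1]
```

The first feature vectors would be the network's input and the second feature vectors its training targets; the training loop is omitted because the network architecture is not specified.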
Optionally, the processing module 403 is further configured to obtain the feature values of each feature as follows: if the feature is a discrete feature, the values taken by the multiple users under the feature are collected and used as the feature values of the feature; if the feature is a continuous feature, the value range of the multiple users under the feature is determined and divided into multiple intervals, and a corresponding feature value is set for each interval, which yields the feature values of the feature.
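Both cases can be sketched briefly. The text does not say how the value range is divided or which value labels each interval, so equal-width intervals and the interval index as the feature value are assumptions, and the function names are hypothetical.

```python
import bisect
from typing import Hashable, List, Sequence


def discrete_feature_values(observed: Sequence[Hashable]) -> List[Hashable]:
    """Feature values of a discrete feature: the distinct observed values."""
    return sorted(set(observed))


def interval_edges(observed: Sequence[float], n_intervals: int) -> List[float]:
    """Split the observed value range into equal-width intervals (assumption)."""
    lo, hi = min(observed), max(observed)
    if n_intervals < 1 or hi == lo:
        raise ValueError("need at least one interval and a non-empty value range")
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(n_intervals + 1)]


def continuous_feature_value(value: float, edges: Sequence[float]) -> int:
    """Feature value of a continuous observation: the index of its interval."""
    index = bisect.bisect_right(edges, value) - 1
    # Clamp so the maximum and out-of-range values land in the edge intervals.
    return max(0, min(index, len(edges) - 2))
```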
As can be seen from the above: in the above embodiments of the present invention, a first list is obtained, a first model is used to determine the user category of each user in the first list, the number of users in the first list belonging to each user category is counted, and a first duration is determined based on the numbers. If the first duration exceeds a set duration, a second model is used to determine the probability that each user belonging to the first user category will perform the preset behavior, and a second list is determined according to that probability. The first list includes multiple users who have not performed the preset behavior; the user categories include a first user category, which indicates that the user will answer calls placed by the collection system; the first duration represents the time required to call all users in the first list; and the second list indicates the users to be called after the current moment.
In the embodiments of the present invention, after the first list is received, the first model first predicts whether each user in the first list will answer the call, and the time needed to complete the collection task is determined. When it is determined that the collection task cannot be completed, the second model identifies the users with a higher probability of successful collection, so that collection calls are placed to those users first, which improves the collection effect.
Based on the same inventive concept, an embodiment of the present invention further provides a computing device. As shown in FIG. 5, it includes at least one processor 501 and a memory 502 connected to the at least one processor. The embodiments of the present invention do not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5, the processor 501 and the memory 502 are connected by a bus as an example. The bus may be divided into an address bus, a data bus, a control bus, and so on.
In the embodiments of the present invention, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 can perform the steps of the foregoing data processing method by executing the instructions stored in the memory 502.
The processor 501 is the control center of the computing device. It can connect the various parts of the computing device through various interfaces and lines, and performs data processing by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502. Optionally, the processor 501 may include one or more processing units, and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles issuing instructions. It can be understood that the modem processor may also not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the data processing embodiments may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 502 may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 502 in the embodiments of the present invention may also be a circuit or any other device capable of implementing a storage function, and is used to store program instructions and/or data.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program executable by a computing device; when the program runs on the computing device, the computing device is caused to perform any of the data processing methods described with reference to FIG. 2.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are performed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (20)

  1. A data processing method, characterized in that the data processing method is applied to a collection system, and the method includes:
    obtaining a first list, where the first list includes multiple users who have not performed a preset behavior;
    using a first model to determine the user category of each user in the first list, where the user categories include a first user category, and the first user category indicates that the user will answer calls placed by the collection system;
    counting the number of users in the first list belonging to each user category, and determining a first duration based on the numbers, where the first duration represents the time required to call all users in the first list;
    if the first duration exceeds a set duration, using a second model to determine the probability that each user belonging to the first user category will perform the preset behavior, and determining a second list according to that probability, where the second list indicates the users to be called after the current moment.
  2. The method according to claim 1, characterized in that the user categories further include a second user category, and the second user category indicates that the user will not answer calls placed by the collection system;
    the determining a first duration based on the numbers includes:
    obtaining a first call duration corresponding to the first user category and a second call duration corresponding to the second user category, where the first call duration is determined from the call durations of calls placed, within a historical period, to users who answered, and the second call duration is determined from the time spent waiting for an answer after a call is placed to a user;
    determining the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration;
    determining the first duration based on the total call duration and the number of available phone numbers.
  3. The method according to claim 2, characterized in that the available phone numbers are determined as follows:
    for multiple phone numbers applied for in advance from an operator, obtaining a predicted duration based on the total call duration and the number of those phone numbers, determining the probability that each of the phone numbers goes offline within the predicted duration, and taking the phone numbers whose probability is not greater than a first preset threshold as the available phone numbers.
  4. The method according to claim 3, characterized in that, while using the first model to determine the user category of each user in the first list, the method further includes:
    calling each user with the available phone numbers according to the contact information of each user in the first list.
  5. The method according to any one of claims 1 to 4, characterized in that the method further includes:
    when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, determining, based on the first model, a second duration required to call all users in the third list;
    if the sum of the first duration and the second duration exceeds the set duration, refusing to accept the third list.
  6. The method according to claim 1, characterized in that the first model is a classification model, and the first model is obtained as follows:
    obtaining the feature values of multiple users under each feature; for any feature, determining the degree of association between the feature and the behavior of whether a user answers the call according to the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the feature, the number of users corresponding to each feature value who answered the call, and the number of users corresponding to each feature value who did not answer;
    taking the features whose degree of association with the behavior of whether a user answers the call is greater than or equal to a second preset threshold as strongly correlated features, and training the first model according to the number of users among the multiple users who answered the call, the number who did not answer, the number of users corresponding to each feature value of the strongly correlated features, and, among those users, the number who answered the call and the number who did not answer.
  7. The method according to claim 6, characterized in that the degree of association between each feature and the behavior of whether a user answers the call satisfies the following condition:
    Figure PCTCN2020129121-appb-100001
    where X is any feature; R(X) is the set of feature values of feature X; x is any feature value of feature X; Y is the behavior of whether a user answers the call; R(Y) is the set of such behaviors, consisting of the behavior of answering the call and the behavior of not answering the call; y is either the behavior of answering or the behavior of not answering; I(X, Y) is the degree of association between feature X and the behavior of whether a user answers the call; P(x, y) is the proportion of all users who both have feature value x and performed behavior y; P(x) is the proportion of all users who have feature value x; and P(y) is the proportion of all users who performed behavior y.
  8. The method according to claim 1, characterized in that the second model is a neural network model, and the second model is obtained as follows:
    obtaining the feature values of multiple users under each feature;
    for any user, constructing the user's feature vector under each feature according to the user's feature value under that feature and the set of feature values of that feature, and concatenating the user's feature vectors under all features to obtain a first feature vector corresponding to the user; obtaining a second feature vector corresponding to the user according to whether the user performed the preset behavior;
    using the first feature vectors of the multiple users as model input to obtain prediction results of whether the multiple users perform the preset behavior, and adjusting the model parameters based on the second feature vectors and the prediction results of the multiple users to obtain the second model.
  9. The method according to any one of claims 6 to 8, characterized in that the feature values of each feature are obtained as follows:
    if the feature is a discrete feature, collecting the values taken by the multiple users under the feature and using them as the feature values of the feature; if the feature is a continuous feature, determining the value range of the multiple users under the feature, dividing the value range into multiple intervals, and setting a corresponding feature value for each interval to obtain the feature values of the feature.
  10. A data processing device, characterized in that the device includes:
    an obtaining module, configured to obtain a first list, where the first list includes multiple users who have not performed a preset behavior;
    a determining module, configured to use a first model to determine the user category of each user in the first list, where the user categories include a first user category, and the first user category indicates that the user will answer calls placed by the collection system;
    a processing module, configured to count the number of users in the first list belonging to each user category and determine a first duration based on the numbers, where the first duration represents the time required to call all users in the first list; and, if the first duration exceeds a set duration, to use a second model to determine the probability that each user belonging to the first user category will perform the preset behavior and determine a second list according to that probability, where the second list indicates the users to be called after the current moment.
  11. The apparatus according to claim 10, characterized in that the user categories further include a second user category, the second user category indicating that the user will not answer a call placed by the collection system;
    the acquisition module is further configured to acquire a first call duration corresponding to the first user category and a second call duration corresponding to the second user category, wherein the first call duration is determined from the durations of calls made, within a historical period, to users who answered, and the second call duration is determined from the time spent waiting for an answer after dialing a user;
    the determination module is specifically configured to determine the total call duration for calling all users in the first list according to the number of users in the first list belonging to the first user category and the first call duration, and the number of users in the first list belonging to the second user category and the second call duration; and to determine the first duration based on the total call duration and the number of available phone numbers.
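The duration estimate in claim 11 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the claim only says the first duration is determined "based on" the total call duration and the number of available phone numbers, so dividing the total evenly across numbers dialing in parallel is an assumption, and every function and parameter name is hypothetical.

```python
def estimate_first_duration(n_answer, n_no_answer,
                            answered_call_secs, ring_wait_secs,
                            n_available_numbers):
    """Estimate the time (seconds) needed to call every user on the first list.

    n_answer / n_no_answer: users the first model puts in the first / second category.
    answered_call_secs: first call duration (typical length of an answered call).
    ring_wait_secs: second call duration (time spent waiting on an unanswered call).
    """
    if n_available_numbers <= 0:
        raise ValueError("at least one available phone number is required")
    # Total call duration: each category contributes count * per-call duration.
    total_call_secs = n_answer * answered_call_secs + n_no_answer * ring_wait_secs
    # Assumption: the available numbers dial in parallel and share the load evenly.
    return total_call_secs / n_available_numbers


# 600 predicted answers at 90 s, 400 predicted no-answers at 30 s, 10 numbers.
first_duration = estimate_first_duration(600, 400, 90.0, 30.0, 10)
```

Comparing `first_duration` with the set duration then decides whether the second model is needed to narrow the list.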
  12. The apparatus according to claim 11, characterized in that the determination module determines the available phone numbers as follows:
    for a plurality of phone numbers applied for in advance from a carrier, obtain a predicted duration based on the total call duration and the number of those phone numbers, determine the probability that each of the phone numbers will go offline within the predicted duration, and take the phone numbers whose probability is not greater than a first preset threshold as the available phone numbers.
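The number-selection step in claim 12 might look like the sketch below. The claim does not say how the offline probability is estimated, so `offline_probability` is a hypothetical callable supplied by the caller, and the predicted duration is assumed to be the total call duration divided by the number of phone numbers.

```python
from typing import Callable, Iterable, List


def select_available_numbers(numbers: Iterable[str],
                             total_call_secs: float,
                             offline_probability: Callable[[str, float], float],
                             max_offline_prob: float) -> List[str]:
    """Keep the numbers unlikely to go offline within the predicted duration."""
    numbers = list(numbers)
    if not numbers:
        return []
    # Assumption: predicted duration = total call time shared across all numbers.
    predicted_secs = total_call_secs / len(numbers)
    # "Not greater than" the first preset threshold, so the comparison is <=.
    return [n for n in numbers
            if offline_probability(n, predicted_secs) <= max_offline_prob]


# Toy estimator: fixed per-number probabilities, ignoring the horizon.
toy_probs = {"A": 0.02, "B": 0.50, "C": 0.10}
available = select_available_numbers(
    ["A", "B", "C"], 66000.0, lambda n, horizon: toy_probs[n], 0.10)
```

In practice the estimator would depend on the horizon; the fixed table here only keeps the example short.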
  13. The apparatus according to claim 12, characterized in that the apparatus further comprises a dialing module, and while the determination module uses the first model to determine the user category of each user in the first list, the dialing module is configured to:
    call each user using the available phone numbers, according to the contact information of each user in the first list.
  14. The apparatus according to any one of claims 10 to 13, characterized in that the processing module is further configured to:
    when the first duration does not exceed the set duration, if a request message for processing a third list is received within the first duration, determine, based on the first model, a second duration required to call all users in the third list;
    if the sum of the first duration and the second duration exceeds the set duration, refuse to accept the third list.
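The admission check in claim 14 reduces to a single comparison. The sketch below uses hypothetical names and assumes all three durations are in the same unit.

```python
def accept_third_list(first_duration: float,
                      second_duration: float,
                      set_duration: float) -> bool:
    """Return True if the third list can be accepted alongside the first list.

    The list is refused only when the combined duration exceeds the set
    duration, so a sum exactly equal to the limit is still accepted.
    """
    return first_duration + second_duration <= set_duration


# A 3600 s limit with 3000 s already committed to the first list.
fits = accept_third_list(3000.0, 500.0, 3600.0)
too_long = accept_third_list(3000.0, 700.0, 3600.0)
```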
  15. The apparatus according to claim 10, characterized in that the first model is a classification model, and the processing module is further configured to:
    acquire the feature values of a plurality of users under each feature;
    for any feature, determine the degree of association between the feature and the behavior of whether a user answers a call, according to the number of users among the plurality of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the feature, the number of users corresponding to each feature value who answered, and the number of users corresponding to each feature value who did not answer;
    take each feature whose degree of association with the behavior of whether a user answers a call is greater than or equal to a second preset threshold as a strongly correlated feature, and train the first model according to the number of users among the plurality of users who answered, the number of users who did not answer, the number of users corresponding to each feature value of the strongly correlated features, the number of users corresponding to each such feature value who answered, and the number of users corresponding to each such feature value who did not answer.
  16. The apparatus according to claim 15, characterized in that the degree of association between each feature and the behavior of whether a user answers a call satisfies the following condition:
    Figure PCTCN2020129121-appb-100002
    where X is any feature, R(X) is the set of feature values of feature X, and x is any feature value of feature X; Y is the behavior of whether a user answers a call, R(Y) is the set of such behaviors, consisting of the behavior of answering and the behavior of not answering, and y is either the behavior of answering or the behavior of not answering; I(X,Y) is the degree of association between feature X and the behavior of whether a user answers a call; P(x,y) is the proportion of all users who both have feature value x and performed behavior y; P(x) is the proportion of all users who have feature value x; and P(y) is the proportion of all users who performed behavior y.
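The formula image for claim 16 is not reproduced in this text. The symbol definitions (a score I(X,Y) built from P(x,y), P(x) and P(y), summed over R(X) and R(Y)) match the standard mutual information between a feature and the answer behavior, so the sketch below computes that quantity. This is an assumption, as is the natural logarithm; the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, answered):
    """Mutual information between one feature and the answered / not-answered label.

    feature_values[i] is user i's value under the feature;
    answered[i] is True if user i answered the call.
    """
    n = len(feature_values)
    if n == 0 or n != len(answered):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(feature_values, answered))  # user counts per (x, y)
    count_x = Counter(feature_values)               # user counts per x
    count_y = Counter(answered)                     # user counts per y
    score = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        # Pairs that never occur are absent from `joint` and contribute zero.
        score += p_xy * math.log(p_xy / (p_x * p_y))
    return score


def strongly_correlated(features, answered, threshold):
    """Names of features whose score reaches the (second preset) threshold."""
    return [name for name, values in features.items()
            if mutual_information(values, answered) >= threshold]


answered = [True, True, False, False]
features = {
    "perfect": ["a", "a", "b", "b"],    # fully determines the label
    "unrelated": ["a", "b", "a", "b"],  # independent of the label
}
strong = strongly_correlated(features, answered, 0.1)
```

Only counts of users per feature value and per behavior enter the computation, which is consistent with the quantities listed in claim 15.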
  17. The apparatus according to claim 10, characterized in that the second model is a neural network model, and the processing module is further configured to:
    acquire the feature values of a plurality of users under each feature;
    for any user, construct the user's feature vector under each feature according to the user's feature value under that feature and the set of feature values of that feature, and concatenate the user's feature vectors under all features to obtain a first feature vector corresponding to the user; and obtain a second feature vector corresponding to the user according to whether the user performed the preset behavior;
    use the first feature vectors of the plurality of users as model input to obtain prediction results of whether the plurality of users perform the preset behavior, and adjust the model parameters based on the second feature vectors of the plurality of users and the prediction results, thereby obtaining the second model.
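The vector construction in claim 17 can be illustrated with one-hot encoding. The claim only says each per-feature vector is built from the user's value and the feature's set of values, so one-hot encoding is an assumption, as is the two-element label vector; all names are hypothetical. Training the neural network itself is left out.

```python
def one_hot(value, feature_values):
    """Vector with a 1.0 at the position of `value` among `feature_values`."""
    return [1.0 if value == v else 0.0 for v in feature_values]


def build_first_feature_vector(user, feature_values_by_name):
    """Concatenate the user's per-feature vectors in a fixed feature order."""
    vector = []
    for name, values in feature_values_by_name.items():
        vector.extend(one_hot(user[name], values))
    return vector


def build_second_feature_vector(performed):
    """Label vector: [1, 0] if the user performed the preset behavior, else [0, 1]."""
    return [1.0, 0.0] if performed else [0.0, 1.0]


# Two features: a discrete one and a binned continuous one (bin indices 0..2).
feature_values_by_name = {"city": ["SZ", "BJ"], "age_band": [0, 1, 2]}
user = {"city": "BJ", "age_band": 2}
x = build_first_feature_vector(user, feature_values_by_name)
y = build_second_feature_vector(True)
```

The pairs `(x, y)` for all users would then serve as the inputs and targets when adjusting the model parameters.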
  18. The apparatus according to any one of claims 15 to 17, characterized in that the processing module is further configured to obtain the feature values of each feature as follows:
    if the feature is a discrete feature, collect the distinct values that the plurality of users take under the feature and use those values as the feature values of the feature; if the feature is a continuous feature, determine the range of values that the plurality of users take under the feature, divide that range into a plurality of intervals, and assign one corresponding feature value to each interval, thereby obtaining the feature values of the feature.
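The two cases in claim 18 can be sketched as below. The claim does not say how the range is divided, so equal-width intervals are an assumption, as is using the interval index as the feature value assigned to each interval; the function names are hypothetical.

```python
def discrete_feature_values(observed):
    """Discrete feature: the distinct observed values are the feature values."""
    return sorted(set(observed))


def continuous_feature_value(value, lo, hi, n_bins):
    """Continuous feature: map `value` to the index of its equal-width interval.

    [lo, hi] is the observed range; the index (0 .. n_bins - 1) serves as the
    feature value assigned to the interval.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    if hi == lo:
        return 0  # degenerate range: everything falls into one interval
    index = int((value - lo) / (hi - lo) * n_bins)
    # Clamp so that value == hi lands in the last interval, not one past it.
    return min(max(index, 0), n_bins - 1)


cities = discrete_feature_values(["BJ", "SZ", "BJ"])
amounts = [0.0, 25.0, 99.9, 100.0]
lo, hi = min(amounts), max(amounts)
bands = [continuous_feature_value(a, lo, hi, 4) for a in amounts]
```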
  19. A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device, and when the program runs on the computing device, it causes the computing device to perform the method according to any one of claims 1 to 9.
PCT/CN2020/129121 2019-11-22 2020-11-16 Data processing method and device WO2021098652A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911155084.5 2019-11-22
CN201911155084.5A CN111091460A (en) 2019-11-22 2019-11-22 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2021098652A1 true WO2021098652A1 (en) 2021-05-27

Family

ID=70393812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129121 WO2021098652A1 (en) 2019-11-22 2020-11-16 Data processing method and device

Country Status (2)

Country Link
CN (1) CN111091460A (en)
WO (1) WO2021098652A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210200949A1 (en) * 2019-12-30 2021-07-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Pre-training method for sentiment analysis model, and electronic device
CN115297212A (en) * 2022-06-25 2022-11-04 上海浦东发展银行股份有限公司 Voice robot collection method, system, device and medium based on machine learning

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091460A (en) * 2019-11-22 2020-05-01 深圳前海微众银行股份有限公司 Data processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952155A (en) * 2017-03-08 2017-07-14 深圳前海纵腾金融科技服务有限公司 A kind of collection method and device based on credit scoring
JP2018077671A (en) * 2016-11-09 2018-05-17 ヤフー株式会社 Information processing apparatus, information processing method, apparatus for generating prediction models, method for generating prediction models and program
CN109214936A (en) * 2018-09-03 2019-01-15 中国平安人寿保险股份有限公司 A kind of expense collection method, system and terminal device
CN109685336A (en) * 2018-12-10 2019-04-26 深圳市小牛普惠投资管理有限公司 Collection task distribution method, device, computer equipment and storage medium
CN110475033A (en) * 2019-08-21 2019-11-19 深圳前海微众银行股份有限公司 Intelligent dialing method, device, equipment and computer readable storage medium
CN111091460A (en) * 2019-11-22 2020-05-01 深圳前海微众银行股份有限公司 Data processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559221A (en) * 2018-11-20 2019-04-02 中国银行股份有限公司 Collection method, apparatus and storage medium based on user data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210200949A1 (en) * 2019-12-30 2021-07-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Pre-training method for sentiment analysis model, and electronic device
US11537792B2 (en) * 2019-12-30 2022-12-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Pre-training method for sentiment analysis model, and electronic device
CN115297212A (en) * 2022-06-25 2022-11-04 上海浦东发展银行股份有限公司 Voice robot collection method, system, device and medium based on machine learning

Also Published As

Publication number Publication date
CN111091460A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
WO2021098652A1 (en) Data processing method and device
CN109783730A (en) Products Show method, apparatus, computer equipment and storage medium
WO2019205325A1 (en) Method for determining risk level of user, terminal device, and computer-readable storage medium
US10637990B1 (en) Call center load balancing and routing management
CN109766454A (en) A kind of investor's classification method, device, equipment and medium
CN110796513A (en) Multitask learning method and device, electronic equipment and storage medium
CN110634060A (en) User credit risk assessment method, system, device and storage medium
CN112966189A (en) Fund product recommendation system
CN111061948B (en) User tag recommendation method and device, computer equipment and storage medium
CN115423578A (en) Bidding method and system based on micro-service containerization cloud platform
WO2019171492A1 (en) Prediction task assistance device and prediction task assistance method
CN115914363A (en) Message pushing method and device, computer equipment and storage medium
WO2023114637A1 (en) Computer-implemented system and method of facilitating artificial intelligence based lending strategies and business revenue management
WO2022143431A1 (en) Method and apparatus for training anti-money laundering model
CN115099934A (en) High-latency customer identification method, electronic equipment and storage medium
WO2021129368A1 (en) Method and apparatus for determining client type
KR20230060128A (en) Method for providing electronic bidding information analysis service using eligibility examination engine
CN114565450A (en) Overdue common debt-based collection strategy determination method and related equipment
CN112184417A (en) Business approval method, device, medium and electronic equipment
CN117892112B (en) Data analysis method based on block chain
KR102519878B1 (en) Apparatus, method and recording medium storing commands for providing artificial-intelligence-based risk management solution in credit exposure business of financial institution
US20230237572A1 (en) Structuring a Multi-Segment Operation
CN115375454A (en) User data processing method and device, computer equipment and storage medium
CN117788139A (en) Training method and device for information output model, computer equipment and storage medium
CN116976187A (en) Modeling variable determining method, abnormal data prediction model construction method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20890515

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.10.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20890515

Country of ref document: EP

Kind code of ref document: A1