CN109214912A - Behavioral data processing method, behavior prediction method, apparatus, device, and medium - Google Patents

Publication number
CN109214912A
Authority
CN
China
Prior art keywords
user
sample
users
destination application
application program
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810931189.4A
Other languages
Chinese (zh)
Inventor
陈棱
刘宾
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Photo Letter Data Service (shanghai) Co Ltd
Original Assignee
Photo Letter Data Service (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Photo Letter Data Service (shanghai) Co Ltd
Priority to CN201810931189.4A
Publication of CN109214912A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

Embodiments of the present invention provide a behavioral data processing method, a behavior prediction method, an apparatus, a device, and a medium. The processing method includes: screening out at least one target application based on the difference between the application usage behavior data of first-type sample users and the application usage behavior data of second-type sample users in a sample user set, where the first-type sample users include users who have made overdue repayments and the second-type sample users include users who have not; calculating, from the duration for which each user in the sample user set has used the at least one target application, the weight of each target application corresponding to that user; and determining a user overdue repayment prediction model from the weight of each target application corresponding to each user. The embodiments reduce manual intervention, avoid consuming substantial human resources, and reduce the deviation in prediction results caused by subjective human judgment.

Description

Behavioral data processing method, behavior prediction method, apparatus, device, and medium
Technical field
The present invention relates to the field of Internet technology, and in particular to a behavioral data processing method, a behavior prediction method, an apparatus, a device, and a medium.
Background
Personal credit assessment collects multi-dimensional historical data, generates feature variables from it, and applies a modeling method to predict an individual's future risk of overdue repayment. It is widely used in fields such as credit card applications, installment purchases, and deposit-free rentals.
Credit reference data is the decision basis most widely used today by large institutions such as banks. It is strongly correlated with overdue risk and is standardized, but the population it covers is limited, so it cannot meet the credit needs of the large group of people who have no credit record. For this reason, more and more institutions have begun to use fragmented, unstructured data to predict an individual's overdue risk.
The prior art offers two ways to predict whether a borrower will repay late. The first is risk-control rules based on expert experience: risk-control staff must keep up with market trends, capture the behavior of borrowers, and judge from that behavior whether a user poses a high fraud or overdue risk. The second is feature engineering based on expert experience (feature engineering means extracting features from raw data for use by a model): staff typically extract features from the raw data to train a model, and the model predicts whether a user poses a high fraud or overdue risk.
Both risk-control techniques lack stability: staff must continually check whether the risk-control rules and the engineered features can still distinguish overdue users from non-overdue users. Manual intervention is prone to deviation, and a single mistake by a staff member can cause an even larger deviation. Manual intervention also demands a great deal of effort, takes a long time, and is inefficient.
Summary of the invention
Embodiments of the present invention provide a behavioral data processing method, a behavior prediction method, an apparatus, a device, and a medium. A user overdue repayment prediction model is obtained from the sample users' application usage behavior, and the model can predict whether a target user will make an overdue repayment. This reduces manual intervention, avoids consuming substantial human resources, reduces the deviation in results caused by subjective human judgment, shortens the time needed to predict whether a user will repay late, and improves prediction efficiency.
In a first aspect, an embodiment of the present invention provides a method for processing behavioral data on a user's use of applications, including:
obtaining the application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data;
screening out at least one target application based on the difference between the application usage behavior data of the first-type sample users and the application usage behavior data of the second-type sample users in the sample user set, where the first-type sample users include users who have made overdue repayments and the second-type sample users include users who have not made overdue repayments;
calculating, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to that user; and
determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
In a second aspect, an embodiment of the present invention provides a user behavior prediction method, including:
calculating, according to the duration for which a target user has used each target application among at least one target application, the weight of each target application corresponding to the target user; and
inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction of whether the target user will make an overdue repayment, where the user overdue repayment prediction model is the user overdue repayment prediction model of the first aspect, and the at least one target application is the at least one target application of the first aspect.
In a third aspect, an embodiment of the present invention provides an apparatus for processing behavioral data on a user's use of applications, including:
an obtaining module, configured to obtain the application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data;
a screening module, configured to screen out at least one target application based on the difference between the application usage behavior data of the first-type sample users and the application usage behavior data of the second-type sample users in the sample user set, where the first-type sample users include users who have made overdue repayments and the second-type sample users include users who have not made overdue repayments;
a computing module, configured to calculate, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to that user; and
a determining module, configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
In a fourth aspect, an embodiment of the present invention provides a user behavior prediction apparatus, including:
a computing module, configured to calculate, according to the duration for which a target user has used each target application among at least one target application, the weight of each target application corresponding to the target user; and
a model prediction module, configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction of whether the target user will make an overdue repayment, where the user overdue repayment prediction model is the user overdue repayment prediction model of the first aspect, and the at least one target application is the at least one target application of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computing device, including: a processor, a memory, and computer program instructions stored in the memory;
the method of the first aspect is implemented when the computer program instructions are executed by the processor;
or,
the method of the second aspect is implemented when the computer program instructions are executed by the processor.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium on which computer program instructions are stored;
the method of the first aspect is implemented when the computer program instructions are executed by a processor;
or,
the method of the second aspect is implemented when the computer program instructions are executed by a processor.
The behavioral data processing method, behavior prediction method, apparatus, device, and medium provided by the embodiments of the present invention can screen out target applications whose usage differs between users who repay late and users who do not, and determine the user overdue repayment prediction model from how long the sample users have used the target applications. The model so determined can predict whether a borrowing target user will repay late, which reduces manual intervention, avoids consuming substantial human resources, and reduces the deviation in prediction results caused by subjective human judgment. Because whether a user will repay late is predicted automatically by machine, the prediction cycle can be shortened and prediction efficiency improved.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a method for processing behavioral data on a user's use of applications according to an embodiment of the present invention;
Fig. 2 shows the ROC curve of the user overdue repayment prediction model;
Fig. 3 is a histogram of the relationship between the predicted probability range of overdue repayment and the number of samples;
Fig. 4 is a line chart of the proportion of overdue-repayment samples in each group relative to the total number of samples in that group;
Fig. 5 is a schematic flowchart of a method for processing behavioral data on a user's use of applications according to an embodiment of the present invention;
Fig. 6 is a block diagram of an apparatus for processing behavioral data on a user's use of applications according to an embodiment of the present invention;
Fig. 7 is a block diagram of a user behavior prediction apparatus according to an embodiment of the present invention;
Fig. 8 is a structural diagram of an exemplary hardware architecture of a computing device.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. Terms such as "first" and "second" are used only to distinguish one entity (or operation) from another and do not indicate any relationship or order between these entities (or operations). Terms indicating direction or orientation, such as up, down, left, right, front, and rear, indicate only relative directions or orientations, not absolute ones. Unless further limited, an element defined by the phrase "including ..." does not exclude the presence of other elements in the process, method, article, or device that includes that element.
Fig. 1 is a schematic flowchart of a method for processing behavioral data on a user's use of applications according to an embodiment of the present invention. The method includes S101 to S104.
S101: obtain the application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data.
As an example, a user's application (Application, APP) behavior data is obtained after the user grants authorization.
For example, the APP behavior data is an APP behavior list in text format, which is typical unstructured data. Unstructured data is data with an irregular or incomplete structure that cannot conveniently be represented in a two-dimensional database table, including text, images, audio, and video.
The application usage behavior data includes the list of installed applications and, over a sufficiently long behavior observation period (no less than 3 months), application installation data and application uninstallation data.
As an example, the application installation data includes, but is not limited to, the installation times and the number of installations of an application, and the application uninstallation data includes, but is not limited to, the uninstallation times and the number of uninstallations of an application.
It should be noted that the applications may belong to the financial field; for example, they may include banking applications and wealth-management applications. The applications may of course belong to other fields as well; for example, they may also include utility applications and network communication applications.
S102: screen out at least one target application based on the difference between the application usage behavior data of the first-type sample users and the application usage behavior data of the second-type sample users in the sample user set, where the first-type sample users include users who have made overdue repayments and the second-type sample users include users who have not made overdue repayments.
As an example, the sample user set contains 10,000 sample users in total, of whom 1,500 are first-type sample users and 8,500 are second-type sample users.
It should be noted that the first-type sample users are users who have already made overdue repayments, and the second-type sample users are users who have not. The first-type sample users' use of the screened-out target applications differs considerably from the second-type sample users' use of them, so the target applications can distinguish users who repay late from users who do not. For example, members of fraud rings or users who frequently repay late use a given application more, while users who do not repay late use it less.
The first-type sample users may be negative-sample users and the second-type sample users positive-sample users. Alternatively, the first-type sample users may be positive-sample users and the second-type sample users negative-sample users; no limitation is imposed here.
S103: calculate, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to that user.
As an example, the duration for which a user has used a target application can be determined from the times at which the user installed and uninstalled the application.
For example, a user first installs a target application on the 20th of this month, uninstalls it on the 25th, reinstalls it on the 26th, and has been using it ever since. The duration for which the user has used the application can be derived from this data.
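As an illustrative sketch only, the duration calculation in the example above could be expressed as follows. The event format, the function name `usage_days`, the concrete dates, and the choice to count only the days on which the application was installed are assumptions made for illustration; the embodiment does not specify them.

```python
from datetime import date

def usage_days(events, as_of):
    """Sum the days an application was installed.

    events: iterable of (kind, day) pairs, kind being "install" or "uninstall".
    as_of:  observation date used when the application is still installed.
    """
    total = 0
    installed_on = None
    for kind, day in sorted(events, key=lambda e: e[1]):
        if kind == "install" and installed_on is None:
            installed_on = day
        elif kind == "uninstall" and installed_on is not None:
            total += (day - installed_on).days
            installed_on = None
    if installed_on is not None:  # still installed at observation time
        total += (as_of - installed_on).days
    return total

# Hypothetical dates mirroring the example: install on the 20th,
# uninstall on the 25th, reinstall on the 26th, observed on the 31st.
events = [
    ("install", date(2018, 8, 20)),
    ("uninstall", date(2018, 8, 25)),
    ("install", date(2018, 8, 26)),
]
total_days = usage_days(events, as_of=date(2018, 8, 31))
```

Under these assumptions the two installed periods (the 20th to the 25th, and the 26th to the 31st) are added together, and the gap during which the application was uninstalled is not counted.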
S104: determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
As an example, the weight of each target application corresponding to each user is recorded in a table.
For example, Table 1 shows the weight of each target application corresponding to each user. APP1 to APP5 are the target applications, and the sample user set includes user 1, user 2, user 3, and so on. As Table 1 shows, the weight of APP1 for user 1 is 5, the weight of APP2 for user 1 is 6, the weight of APP3 for user 1 is 4, and the weight of APP4 for user 1 is 4. It should be noted that if a user has never installed a target application, the weight of that target application for the user may be 0.
Table 1
          APP1   APP2   APP3   APP4
User 1    5      6      4      4
User 2    1      1      3      2
User 3    2      3      1      1
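As a minimal sketch, the weights in Table 1 could be arranged into fixed-length feature vectors, one per user, as input to a prediction model. The dictionary layout, the function name `feature_vector`, and "User 4" are hypothetical; the document does not specify the data structure or the type of model.

```python
# Column order of the feature vector; taken from the Table 1 header.
TARGET_APPS = ["APP1", "APP2", "APP3", "APP4"]

# Weights from Table 1. "User 4" is a hypothetical user who has only ever
# installed APP2, added to show the zero-weight rule.
weights = {
    "User 1": {"APP1": 5, "APP2": 6, "APP3": 4, "APP4": 4},
    "User 2": {"APP1": 1, "APP2": 1, "APP3": 3, "APP4": 2},
    "User 3": {"APP1": 2, "APP2": 3, "APP3": 1, "APP4": 1},
    "User 4": {"APP2": 2},
}

def feature_vector(user_weights, apps=TARGET_APPS):
    """Return one weight per target application, 0 if never installed."""
    return [user_weights.get(app, 0) for app in apps]

features = {user: feature_vector(w) for user, w in weights.items()}
```

Each vector, paired with the user's overdue or non-overdue label, could then be passed to whichever binary classifier is chosen to serve as the prediction model.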
The method for processing behavioral data on a user's use of applications according to the embodiments of the present invention can be applied to fields such as credit card applications, installment purchases, and deposit-free rentals, and has the following effects:
1. The user overdue repayment prediction model performs well. The embodiments screen risk at the level of individual applications, so the data granularity is finer. The quality of the model can be measured with the receiver operating characteristic curve (Receiver Operating Characteristic Curve, ROC curve). The area under the ROC curve (Area Under Curve, AUC) lies between 0.5 and 1.0. When AUC > 0.5, the closer the AUC is to 1, the better the model. An AUC of 0.5 to 0.7 indicates low accuracy, an AUC of 0.7 to 0.9 indicates moderate accuracy, and an AUC above 0.9 indicates high accuracy. When AUC = 0.5, the model has no effect and its predictions have no reference value. AUC < 0.5 does not reflect reality and rarely occurs in practice. The Kolmogorov-Smirnov (KS) value can also be used to assess the model's ability to discriminate risk: the larger the KS value, the stronger that ability. Alternatively, the KS value can measure the accuracy of the model: the larger the KS value, the more accurate the model, and when KS > 0.2 the model can be considered to have relatively good predictive accuracy.
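The two metrics can be sketched in plain Python as follows. This is an illustration under assumptions, not the evaluation code of the embodiment: AUC is computed as the fraction of (overdue, non-overdue) pairs in which the overdue sample receives the higher score, and KS as the largest gap between the cumulative score distributions of the two classes. The labels and scores are made-up toy data, with label 1 standing for an overdue sample.

```python
def auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(labels, scores):
    """Largest gap between the cumulative score distributions of the classes."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Toy data (hypothetical): 1 = overdue, 0 = not overdue.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

model_auc = auc(labels, scores)
model_ks = ks_statistic(labels, scores)
```

The quadratic pair count is chosen for readability; for sample sets of the size discussed in this document, a sort-based computation would be the more practical choice.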
Fig. 2 shows the ROC curve of the user overdue repayment prediction model. The AUC of this ROC curve is 0.76, which indicates that the model has moderate accuracy. The KS value of the model is 0.38, which indicates that the accuracy of the model is relatively high.
Fig. 3 is a histogram of the relationship between the predicted probability range of overdue repayment and the number of samples. The horizontal axis represents the predicted probability range of overdue repayment, and the vertical axis represents the number of samples in each range; for example, 1,500 samples have a predicted probability of overdue repayment between 0 and 0.05. In this histogram, 24,617 samples took part in the prediction. The users predicted by the model to repay late account for 21.9% of the total number of samples, and the users who actually repaid late also account for 21.9% of the total. The accuracy of the user overdue repayment prediction model is therefore relatively high.
Fig. 4 is a line chart of the proportion of overdue-repayment samples in each group relative to the total number of samples in that group. The horizontal axis of Fig. 4 represents the group number, and the vertical axis represents the proportion of overdue-repayment samples in the corresponding group relative to the total number of samples in that group. Line A represents, for each group, the proportion of samples predicted to repay late, and line B represents the proportion of samples that actually repaid late. Suppose there are 10,000 samples. They are sorted in ascending order of predicted probability of overdue repayment and divided, in that order, into 10 groups of 1,000 samples each. Among the 1,000 samples of group 1, 90 are predicted to repay late and 50 actually repaid late, so the predicted proportion for group 1 is 9% and the actual proportion is 5%. Therefore, where the horizontal-axis value of line A is 1, the vertical-axis value is 9%; where the horizontal-axis value of line B is 1, the vertical-axis value is 5%. As Fig. 4 shows, lines A and B do not differ greatly, so the predicted probability of overdue repayment is fairly accurate.
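The grouping behind Fig. 4 can be sketched as below. Two points are assumptions of this sketch rather than statements of the document: the predicted proportion of a group is taken to be the mean predicted probability of its samples, and any samples left over when the total is not divisible by the number of groups are ignored. The function name `group_rates` and the toy data are hypothetical.

```python
def group_rates(probs, outcomes, n_groups):
    """Sort samples by predicted probability, split them into equal groups,
    and return (predicted_rate, actual_rate) for each group in order."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    size = len(order) // n_groups  # leftover samples are ignored
    rates = []
    for g in range(n_groups):
        idx = order[g * size:(g + 1) * size]
        predicted = sum(probs[i] for i in idx) / len(idx)
        actual = sum(outcomes[i] for i in idx) / len(idx)
        rates.append((predicted, actual))
    return rates

# Toy data (hypothetical): predicted probabilities and observed outcomes,
# where 1 means the sample actually repaid late.
probs = [0.1, 0.3, 0.8, 0.6, 0.2, 0.9]
outcomes = [0, 0, 1, 1, 0, 1]
rates = group_rates(probs, outcomes, n_groups=3)
```

Plotting the first element of each pair gives a curve corresponding to line A and the second a curve corresponding to line B; the closer the two curves, the better calibrated the predicted probabilities.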
2. The user overdue repayment prediction model iterates quickly. Identifying and classifying applications manually takes about 2 weeks. The embodiments eliminate the time spent on manually investigating, identifying, and classifying applications, avoid consuming substantial human resources, and reduce the deviation in prediction results caused by subjective human judgment.
3. Risk screening combines static data with dynamic behavior. The embodiments consider which applications a user uses and determine the user overdue repayment prediction model from how long the user has used them. The list of applications a user uses is static data, and the duration for which the user has used an application is dynamic data. Combining dynamic and static data in this way can improve the user overdue repayment prediction model.
4. The user overdue repayment prediction model can learn online. The embodiments do not depend on subjective human judgment or manual input, so the iteration of the model can be automated into a fully automatic process of "model performance monitoring -> online model learning -> automated model deployment".
It should be noted that the embodiments determine the user overdue repayment prediction model from how long users have used the target applications. By contrast, the currently common way of using applications to predict overdue repayment is to classify the applications. For example, 25 applications are classified into a first category of 5 applications, a second category of 6, a third category of 7, a fourth category of 5, and a fifth category of 2. The likelihood that a user will repay late is then judged by checking whether the number of applications the user has in each category is higher than that of most users.
In one embodiment of the invention, S102 includes:
calculating the difference between a first installation rate and a second installation rate, where the first installation rate is the installation rate of the application to be screened among the first-type sample users and the second installation rate is the installation rate of the application to be screened among the second-type sample users; and, if the difference between the first installation rate and the second installation rate is greater than a first threshold, taking the application to be screened as a target application.
As an example, the first threshold is greater than or equal to 1.5%.
For example, the first threshold is 2%, or the first threshold is 5%.
Taking a first threshold of 2% as an example, the installation rate of a wealth-management application among the first-type sample users is 38.5%, and its installation rate among the second-type sample users is 35.2%. The difference between the two installation rates is 3.3%, which is greater than 2%, indicating that users who repay late are more likely to have installed the wealth-management application.
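A minimal sketch of installation-rate screening follows. The representation of each user as a set of installed application names, the function names, and the toy data are assumptions for illustration. The sketch compares the absolute value of the difference with the threshold, because Table 2 lists applications with negative differences among the selected ones; that reading is an interpretation, not something the text states outright.

```python
def install_rate(users_apps, app):
    """Share of users whose installed-application set contains `app`."""
    return sum(app in apps for apps in users_apps) / len(users_apps)

def screen_by_install_rate(overdue, non_overdue, candidates, threshold=0.02):
    """Keep candidates whose installation rates differ by more than `threshold`."""
    selected = []
    for app in candidates:
        diff = install_rate(overdue, app) - install_rate(non_overdue, app)
        if abs(diff) > threshold:  # absolute value: an assumption, see lead-in
            selected.append(app)
    return selected

# Toy data (hypothetical): one set of installed applications per user.
overdue_users = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
non_overdue_users = [{"B"}, {"A", "B"}, {"C"}, {"B"}]

targets = screen_by_install_rate(overdue_users, non_overdue_users, ["A", "B", "C"])
```

In this toy data, application A is more common among overdue users, B is more common among non-overdue users, and C is equally common in both groups, so only A and B are kept.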
In the method for processing behavioral data on a user's use of applications according to the embodiments of the present invention, the difference between an application's installation rate among the first-type sample users and its installation rate among the second-type sample users is calculated. If the difference is greater than the first threshold, users who have repaid late and users who have not differ in whether they install the application, and the application can be used, to a certain extent, to distinguish the two groups.
In one embodiment of the invention, S102 includes:
performing a significance test on the difference between the first-type sample users' usage behavior data for the application to be screened and the second-type sample users' usage behavior data for the application to be screened; and, if the significance test is passed, taking the application to be screened as a target application.
As an example, the following hypothesis is made: there is a significant difference between the first-type sample users' usage behavior data for the application to be screened and the second-type sample users' usage behavior data for the application to be screened. From the two sets of usage behavior data, the mean and variance of each class of sample users' usage behavior data are calculated, and the two-sample t-test statistic is then calculated. The t distribution table is consulted to determine the P value, which represents the probability that the hypothesis holds. If the P value ≤ 0.05, the hypothesis does not hold, that is, the significance test is not passed; if the P value > 0.05, the hypothesis holds, that is, the significance test is passed.
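A two-sample test of this kind can be sketched with the Python standard library alone. The sketch makes several assumptions that the document does not state. It uses Welch's form of the t statistic. It replaces the t distribution table with a normal approximation, which is reasonable only for large samples such as the thousands of users mentioned above. It also follows the conventional formulation, in which the null hypothesis is that the two groups do not differ and a small P value (for example ≤ 0.05) indicates a significant difference. The usage-duration figures are invented.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def two_sided_p_normal(t):
    """Two-sided P value under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical usage durations (days) of one candidate application.
overdue_days = [30, 45, 50, 60, 40, 55, 35, 65]
non_overdue_days = [10, 20, 15, 25, 5, 30, 12, 18]

t_value = welch_t(overdue_days, non_overdue_days)
p_value = two_sided_p_normal(t_value)
is_significant = p_value <= 0.05  # conventional reading, an assumption here
```

For small samples, an exact t distribution (for example from a statistics library) would be the more appropriate reference than the normal approximation used here.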
It should be noted that target applications may be screened according only to the difference between the first installation rate and the second installation rate; or according only to the result of the significance test; or according to both the difference between the first installation rate and the second installation rate and the result of the significance test.
For example, Table 2 shows the result of screening target applications.
Table 2
        Installation rate difference   Significance test result   Significant difference?
APP1    +15.1%                         Significant                Yes
APP2    -8.5%                          Generally significant      Yes
APP3    +0.2%                          Not significant            No
APP4    +1.4%                          Not significant            No
APP5    -4.6%                          Significant                Yes
In Table 2, the installation rate difference is the difference between the first installation rate and the second installation rate. When an application's installation rate difference is greater than 2% and its significance test result is significant or generally significant, the application is identified as differing significantly between the first-type sample users and the second-type sample users and is taken as a target application. APP1, APP2, and APP5 in Table 2 are the applications with a significant difference and are therefore the target applications that are screened out.
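The combined criterion can be sketched against the rows of Table 2 as follows. The tuple layout, the label strings, and the function name `select_targets` are illustrative assumptions. The installation rate difference is compared by absolute value, an interpretation chosen because APP2 and APP5 have negative differences yet are listed as target applications.

```python
# Rows of Table 2: (application, installation rate difference, test result).
candidates = [
    ("APP1", 0.151, "significant"),
    ("APP2", -0.085, "generally significant"),
    ("APP3", 0.002, "not significant"),
    ("APP4", 0.014, "not significant"),
    ("APP5", -0.046, "significant"),
]

# Test results that count as passing the significance test.
PASSING = {"significant", "generally significant"}

def select_targets(rows, threshold=0.02):
    """Keep applications that pass both the rate and the significance criteria."""
    return [
        name
        for name, diff, result in rows
        if abs(diff) > threshold and result in PASSING
    ]

targets = select_targets(candidates)
```

Under these assumptions the selection reproduces the last column of Table 2: APP1, APP2, and APP5 are kept, while APP3 and APP4 are dropped.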
In one embodiment of the present invention, S103 includes:
Smoothing, by a binning method, the duration for which a single user uses a single target application, and taking the smoothed value as the weight of that target application for the user.
It should be noted that the binning method groups adjacent values into one class and discretizes continuous data through local smoothing, which increases the granularity and removes noise.
As an example, a correspondence between ranges of the duration for which a user uses a target application and weights is established in advance, and the weight of the target application corresponding to the user is determined according to this correspondence.
For example, the following correspondence is established in advance: if the user has had the target application installed for less than 3 months, the weight of the target application corresponding to the user is 1; if the duration is greater than 3 months and less than or equal to 8 months, the weight is 2; if the duration is greater than 8 months and less than or equal to 2 years, the weight is 3; if the duration is greater than 2 years, the weight is 4. Based on this correspondence, the weight of a target application corresponding to a single user can be obtained.
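The correspondence above can be sketched as a small binning function. The function name `duration_to_weight` is hypothetical, and one boundary is an assumption: the text assigns no weight to a duration of exactly 3 months, so this sketch places it in the lowest bin.

```python
def duration_to_weight(months):
    """Map an installation duration in months to a weight from 1 to 4."""
    if months < 0:
        raise ValueError("duration must be non-negative")
    # Assumption: exactly 3 months falls in the lowest bin, because the
    # source leaves that boundary unassigned.
    if months <= 3:
        return 1
    if months <= 8:
        return 2
    if months <= 24:  # 2 years
        return 3
    return 4


weights = [duration_to_weight(m) for m in (1, 3, 5, 8, 12, 24, 30)]
```

Every duration inside one range maps to the same weight, which is how the binning step turns a continuous duration into a discrete value.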
According to the method of the embodiment of the present invention for processing behavior data of a user using applications, smoothing the duration for which a user uses a target application discretizes the duration, increases the data granularity, and removes noisy data.
In one embodiment of the present invention, S104 includes:
Inputting the weights of the at least one target application corresponding to each user into a pre-built prediction model, the prediction model outputting a prediction result for each user; calculating the difference size between the prediction result of each user and the actual result of each user, where the prediction result of each user indicates the predicted probability that the user makes an overdue repayment, and the actual result of each user indicates whether the user actually made an overdue repayment; if the difference size is greater than a second threshold, adjusting the coefficients in the prediction model and returning to the step of inputting the weights of the at least one target application corresponding to each user into the pre-built prediction model; if the difference size is equal to or less than the second threshold, taking the prediction model at that point as the user overdue repayment prediction model.
Table 3 can be built on the basis of Table 1, with the prediction result and the actual result of each user added. The prediction result of a user indicates whether the prediction model predicts that the user will make an overdue repayment, and the actual result of a user indicates whether the user actually made an overdue repayment.
Table 3
According to the content of Table 3, the difference size between the prediction results and the actual results of the users is calculated in order to train the prediction function.
According to the method of the embodiment of the present invention for processing behavior data of a user using applications, the prediction model is trained based on the weights of the at least one target application corresponding to each user and the actual result of each user. The trained prediction model is the user overdue repayment prediction model, which can make predictions from a user's behavior in using target applications.
In one embodiment of the present invention, calculating the difference size between the prediction result of each user and the actual result of each user includes:
Calculating the difference size $Z$ by the following formula,
where $i = 1, \ldots, n$; $j = 1, \ldots, m$; $x_{ij}$ denotes the weight of the $j$-th target application corresponding to the $i$-th user in the sample user set; $m$ denotes the total number of the at least one target application; $f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im})$ is the prediction model, which outputs the prediction result of the $i$-th user; $n$ denotes the total number of sample users in the sample user set; $y_i$ denotes the actual result of the $i$-th user; and $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i)$ is the loss function.
As an example, $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i) = f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}) - y_i$.
The prediction model is the following function:
where $\theta_0, \theta_1, \theta_2, \ldots, \theta_m$ are the coefficients in the prediction model.
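The train-until-threshold loop of S104 can be sketched as follows. The formulas for the difference size and the prediction model are not reproduced in this text, so everything specific here is an assumption made for illustration: a linear model with coefficients θ0 to θm, a difference size taken as the mean squared gap between prediction and actual result, and gradient descent as the way the coefficients are adjusted. The learning rate, iteration cap, function names and sample data are hypothetical.

```python
def predict(theta, x):
    """Assumed linear form: theta[0] + theta[1]*x[0] + ... + theta[m]*x[m-1]."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


def difference_size(theta, xs, ys):
    """Assumed difference size Z: mean squared gap between prediction and actual."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, second_threshold, learning_rate=0.01, max_iter=5000):
    """Adjust the coefficients until Z <= second_threshold or max_iter is reached."""
    m = len(xs[0])
    theta = [0.0] * (m + 1)
    for _ in range(max_iter):
        if difference_size(theta, xs, ys) <= second_threshold:
            break  # the current coefficients are kept as the final model
        # Gradient of the assumed Z with respect to each coefficient.
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad = [2 * sum(errors) / len(xs)]
        for j in range(m):
            grad.append(2 * sum(e * x[j] for e, x in zip(errors, xs)) / len(xs))
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta, difference_size(theta, xs, ys)


# Hypothetical sample: weights of two target applications per user, and
# whether the user actually made an overdue repayment (1) or not (0).
xs = [[4, 1], [3, 1], [1, 3], [1, 4], [2, 2], [4, 2]]
ys = [1, 1, 0, 0, 0, 1]
initial_z = difference_size([0.0, 0.0, 0.0], xs, ys)
theta, final_z = train(xs, ys, second_threshold=0.05)
```

The iteration cap is a practical addition: if the second threshold cannot be reached for a given sample, the loop still terminates and returns the last coefficients together with their difference size.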
In one embodiment of the present invention, after S103, the method further includes:
Adjusting the weight of the target application corresponding to a user according to the proportion of the number of first-class sample users, or of the number of second-class sample users, in the total number of sample users in the sample user set.
It should be noted that the proportion of first-class sample users in the total number of sample users and the proportion of second-class sample users in the total number of sample users sum to 1.
As an example, adjusting the weight of the target application corresponding to the user includes:
If the user made an overdue repayment, raising the weight of the target application corresponding to the user when the proportion of the number of first-class sample users in the total number of samples in the sample set is greater than a first value; or, if the user did not make an overdue repayment, raising the weight of the target application corresponding to the user when the proportion of the number of second-class sample users in the total number of samples in the sample set is greater than the first value.
As an example, the first value is greater than or equal to 60%, and raising the weight of the target application corresponding to the user includes: raising the weight of the target application corresponding to the user by A%, where 1 ≤ A ≤ 3.
As an example, adjusting the weight of the target application corresponding to the user includes:
If the user made an overdue repayment, lowering the weight of the target application corresponding to the user when the proportion of the number of first-class sample users in the total number of samples in the sample set is less than a second value; or, if the user did not make an overdue repayment, lowering the weight of the target application corresponding to the user when the proportion of the number of second-class sample users in the total number of samples in the sample set is less than the second value.
As an example, the second value is less than or equal to 40%, and lowering the weight of the target application corresponding to the user includes: lowering the weight of the target application corresponding to the user by B%, where 1 ≤ B ≤ 3.
As an example, adjusting the weight of the target application corresponding to the user includes:
If the user made an overdue repayment and the proportion of the number of first-class sample users in the total number of samples in the sample set is greater than 20% and less than or equal to 40%, subtracting 0.1 from the weight of the target application corresponding to the user; if the user made an overdue repayment and that proportion is less than 20%, subtracting 0.2 from the weight; if the user did not make an overdue repayment and the proportion of the number of second-class sample users in the total number of samples in the sample set is greater than 60% and less than or equal to 80%, adding 0.1 to the weight of the target application corresponding to the user; if the user did not make an overdue repayment and that proportion is greater than 80% and less than or equal to 100%, adding 0.2 to the weight.
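The last example can be sketched as a single function. The name `adjust_weight` and its parameters are hypothetical, and one behavior is an assumption: proportions that the example does not mention (such as exactly 20% for an overdue user) leave the weight unchanged.

```python
def adjust_weight(weight, overdue, class_ratio):
    """Fine-tune one weight from the share of the user's own class.

    class_ratio is a fraction from 0.0 to 1.0: the share of first-class
    sample users if the user is overdue, otherwise the share of
    second-class sample users.
    """
    if overdue:
        if 0.20 < class_ratio <= 0.40:
            return weight - 0.1
        if class_ratio < 0.20:
            return weight - 0.2
    else:
        if 0.60 < class_ratio <= 0.80:
            return weight + 0.1
        if 0.80 < class_ratio <= 1.00:
            return weight + 0.2
    # Assumption: ratios not covered by the example leave the weight as it is.
    return weight


adjusted = [
    adjust_weight(3, True, 0.30),
    adjust_weight(3, True, 0.10),
    adjust_weight(2, False, 0.70),
    adjust_weight(2, False, 0.90),
    adjust_weight(2, True, 0.50),
]
```

Because the two class proportions sum to 1, a single sample set determines both the first-class ratio used for overdue users and the second-class ratio used for the others.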
According to the method of the embodiment of the present invention for processing behavior data of a user using applications, the weight of the target application corresponding to a user is fine-tuned according to the proportion of first-class sample users, or of second-class sample users, in the total number of sample users, so that the weight better reflects whether the user makes an overdue repayment and the prediction model trained on these weights can predict accurately.
Fig. 5 shows a schematic flowchart of a method, according to an embodiment of the present invention, for processing behavior data of a user using applications. The method includes S201 and S202.
S201: calculating the weight of each target application corresponding to a target user according to the duration for which the target user uses each of at least one target application.
It should be noted that the weight of each target application corresponding to the target user is calculated in the same way as the weight is calculated in S103 of Fig. 1, which is not repeated here. The target user may be a loan user.
S202: inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; where the user overdue repayment prediction model is the user overdue repayment prediction model of Fig. 1, and the at least one target application is the at least one target application of Fig. 1.
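Steps S201 and S202 can be sketched end to end as follows. This is an illustration only: the bin edges reuse the 3-month, 8-month and 2-year example ranges (with exactly 3 months placed in the lowest bin as an assumption), the model is assumed to be linear with its output clamped to the range 0 to 1 so it can be read as a probability, and the coefficients in `theta` are hypothetical rather than trained values.

```python
import bisect

# Upper edges, in months, of the first three duration bins (3 months,
# 8 months, 2 years); anything longer falls in the fourth bin.
BIN_EDGES_MONTHS = [3, 8, 24]


def duration_to_weight(months):
    """S201 helper: map a duration in months to a weight from 1 to 4."""
    return bisect.bisect_left(BIN_EDGES_MONTHS, months) + 1


def predict_overdue(durations_months, theta):
    """Run S201 (durations to weights) and S202 (weights to prediction)."""
    weights = [duration_to_weight(d) for d in durations_months]
    # Assumed linear model; clamping to [0, 1] is also an assumption.
    score = theta[0] + sum(t * w for t, w in zip(theta[1:], weights))
    return weights, min(1.0, max(0.0, score))


theta = [-0.2, 0.3, -0.1]  # hypothetical coefficients for two target applications
weights, probability = predict_overdue([30, 2], theta)
```

The same weight calculation is used at training time and at prediction time, which is why S201 refers back to S103.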
According to the user behavior prediction method of the embodiment of the present invention, the weights of the target applications corresponding to a target user are used as the input of the user overdue repayment prediction model, and the model outputs a prediction result of whether the target user will make an overdue repayment. From the standpoint of factual data, this links the target user's behavior in using applications to whether the target user will make an overdue repayment. It reduces manual intervention, which avoids consuming a large amount of human resources and reduces deviations in the prediction result caused by subjective human judgment. In addition, by predicting whether the target user will make an overdue repayment, it is possible to judge from the prediction result whether the target user belongs to a fraud ring and to assess the risk of lending to the user.
Fig. 6 shows a block diagram of an apparatus, according to an embodiment of the present invention, for processing behavior data of a user using applications. The apparatus 300 includes an acquisition module 301, a screening module 302, a calculation module 303 and a determination module 304.
The acquisition module 301 is configured to acquire the application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data.
The screening module 302 is configured to screen out at least one target application based on the difference between the application usage behavior data of the first-class sample users in the sample user set and the application usage behavior data of the second-class sample users, where the first-class sample users include users who have made an overdue repayment and the second-class sample users include users who have not made an overdue repayment.
The calculation module 303 is configured to calculate, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user.
The determination module 304 is configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
In one embodiment of the present invention, the screening module 302 includes a first calculation unit and a first execution unit.
The first calculation unit is configured to calculate the difference between a first installation rate and a second installation rate, where the first installation rate is the installation rate of the application to be screened among the first-class sample users, and the second installation rate is the installation rate of the application to be screened among the second-class sample users;
The first execution unit is configured to take the application to be screened as a target application when the difference between the first installation rate and the second installation rate is greater than a first threshold.
In one embodiment of the present invention, the screening module 302 includes a significance test unit and a second execution unit.
The significance test unit is configured to perform a significance test on the difference between the first-class sample users' usage behavior data for the application to be screened and the second-class sample users' usage behavior data for the application to be screened;
The second execution unit is configured to take the application to be screened as a target application when the significance test is passed.
In one embodiment of the present invention, the calculation module 303 includes:
A first processing unit, configured to smooth, by a binning method, the duration for which a single user uses a single target application, and to take the smoothed value as the weight of that target application for the user.
In one embodiment of the present invention, the determination module 304 includes an input unit, a second calculation unit, a second processing unit and a third execution unit.
The input unit is configured to input the weights of the at least one target application corresponding to each user into a pre-built prediction model, the prediction model outputting a prediction result for each user.
The second calculation unit is configured to calculate the difference size between the prediction result of each user and the actual result of each user, where the prediction result of each user indicates the predicted probability that the user makes an overdue repayment, and the actual result of each user indicates whether the user actually made an overdue repayment.
The second processing unit is configured to, when the difference size is greater than a second threshold, adjust the coefficients in the prediction model and return to inputting the weights of the at least one target application corresponding to each user into the pre-built prediction model.
The third execution unit is configured to, if the difference size is equal to or less than the second threshold, take the prediction model at that point as the user overdue repayment prediction model.
In one embodiment of the present invention, the second calculation unit is configured to
calculate the difference size $Z$ by the following formula,
where $i = 1, \ldots, n$; $j = 1, \ldots, m$; $x_{ij}$ denotes the weight of the $j$-th target application corresponding to the $i$-th user in the sample user set; $m$ denotes the total number of the at least one target application; $f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im})$ is the prediction model, which outputs the prediction result of the $i$-th user; $n$ denotes the total number of sample users in the sample user set; $y_i$ denotes the actual result of the $i$-th user; and $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i)$ is the loss function.
In one embodiment of the present invention, the apparatus 300 for processing behavior data of a user using applications further includes:
Adjusting the value of the weight according to the proportion of the number of first-class sample users, or of the number of second-class sample users, in the total number of sample users in the sample user set.
Fig. 7 shows a block diagram of a user behavior prediction apparatus according to an embodiment of the present invention. The apparatus 400 includes a calculation module 401 and a model prediction module 402.
The calculation module 401 is configured to calculate the weight of each target application corresponding to a target user according to the duration for which the target user uses each of at least one target application.
The model prediction module 402 is configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; where the user overdue repayment prediction model is the user overdue repayment prediction model of Fig. 1, and the at least one target application is the at least one target application of Fig. 1.
Fig. 8 is a structural diagram showing an exemplary hardware architecture of a computing device. As shown in Fig. 8, the computing device 500 includes an input device 501, an input interface 502, a processor 503, a memory 504, an output interface 505 and an output device 506.
The input interface 502, the processor 503, the memory 504 and the output interface 505 are connected to one another through a bus 510. The input device 501 and the output device 506 are connected to the bus 510 through the input interface 502 and the output interface 505 respectively, and thereby to the other components of the computing device 500.
Specifically, the input device 501 receives input information from outside and transmits it to the processor 503 through the input interface 502; the processor 503 processes the input information based on computer-executable instructions stored in the memory 504 to generate output information, stores the output information temporarily or permanently in the memory 504, and then transmits it to the output device 506 through the output interface 505; the output device 506 outputs the output information to the outside of the computing device 500 for use by the user.
The computing device 500 can perform each step of the above method of the present application for processing behavior data of a user using applications. Alternatively, the computing device 500 can perform each step of the above user behavior prediction method of the present application.
The processor 503 may be one or more central processing units (CPUs). Where the processor 503 is a single CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory 504 may be, but is not limited to, one or more of a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a compact disc read-only memory (CD-ROM), a hard disk, and the like. The memory 504 is used to store program code.
It can be understood that, in the embodiments of the present application, the functions of any or all of the modules provided in Fig. 6 or Fig. 7 can be implemented by the central processing unit 503 shown in Fig. 8.
The above embodiments may be implemented wholly or partly in software, hardware, firmware or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product includes one or more computer program instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer program instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer program instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The parts of this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.

Claims (12)

1. A method for processing behavior data of a user using applications, characterized by comprising:
Acquiring application usage behavior data of all users in a sample user set, wherein the application usage behavior data includes application installation data and application uninstallation data;
Screening out at least one target application based on a difference between the application usage behavior data of first-class sample users in the sample user set and the application usage behavior data of second-class sample users, wherein the first-class sample users include users who have made an overdue repayment, and the second-class sample users include users who have not made an overdue repayment;
Calculating, according to the duration for which each user in the sample user set uses the at least one target application, a weight of each of the at least one target application corresponding to each user;
Determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user, wherein the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
2. The method according to claim 1, characterized in that screening out at least one target application based on the difference between the application usage behavior data of the first-class sample users in the sample user set and the application usage behavior data of the second-class sample users comprises:
Calculating a difference between a first installation rate and a second installation rate, wherein the first installation rate is the installation rate of an application to be screened among the first-class sample users, and the second installation rate is the installation rate of the application to be screened among the second-class sample users;
If the difference between the first installation rate and the second installation rate is greater than a first threshold, taking the application to be screened as the target application.
3. The method according to claim 1, characterized in that screening out at least one target application based on the difference between the application usage behavior data of the first-class sample users in the sample user set and the application usage behavior data of the second-class sample users comprises:
Performing a significance test on the difference between the first-class sample users' usage behavior data for an application to be screened and the second-class sample users' usage behavior data for the application to be screened;
If the significance test is passed, taking the application to be screened as the target application.
4. The method according to claim 1, characterized in that calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user comprises:
Smoothing, by a binning method, the duration for which a single user uses a single target application, and taking the smoothed value as the weight of that target application for the user.
5. The method according to claim 1, characterized in that determining the user overdue repayment prediction model according to the weight of each target application corresponding to each user comprises:
Inputting the weights of the at least one target application corresponding to each user into a pre-built prediction model, the prediction model outputting a prediction result for each user;
Calculating a difference size between the prediction result of each user and the actual result of each user, wherein the prediction result of each user indicates the predicted probability that the user makes an overdue repayment, and the actual result of each user indicates whether the user made an overdue repayment;
If the difference size is greater than a second threshold, adjusting the coefficients in the prediction model and returning to the step of inputting the weights of the at least one target application corresponding to each user into the pre-built prediction model;
If the difference size is equal to or less than the second threshold, taking the prediction model at the time the difference size is equal to or less than the second threshold as the user overdue repayment prediction model.
6. The method according to claim 5, characterized in that calculating the difference size between the prediction result of each user and the actual result of each user comprises:
Calculating the difference size $Z$ by the following formula,
where $i = 1, \ldots, n$; $j = 1, \ldots, m$; $x_{ij}$ denotes the weight of the $j$-th target application corresponding to the $i$-th user in the sample user set; $m$ denotes the total number of the at least one target application; $f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im})$ is the prediction model, which outputs the prediction result of the $i$-th user; $n$ denotes the total number of sample users in the sample user set; $y_i$ denotes the actual result of the $i$-th user; and $L(f(x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}), y_i)$ is the loss function.
7. The method according to any one of claims 1 to 6, characterized in that, after calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each of the at least one target application corresponding to each user, the method further comprises:
Adjusting the value of the weight according to the proportion of the number of first-class sample users, or of the number of second-class sample users, in the total number of sample users in the sample user set.
8. A user behavior prediction method, characterized by comprising:
Calculating a weight of each target application corresponding to a target user according to the duration for which the target user uses each of at least one target application;
Inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; wherein the user overdue repayment prediction model is the user overdue repayment prediction model according to any one of claims 1-7, and the at least one target application is the at least one target application according to any one of claims 1-7.
9. An apparatus for processing behavior data of a user using applications, characterized by comprising:
An acquisition module, configured to acquire application usage behavior data of all users in a sample user set, wherein the application usage behavior data includes application installation data and application uninstallation data;
A screening module, configured to screen out at least one target application based on a difference between the application usage behavior data of first-class sample users in the sample user set and the application usage behavior data of second-class sample users, wherein the first-class sample users include users who have made an overdue repayment, and the second-class sample users include users who have not made an overdue repayment;
A calculation module, configured to calculate, according to the duration for which each user in the sample user set uses the at least one target application, a weight of each of the at least one target application corresponding to each user;
A determination module, configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, wherein the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
10. A user behavior prediction apparatus, characterized by comprising:
A calculation module, configured to calculate a weight of each target application corresponding to a target user according to the duration for which the target user uses each of at least one target application;
A model prediction module, configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; wherein the user overdue repayment prediction model is the user overdue repayment prediction model according to any one of claims 1-7, and the at least one target application is the at least one target application according to any one of claims 1-7.
11. A computing device, characterized by comprising: a processor, a memory, and computer program instructions stored in the memory;
The computer program instructions, when executed by the processor, implement the method according to any one of claims 1-7;
Or,
The computer program instructions, when executed by the processor, implement the method according to claim 8.
12. A computer-readable storage medium having computer program instructions stored thereon, characterized in that
The computer program instructions, when executed by a processor, implement the method according to any one of claims 1-7;
Or,
The computer program instructions, when executed by a processor, implement the method according to claim 8.
CN201810931189.4A 2018-08-15 2018-08-15 Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data Pending CN109214912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810931189.4A CN109214912A (en) 2018-08-15 2018-08-15 Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810931189.4A CN109214912A (en) 2018-08-15 2018-08-15 Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data

Publications (1)

Publication Number Publication Date
CN109214912A true CN109214912A (en) 2019-01-15

Family

ID=64988234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810931189.4A Pending CN109214912A (en) 2018-08-15 2018-08-15 Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data

Country Status (1)

Country Link
CN (1) CN109214912A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246026A (en) * 2019-05-21 2019-09-17 平安银行股份有限公司 A kind of output combination setting method, device and the terminal device of data transfer
CN111062518A (en) * 2019-11-22 2020-04-24 成都铂锡金融信息技术有限公司 Method, device and storage medium for processing hastening service based on artificial intelligence
CN111915378A (en) * 2020-08-17 2020-11-10 深圳墨世科技有限公司 User attribute prediction method, device, computer equipment and storage medium
CN113222258A (en) * 2021-05-17 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246026A (en) * 2019-05-21 2019-09-17 平安银行股份有限公司 A kind of output combination setting method, device and the terminal device of data transfer
CN110246026B (en) * 2019-05-21 2023-06-27 平安银行股份有限公司 Data transfer output combination setting method and device and terminal equipment
CN111062518A (en) * 2019-11-22 2020-04-24 成都铂锡金融信息技术有限公司 Method, device and storage medium for processing hastening service based on artificial intelligence
CN111062518B (en) * 2019-11-22 2023-06-09 成都铂锡金融信息技术有限公司 Method, device and storage medium for processing collect-promoting business based on artificial intelligence
CN111915378A (en) * 2020-08-17 2020-11-10 深圳墨世科技有限公司 User attribute prediction method, device, computer equipment and storage medium
CN113222258A (en) * 2021-05-17 2021-08-06 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Similar Documents

Publication Publication Date Title
CN109214912A (en) Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data
CN101493913A (en) Method and system for assessing user credit in internet
CN107194743A (en) A kind of network surveying questionnaire generation method and device
CN112270545A (en) Financial risk prediction method and device based on migration sample screening and electronic equipment
Woods et al. Towards integrating insurance data into information security investment decision making
CN107633030A (en) Credit estimation method and device based on data model
CN111090833A (en) Data processing method, system and related equipment
CN113313538A (en) User consumption capacity prediction method and device, electronic equipment and storage medium
CN112328869A (en) User loan willingness prediction method and device and computer system
CN113393316B (en) Loan overall process accurate wind control and management system based on massive big data and core algorithm
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN111210332A (en) Method and device for generating post-loan management strategy and electronic equipment
CN111179051A (en) Financial target customer determination method and device and electronic equipment
CN110689425A (en) Method and device for pricing quota based on income and electronic equipment
CN109102396A (en) A kind of user credit ranking method, computer equipment and readable medium
WO2011149608A1 (en) Identifying and using critical fields in quality management
CN111382909A (en) Rejection inference method based on survival analysis model expansion bad sample and related equipment
CN108197740A (en) Business failure Forecasting Methodology, electronic equipment and computer storage media
KR102336462B1 (en) Apparatus and method of credit rating
CN110134464A (en) Information processing method and device
CN112446777B (en) Credit evaluation method, device, equipment and storage medium
CN113052512A (en) Risk prediction method and device and electronic equipment
CN104252411B (en) A kind of system pressure analysis method and equipment
El Emam A primer on object-oriented measurement
CN112348584A (en) Vehicle estimation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190115