CN109214912A - Behavioral data processing method, behavior prediction method, apparatus, device, and medium - Google Patents
- Publication number
- CN109214912A (application CN201810931189.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- sample
- users
- destination application
- application program
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
Embodiments of the present invention provide a behavioral data processing method, a behavior prediction method, an apparatus, a device, and a medium. The processing method includes: screening out at least one target application based on the difference between the application usage behavior data of first-class sample users and the application usage behavior data of second-class sample users in a sample user set, where the first-class sample users include users who have made overdue repayments and the second-class sample users include users who have not; calculating, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to each user; and determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user. The embodiments reduce manual intervention, avoid consuming large amounts of human resources, and reduce the deviation in prediction results caused by subjective human judgment.
Description
Technical field
The present invention relates to the field of Internet technologies, and in particular to a behavioral data processing method, a behavior prediction method, an apparatus, a device, and a medium.
Background
Personal credit assessment collects multi-dimensional historical data about an individual, generates feature variables from it, and then uses a modeling method to predict the individual's future overdue risk. It is widely used in credit card applications, installment purchases, deposit-free leasing, and similar fields. Credit bureau data is the basis most widely used by large institutions such as banks: it is strongly correlated with overdue risk and is standardized, but it covers a limited population and cannot meet the credit needs of the many people who have no credit record. For this reason, more and more institutions have begun to use fragmented, unstructured data to predict personal overdue risk.
The prior art offers two ways to predict whether a borrower will repay late. The first is risk-control rules based on expert experience: risk-control staff must follow market trends closely, capture the behavior of loan users, and judge from that behavior whether a user carries a high risk of fraud or overdue repayment. The second is feature engineering based on expert experience (feature engineering means extracting features from raw data for use by a model): staff typically extract features from the raw data to train a model, and the model predicts whether a user carries a high risk of fraud or overdue repayment.
Both risk-control techniques lack stability: staff must continually check whether the risk-control rules and the engineered features can still distinguish overdue users from non-overdue users. Manual intervention is prone to deviation, and a single mistake by a staff member can cause a larger one. It also takes a great deal of effort and time, so efficiency is low.
Summary of the invention
Embodiments of the present invention provide a behavioral data processing method, a behavior prediction method, an apparatus, a device, and a medium. A user overdue repayment prediction model can be obtained from the application usage behavior of sample users, and the model can predict whether a target user will repay late. This reduces manual intervention, avoids consuming large amounts of human resources, reduces the deviation in results caused by subjective human judgment, shortens the time needed to predict whether a user will repay late, and improves prediction efficiency.
In a first aspect, an embodiment of the present invention provides a method for processing behavioral data on users' use of applications, including:
obtaining the application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data;
screening out at least one target application based on the difference between the application usage behavior data of the first-class sample users and that of the second-class sample users in the sample user set, where the first-class sample users include users who have made overdue repayments and the second-class sample users include users who have not;
calculating, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to each user; and
determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will repay late.
In a second aspect, an embodiment of the present invention provides a user behavior prediction method, including:
calculating, according to the duration for which a target user has used each target application among at least one target application, the weight of each target application corresponding to the target user; and
inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result indicating whether the target user will repay late, where the user overdue repayment prediction model is the model described in the first aspect and the at least one target application is the at least one target application described in the first aspect.
In a third aspect, an embodiment of the present invention provides an apparatus for processing behavioral data on users' use of applications, including:
an obtaining module, configured to obtain the application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data;
a screening module, configured to screen out at least one target application based on the difference between the application usage behavior data of the first-class sample users and that of the second-class sample users in the sample user set, where the first-class sample users include users who have made overdue repayments and the second-class sample users include users who have not;
a calculation module, configured to calculate, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to each user; and
a determining module, configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will repay late.
In a fourth aspect, an embodiment of the present invention provides a user behavior prediction apparatus, including:
a calculation module, configured to calculate, according to the duration for which a target user has used each target application among at least one target application, the weight of each target application corresponding to the target user; and
a model prediction module, configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result indicating whether the target user will repay late, where the user overdue repayment prediction model is the model described in the first aspect and the at least one target application is the at least one target application described in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computing device, including a processor, a memory, and computer program instructions stored in the memory;
the method described in the first aspect is implemented when the computer program instructions are executed by the processor;
or
the method described in the second aspect is implemented when the computer program instructions are executed by the processor.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium on which computer program instructions are stored;
the method described in the first aspect is implemented when the computer program instructions are executed by a processor;
or
the method described in the second aspect is implemented when the computer program instructions are executed by a processor.
The behavioral data processing method, behavior prediction method, apparatus, device, and medium provided by the embodiments of the present invention can screen out target applications whose usage differs between users who repay late and users who do not, and determine a user overdue repayment prediction model from the durations for which sample users have used the target applications. The resulting model can predict whether a borrowing target user will repay late, which reduces manual intervention, avoids consuming large amounts of human resources, and reduces the deviation in prediction results caused by subjective human judgment. Because the prediction is made automatically by machine, the prediction period is shortened and prediction efficiency is improved.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. A person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a method for processing behavioral data on users' use of applications according to an embodiment of the present invention;
Fig. 2 shows the ROC curve of the user overdue repayment prediction model;
Fig. 3 is a histogram of the relationship between overdue repayment probability ranges and sample counts;
Fig. 4 is a line chart of the proportion of overdue repayment samples in each group relative to the total number of samples in that group;
Fig. 5 is a schematic flowchart of a method for processing behavioral data on users' use of applications according to an embodiment of the present invention;
Fig. 6 is a block diagram of an apparatus for processing behavioral data on users' use of applications according to an embodiment of the present invention;
Fig. 7 is a block diagram of a user behavior prediction apparatus according to an embodiment of the present invention;
Fig. 8 is a structural diagram of an exemplary hardware architecture of a computing device.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. Terms such as "first" and "second" are used only to distinguish one entity (or operation) from another and do not indicate any relationship or order between those entities (or operations). Terms indicating direction or position, such as up, down, left, right, front, and rear, indicate only relative, not absolute, directions or positions. Without further limitation, an element introduced by the phrase "including ..." does not exclude the presence of other elements in the process, method, article, or device that includes the element.
Fig. 1 is a schematic flowchart of a method for processing behavioral data on users' use of applications according to an embodiment of the present invention. The method includes S101 to S104.
S101: obtain the application usage behavior data of all users in the sample user set, where the application usage behavior data includes application installation data and application uninstallation data.
As an example, the application (APP) behavior data of a user is obtained after the user grants authorization.
For example, the APP behavior data is an APP behavior list in text format, which is typical unstructured data. Unstructured data is data whose structure is irregular or incomplete and that is not conveniently represented in two-dimensional database tables; it includes text, images, audio, and video.
The application usage behavior data includes the application installation list, together with the application installation data and application uninstallation data over a sufficiently long behavior observation period (no less than 3 months).
As an example, application installation data includes but is not limited to application installation times and the number of installations, and application uninstallation data includes but is not limited to application uninstallation times and the number of uninstallations.
It should be noted that the applications may be financial applications, for example banking applications and wealth management applications. They may also be applications from other fields, for example utility applications and network communication applications.
S102: screen out at least one target application based on the difference between the application usage behavior data of the first-class sample users and that of the second-class sample users in the sample user set, where the first-class sample users include users who have made overdue repayments and the second-class sample users include users who have not.
As an example, the sample user set contains 10,000 sample users in total, of which 1,500 are first-class sample users and 8,500 are second-class sample users.
It should be noted that first-class sample users are users who have already made overdue repayments, and second-class sample users are users who have not. The first-class sample users' use of a screened-out target application differs considerably from the second-class sample users' use of it, so the target application can distinguish users who repay late from users who do not. For example, members of fraud rings or users who frequently repay late may use a given application heavily, while users who do not repay late seldom use it.
The first-class sample users may be negative sample users and the second-class sample users positive sample users. Alternatively, the first-class sample users may be positive sample users and the second-class sample users negative sample users; this is not limited here.
S103: calculate, according to the duration for which each user in the sample user set has used the at least one target application, the weight of each target application among the at least one target application corresponding to each user.
As an example, the duration for which a user has used a target application can be determined from the times at which the user installed and uninstalled the application.
For example, a user first installs a target application on the 20th of this month, uninstalls it on the 25th, reinstalls it on the 26th, and has been using it ever since. The duration for which the user has used the application can be derived from these data.
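The install/uninstall example above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `usage_days`, the event format, the concrete dates, and the choice to count whole days between an install and the next uninstall (or an `as_of` date for an application that is still installed) are all assumptions made for the example.

```python
from datetime import date


def usage_days(events, as_of):
    """Sum the days an application was installed.

    `events` is a chronological list of (kind, day) pairs, where kind is
    "install" or "uninstall". An application that is still installed is
    counted up to `as_of`. (Hypothetical format, assumed for illustration.)
    """
    total = 0
    installed_on = None
    for kind, day in events:
        if kind == "install" and installed_on is None:
            installed_on = day
        elif kind == "uninstall" and installed_on is not None:
            total += (day - installed_on).days
            installed_on = None
    if installed_on is not None:
        # Still installed: count up to the observation date.
        total += (as_of - installed_on).days
    return total


# The scenario from the text, with made-up dates: install on the 20th,
# uninstall on the 25th, reinstall on the 26th, observed on the 31st.
events = [
    ("install", date(2018, 8, 20)),
    ("uninstall", date(2018, 8, 25)),
    ("install", date(2018, 8, 26)),
]
total = usage_days(events, as_of=date(2018, 8, 31))  # 5 + 5 = 10 days
```

Counting only installed intervals is one reasonable reading of "duration of use"; a real system might instead measure foreground time if that data were available.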
S104: determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will repay late.
As an example, the weight of each target application corresponding to each user is recorded in a table.
For example, Table 1 shows the weight of each target application corresponding to each user. APP1 to APP5 are target applications, and the sample user set includes user 1, user 2, user 3, and so on. As Table 1 shows, for user 1 the weight of APP1 is 5, the weight of APP2 is 6, the weight of APP3 is 4, and the weight of APP4 is 4. It should be noted that if a user has never installed a target application, the weight of that target application for the user may be 0.
Table 1
User | APP1 | APP2 | APP3 | APP4 |
User 1 | 5 | 6 | 4 | 4 |
User 2 | 1 | 1 | 3 | 2 |
User 3 | 2 | 3 | 1 | 1 |
… | … | … | … | … |
The method for processing behavioral data on users' use of applications according to the embodiment of the present invention can be applied to credit card applications, installment purchases, deposit-free leasing, and similar fields, and has the following effects:
1. The user overdue repayment prediction model performs well. The embodiment screens for risk at the application level, so the data granularity is finer. The receiver operating characteristic curve (ROC curve) can be used to measure how well the model performs. The area under the ROC curve (Area Under Curve, AUC) lies between 0.5 and 1.0. When AUC > 0.5, the closer the AUC is to 1, the better the model. An AUC of 0.5 to 0.7 indicates low accuracy, 0.7 to 0.9 indicates moderate accuracy, and above 0.9 indicates high accuracy. When AUC = 0.5, the model has no discriminating power and its predictions have no reference value. AUC < 0.5 does not correspond to realistic conditions and rarely occurs in practice. The Kolmogorov-Smirnov (KS) value can also be used to assess the model's ability to separate risk: the larger the KS value, the stronger that ability. Put another way, the KS value measures the model's accuracy: the larger it is, the more accurate the model, and a model with KS > 0.2 can be considered to have fairly good predictive accuracy.
Fig. 2 shows the ROC curve of the user overdue repayment prediction model. The AUC of this ROC curve is 0.76, indicating that the model has moderate accuracy. The model's KS value is 0.38, indicating that its accuracy is relatively high.
Fig. 3 is a histogram of the relationship between overdue repayment probability ranges and sample counts. The horizontal axis represents the predicted overdue repayment probability ranges and the vertical axis the number of samples in each range; for example, 1,500 samples have a predicted overdue repayment probability between 0 and 0.05. The histogram covers 24,617 samples in total. The user overdue repayment prediction model predicts that 21.9% of them will repay late, and 21.9% actually did. The model's accuracy is therefore relatively high.
Fig. 4 is a line chart of the proportion of overdue repayment samples in each group relative to the total number of samples in that group. The horizontal axis represents the sample group number and the vertical axis the proportion of overdue repayment samples in the corresponding group relative to the group's total sample count. Line A shows, for each group, the proportion of samples predicted to repay late; line B shows the proportion that actually repaid late. Suppose there are 10,000 samples. They are sorted in ascending order of predicted overdue repayment probability and divided in that order into 10 groups of 1,000 samples each. If, among the 1,000 samples of group 1, 90 are predicted to repay late and 50 actually do, then the predicted proportion for group 1 is 9% and the actual proportion is 5%. Line A therefore has a value of 9% at horizontal coordinate 1, and line B a value of 5%. As Fig. 4 shows, lines A and B do not differ greatly, so the predicted overdue repayment probabilities are fairly accurate.
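The grouping procedure described for Fig. 4 can be sketched as follows. The source does not say how the number of samples "predicted to repay late" in a group is obtained, so this sketch assumes it is the expected count, that is, the mean predicted probability in the group; that choice, the function name `group_rates`, and the toy data are assumptions.

```python
def group_rates(probabilities, outcomes, n_groups=10):
    """Sort samples by predicted probability and compare rates per group.

    Returns a list of (group_number, predicted_rate, actual_rate).
    predicted_rate is the mean predicted probability in the group
    (an assumed reading of "proportion predicted to repay late").
    """
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    size = len(order) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(order)
        idx = order[start:end]
        predicted = sum(probabilities[i] for i in idx) / len(idx)
        actual = sum(outcomes[i] for i in idx) / len(idx)
        rows.append((g + 1, predicted, actual))
    return rows


# Toy data with two groups instead of ten; 1 = repaid late.
probabilities = [0.1, 0.9, 0.2, 0.8]
outcomes = [0, 1, 0, 1]
rows = group_rates(probabilities, outcomes, n_groups=2)
```

Plotting the second and third columns against the group number gives the two lines of a chart like Fig. 4; the closer the lines, the better calibrated the predicted probabilities.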
2. The user overdue repayment prediction model iterates quickly. Manually identifying and classifying applications takes about 2 weeks. The embodiment eliminates the time spent manually investigating, identifying, and classifying applications, avoids consuming large amounts of human resources, and reduces the deviation in prediction results caused by subjective human judgment.
3. Risk screening combines static data and dynamic behavior. The embodiment considers which applications a user uses and determines the user overdue repayment prediction model from the durations for which the user has used them. The list of applications a user uses is static data and the usage durations are dynamic data; combining the two can improve the model.
4. The user overdue repayment prediction model can learn online. Because the embodiment does not depend on subjective human judgment or manual input, the model's iteration can be automated into a fully automatic process of "model performance monitoring - online model learning - automated model deployment".
It should be noted that the embodiment determines the user overdue repayment prediction model from the durations for which users have used target applications. By contrast, the approach currently common for predicting overdue repayment from applications is to classify the applications. For example, 25 applications are classified into five categories containing 5, 6, 7, 5, and 2 applications respectively. Whether the number of applications a user has in each category is higher than that of most users is then examined to judge how likely the user is to repay late.
In one embodiment of the present invention, S102 includes:
calculating the difference between a first installation rate and a second installation rate, where the first installation rate is the installation rate of the application to be screened among the first-class sample users and the second installation rate is its installation rate among the second-class sample users; and, if the difference between the first installation rate and the second installation rate is greater than a first threshold, taking the application to be screened as a target application.
As an example, the first threshold is greater than or equal to 1.5%.
For example, the first threshold is 2% or 5%.
Taking a first threshold of 2% as an example: the installation rate of a wealth management application is 38.5% among first-class sample users and 35.2% among second-class sample users. The difference between the two installation rates is 3.3%, which is greater than 2%, indicating that users who repay late are more likely to have installed the wealth management application.
The method for processing behavioral data on users' use of applications according to the embodiment of the present invention calculates the difference between an application's installation rate among first-class sample users and its installation rate among second-class sample users. If the difference is greater than the first threshold, users who repay late and users who do not differ in whether they install the application, so the application can be used, to some extent, to distinguish the two kinds of users.
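A minimal sketch of the installation-rate screening follows. It assumes each user is represented by the set of applications they have installed, and it compares the magnitude of the difference with the threshold, since the source's later Table 2 also selects applications whose difference is negative. The function names, application names, and data are hypothetical.

```python
def install_rate(users, app):
    """Fraction of users (each a set of installed apps) who installed `app`."""
    return sum(1 for installed in users if app in installed) / len(users)


def screen_by_install_rate(first_class, second_class, candidates, threshold=0.02):
    """Keep candidates whose installation rates differ by more than `threshold`.

    Returns (app, signed_difference) pairs, where the difference is the
    first-class rate minus the second-class rate. Comparing the absolute
    value is an assumption based on the negative entries in Table 2.
    """
    selected = []
    for app in candidates:
        diff = install_rate(first_class, app) - install_rate(second_class, app)
        if abs(diff) > threshold:
            selected.append((app, diff))
    return selected


# Made-up data: first_class repaid late, second_class did not.
first_class = [
    {"loan_app", "chat_app"},
    {"loan_app", "bank_app"},
    {"bank_app", "chat_app"},
    {"loan_app"},
]
second_class = [
    {"bank_app", "chat_app"},
    {"loan_app", "bank_app"},
    {"chat_app"},
    {"bank_app"},
]
selected = screen_by_install_rate(
    first_class, second_class, ["loan_app", "bank_app", "chat_app"]
)
```

Here `chat_app` is installed at the same rate in both groups and is dropped, while the other two applications are kept with their signed differences.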
In one embodiment of the present invention, S102 includes:
performing a significance test on the difference between the first-class sample users' usage behavior data for the application to be screened and the second-class sample users' usage behavior data for it; and, if the significance test is passed, taking the application to be screened as a target application.
As an example, the null hypothesis is that there is no significant difference between the first-class sample users' usage behavior data for the application to be screened and the second-class sample users' usage behavior data for it. From the two sets of usage behavior data, the mean and variance for each class of sample users are calculated, and then the two-sample t statistic is calculated. The t distribution table is consulted to determine the P value, which is the probability of observing a difference at least this large if the null hypothesis holds. If the P value ≤ 0.05, the null hypothesis is rejected: the difference is significant and the significance test is passed. If the P value > 0.05, the null hypothesis cannot be rejected and the significance test is not passed.
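The significance test can be sketched as below, under the conventional formulation in which the null hypothesis is that the two groups do not differ and a small P value indicates a significant difference. To stay within the standard library, the sketch computes a two-sample (Welch) t statistic and obtains the P value from a normal approximation rather than a t distribution table; that substitution is an assumption that is reasonable only for large samples. The function names and the usage data (days of use per user) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_p_value(a, b):
    """Two-sided P value for a difference in means between samples a and b.

    Uses the Welch t statistic with a normal approximation to the
    t distribution (assumed adequate for large samples).
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(t)))


def is_significant(a, b, alpha=0.05):
    """True if the difference between the two groups passes the test."""
    return two_sample_p_value(a, b) <= alpha


# Made-up usage durations (days) of one candidate application.
late_users = [10, 12, 11, 13, 12, 11, 10, 12] * 5
on_time_users = [5, 6, 7, 5, 6, 6, 7, 5] * 5
passes = is_significant(late_users, on_time_users)
```

With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` would give the exact t-based P value instead of the approximation.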
It should be noted that target applications may be screened according to the difference between the first installation rate and the second installation rate alone, according to the significance test result alone, or according to both.
For example, table 2 shows the result of screening destination application.
Table 2
Application | Installation rate difference | Significance test result | Significant difference? |
APP1 | +15.1% | Significant | Yes |
APP2 | -8.5% | Moderately significant | Yes |
APP3 | +0.2% | Not significant | No |
APP4 | +1.4% | Not significant | No |
APP5 | -4.6% | Significant | Yes |
In Table 2, the installation rate difference is the difference between the first installation rate and the second installation rate. When the magnitude of an application's installation rate difference is greater than 2% and its significance test result is significant or moderately significant, the application is judged to differ significantly between first-class and second-class sample users and is taken as a target application. APP1, APP2, and APP5 in Table 2 are the applications with significant differences and are therefore the target applications screened out.
In one embodiment of the present invention, S103 includes:
smoothing, by a binning method, the duration for which a single user has used a single target application, and taking the smoothed value as the weight of that target application for the user.
It should be noted that the binning method groups adjacent values into one class; this local smoothing discretizes continuous data, coarsens the data granularity, and removes noise.
As an example, a correspondence between ranges of target application usage duration and weights is established in advance, and the weight of a target application for a user is determined from this correspondence.
For example, following corresponding relationship is pre-established, if the duration of user installation destination application is less than 3 months,
Then the weight of the corresponding destination application of the user is 1;If the duration of user installation destination application is greater than 3 months, and
Less than or equal to 8 months, then the weight of the corresponding destination application of the user was 2;If user installation destination application
Duration is greater than 8 months, and is less than or equal to 2 years, then the weight of the corresponding destination application of the user is 3;If user installation
The duration of destination application is greater than 2 years, then the weight of the corresponding destination application of the user is 4.Based on above correspondence
Relationship, the weight of the corresponding destination application of available single user.
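The example correspondence above can be sketched as a small lookup function. The function name `duration_to_weight` is hypothetical, durations are expressed in months (2 years taken as 24 months), and the handling of exactly 3 months is an assumption, because the text covers "less than 3 months" and "greater than 3 months" but not the boundary itself.

```python
def duration_to_weight(months):
    """Map an installation duration in months to a weight from 1 to 4."""
    if months < 0:
        raise ValueError("duration must be non-negative")
    if months < 3:
        return 1
    if months <= 8:
        # Assumption: exactly 3 months is placed in this bin,
        # since the source does not say which bin it belongs to.
        return 2
    if months <= 24:
        return 3
    return 4


for months in (2, 5, 12, 30):
    print(months, duration_to_weight(months))
```

Replacing the raw duration with a bin index is what discretizes the data: two users with 9 and 20 months of use receive the same weight, so small differences in duration no longer act as noise.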
According to the method for processing behavioral data of a user using applications in this embodiment of the present invention, smoothing the duration for which the user uses the target application discretizes the duration, increases the data granularity, and eliminates noise data.
In one embodiment of the present invention, S104 includes:
Inputting the weights of the at least one target application corresponding to each user into a pre-built prediction model, the prediction model outputting a prediction result for each user; calculating the difference size between the prediction result of each user and the actual result of each user, where the prediction result of each user indicates the predicted probability that the user will make an overdue repayment, and the actual result of each user indicates whether the user actually made an overdue repayment; if the difference size is greater than a second threshold, adjusting the coefficients in the prediction model and returning to the step of inputting the weights of the at least one target application corresponding to each user into the pre-built prediction model; if the difference size is equal to or less than the second threshold, taking the prediction model at that point as the user overdue repayment prediction model.
Table 3 can be built on the basis of Table 1 by adding the prediction result of the user and the actual result of the user. The prediction result of the user is whether the prediction model predicts that the user will make an overdue repayment; the actual result of the user is whether the user actually made an overdue repayment.
Table 3
According to the content of Table 3, the difference size between the prediction result of the user and the actual result of the user is calculated in order to train the prediction function.
According to the method for processing behavioral data of a user using applications in this embodiment of the present invention, the prediction model is trained on the weights of the at least one target application corresponding to each user and on the actual result of each user. The trained prediction model is the user overdue repayment prediction model, which can make predictions from the user's behavior in using the target applications.
In one embodiment of the present invention, calculating the difference size between the prediction result of each user and the actual result of each user includes:
Calculating the difference size Z by the following formula,
where i = 1, ..., n; j = 1, ..., m; x_ij denotes the weight of the j-th target application corresponding to the i-th user in the sample user set; m denotes the total number of the at least one target application; f(x_i1, x_i2, ..., x_ij, ..., x_im) is the prediction model, which outputs the prediction result of the i-th user; n denotes the total number of sample users in the sample user set; y_i denotes the actual result of the i-th user; and L(f(x_i1, x_i2, ..., x_ij, ..., x_im), y_i) is the loss function.
As an example, L(f(x_i1, x_i2, ..., x_ij, ..., x_im), y_i) = f(x_i1, x_i2, ..., x_ij, ..., x_im) - y_i.
The prediction model is the following function:
where θ_0, θ_1, θ_2, ..., θ_m are the coefficients in the prediction model.
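The adjust-and-repeat loop of S104 can be sketched as below. Several parts are assumptions rather than the patent's formulas, which are not reproduced in this text: the prediction model is assumed to be logistic in the coefficients θ_0, ..., θ_m (so that it outputs a probability), the difference size Z is assumed to be the mean squared difference over the n sample users (substituted for the signed example loss f - y so that gradient descent is well defined), and the sample data, learning rate and threshold value are made up for illustration.

```python
import math


def predict(theta, x):
    # Assumed logistic form: sigmoid(theta_0 + sum_j theta_j * x_j).
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def difference_size(theta, X, y):
    # Assumed Z: mean squared difference between prediction and actual result.
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def train(X, y, second_threshold=0.05, learning_rate=0.5, max_iters=10000):
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(max_iters):
        if difference_size(theta, X, y) <= second_threshold:
            break  # difference size within the second threshold: keep this model
        # Otherwise adjust the coefficients with one gradient descent step.
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            p = predict(theta, xi)
            common = 2.0 * (p - yi) * p * (1.0 - p) / len(X)
            grad[0] += common
            for j, v in enumerate(xi):
                grad[j + 1] += common * v
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical weights of two target applications for six sample users,
# with actual results (1 = overdue repayment occurred, 0 = it did not).
X = [[4, 1], [3, 1], [4, 2], [1, 4], [1, 3], [2, 4]]
y = [1, 1, 1, 0, 0, 0]
theta = train(X, y)
print(theta, difference_size(theta, X, y))
```

The loop mirrors the text: predict, measure the difference size, stop once it is at or below the second threshold, otherwise adjust the coefficients and predict again.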
In one embodiment of the present invention, after S103, the method further includes:
Adjusting the weight of the target application corresponding to the user according to the ratio of the number of first class sample users, or of the number of second class sample users, to the total number of sample users in the sample user set.
It should be noted that the ratio of the number of first class sample users to the total number of sample users and the ratio of the number of second class sample users to the total number of sample users sum to 1.
As an example, adjusting the weight of the target application corresponding to the user includes:
If the user made an overdue repayment, increasing the weight of the target application corresponding to the user when the ratio of the number of first class sample users to the total number of samples in the sample set is greater than a first value; or, if the user did not make an overdue repayment, increasing the weight of the target application corresponding to the user when the ratio of the number of second class sample users to the total number of samples in the sample set is greater than the first value.
As an example, the first value is greater than or equal to 60%, and increasing the weight of the target application corresponding to the user includes: increasing the weight of the target application corresponding to the user by A%, where 1 ≤ A ≤ 3.
As an example, adjusting the weight of the target application corresponding to the user includes:
If the user made an overdue repayment, decreasing the weight of the target application corresponding to the user when the ratio of the number of first class sample users to the total number of samples in the sample set is less than a second value; or, if the user did not make an overdue repayment, decreasing the weight of the target application corresponding to the user when the ratio of the number of second class sample users to the total number of samples in the sample set is less than the second value.
As an example, the second value is less than or equal to 40%, and decreasing the weight of the target application corresponding to the user includes: decreasing the weight of the target application corresponding to the user by B%, where 1 ≤ B ≤ 3.
As an example, adjusting the weight of the target application corresponding to the user includes:
If the user made an overdue repayment and the ratio of the number of first class sample users to the total number of samples in the sample set is greater than 20% and less than or equal to 40%, subtracting 0.1 from the weight of the target application corresponding to the user; if the user made an overdue repayment and that ratio is less than 20%, subtracting 0.2 from the weight; if the user did not make an overdue repayment and the ratio of the number of second class sample users to the total number of samples in the sample set is greater than 60% and less than or equal to 80%, adding 0.1 to the weight of the target application corresponding to the user; if the user did not make an overdue repayment and that ratio is greater than 80% and less than or equal to 100%, adding 0.2 to the weight.
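The fixed-step example above can be sketched as a single function. The name `adjust_weight` and its parameters are hypothetical; the second class ratio is derived as one minus the first class ratio, as the text notes the two ratios sum to 1. Ratios the example does not cover (for instance exactly 20% for an overdue user) leave the weight unchanged here, which is an assumption.

```python
def adjust_weight(weight, is_overdue, first_class_ratio):
    """Fine-tune a weight from the class ratio, following the 0.1 / 0.2 example."""
    if is_overdue:
        if 0.2 < first_class_ratio <= 0.4:
            return weight - 0.1
        if first_class_ratio < 0.2:
            return weight - 0.2
        return weight  # assumption: no adjustment outside the listed ranges
    second_class_ratio = 1.0 - first_class_ratio
    if 0.6 < second_class_ratio <= 0.8:
        return weight + 0.1
    if 0.8 < second_class_ratio <= 1.0:
        return weight + 0.2
    return weight  # assumption: no adjustment outside the listed ranges


# 30% of the sample users made an overdue repayment.
print(adjust_weight(3, True, 0.3))   # overdue user: weight lowered by 0.1
print(adjust_weight(3, False, 0.3))  # non-overdue user: weight raised by 0.1
```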
According to the method for processing behavioral data of a user using applications in this embodiment of the present invention, the weight of the target application corresponding to the user is fine-tuned according to the ratio of the number of first class sample users, or of the number of second class sample users, to the total number of sample users. The weight then better reflects whether the user makes an overdue repayment, so that the prediction model trained on the weight can predict accurately.
Fig. 5 shows a schematic flowchart of a method for processing behavioral data of a user using applications according to an embodiment of the present invention. The method includes S201 and S202.
S201: calculating the weight of each target application corresponding to a target user according to the duration for which the target user uses each target application of at least one target application.
It should be noted that the weight of each target application corresponding to the target user is calculated in the same way as the weight in S103 of Fig. 1, which is not repeated here. The target user may be a user who takes out a loan.
S202: inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; where the user overdue repayment prediction model is the user overdue repayment prediction model of Fig. 1, and the at least one target application is the at least one target application of Fig. 1.
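S201 and S202 can be sketched end to end as follows. Everything specific here is illustrative: the bin boundaries reuse the example correspondence given for S103 (with exactly 3 months assumed to fall in the second bin), the model is assumed to be logistic, and the coefficients in `THETA` are hypothetical values standing in for a trained model.

```python
import math


def duration_to_weight(months):
    # S201: example binning of a usage duration (in months) into a weight.
    if months < 3:
        return 1
    if months <= 8:
        return 2
    if months <= 24:
        return 3
    return 4


def predict_overdue(durations_months, theta):
    # S202: feed the weights into an assumed logistic prediction model.
    weights = [duration_to_weight(d) for d in durations_months]
    z = theta[0] + sum(t * w for t, w in zip(theta[1:], weights))
    return 1.0 / (1.0 + math.exp(-z))


THETA = [0.0, 0.8, -0.8]  # hypothetical coefficients, not taken from the source

# Target user: 30 months on the first target application, 1 month on the second.
probability = predict_overdue([30, 1], THETA)
print(probability)
```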
According to the user behavior prediction method of this embodiment of the present invention, the weight of the target application corresponding to the target user is used as the input of the user overdue repayment prediction model, and the model outputs the prediction result of whether the target user will make an overdue repayment. From the standpoint of data facts, this links the target user's behavior in using applications to whether the target user will make an overdue repayment, reduces manual intervention so that large amounts of human resources need not be consumed, and reduces prediction deviations caused by subjective human judgment. In addition, by predicting whether the target user will make an overdue repayment, it is possible to judge from the prediction result whether the target user belongs to a fraud ring and to assess the risk of lending to the user.
Fig. 6 shows a block diagram of an apparatus for processing behavioral data of a user using applications according to an embodiment of the present invention. The apparatus 300 includes an obtaining module 301, a screening module 302, a computing module 303 and a determining module 304.
The obtaining module 301 is configured to obtain application usage behavior data of all users in a sample user set, where the application usage behavior data includes application installation data and application uninstallation data.
The screening module 302 is configured to screen out at least one target application based on the difference between the application usage behavior data of first class sample users and the application usage behavior data of second class sample users in the sample user set, where the first class sample users include users who have made overdue repayments and the second class sample users include users who have not made overdue repayments.
The computing module 303 is configured to calculate, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each target application of the at least one target application corresponding to each user.
The determining module 304 is configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, where the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
In one embodiment of the present invention, the screening module 302 includes a first computing unit and a first execution unit.
The first computing unit is configured to calculate the difference between a first installation rate and a second installation rate, where the first installation rate is the installation rate of an application to be screened among the first class sample users, and the second installation rate is the installation rate of the application to be screened among the second class sample users.
The first execution unit is configured to take the application to be screened as a target application when the difference between the first installation rate and the second installation rate is greater than a first threshold.
In one embodiment of the present invention, the screening module 302 includes a significance test unit and a second execution unit.
The significance test unit is configured to perform a significance test on the difference between the first class sample users' usage behavior data for the application to be screened and the second class sample users' usage behavior data for the application to be screened.
The second execution unit is configured to take the application to be screened as a target application when the significance test is passed.
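The text does not name a particular significance test. As one common choice for comparing two installation rates, a two-sided two-proportion z-test can be sketched as below; the choice of test, the function name and the user counts are all assumptions for illustration.

```python
import math


def two_proportion_z_test(installed_1, n_1, installed_2, n_2):
    """Return (rate difference, z statistic, two-sided p-value)."""
    p1 = installed_1 / n_1  # first installation rate
    p2 = installed_2 / n_2  # second installation rate
    pooled = (installed_1 + installed_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value


# Hypothetical counts: 300 of 1000 first class users and 150 of 1000
# second class users have the application installed.
diff, z, p_value = two_proportion_z_test(300, 1000, 150, 1000)
print(diff, z, p_value)
```

An application would then pass the test when the p-value falls below a chosen significance level such as 0.05, a value the source does not specify.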
In one embodiment of the present invention, the computing module 303 includes:
A first processing unit, configured to smooth, by a binning method, the duration for which a single user uses a single target application, and to take the smoothed value as the weight of the target application corresponding to that user.
In one embodiment of the present invention, the determining module 304 includes an input unit, a second computing unit, a second processing unit and a third execution unit.
The input unit is configured to input the weights of the at least one target application corresponding to each user into a pre-built prediction model, the prediction model outputting a prediction result for each user.
The second computing unit is configured to calculate the difference size between the prediction result of each user and the actual result of each user, where the prediction result of each user indicates the predicted probability that the user will make an overdue repayment, and the actual result of each user indicates whether the user actually made an overdue repayment.
The second processing unit is configured to, when the difference size is greater than a second threshold, adjust the coefficients in the prediction model and return to inputting the weights of the at least one target application corresponding to each user into the pre-built prediction model.
The third execution unit is configured to, if the difference size is equal to or less than the second threshold, take the prediction model at that point as the user overdue repayment prediction model.
In one embodiment of the present invention, the second computing unit is configured to
calculate the difference size Z by the following formula,
where i = 1, ..., n; j = 1, ..., m; x_ij denotes the weight of the j-th target application corresponding to the i-th user in the sample user set; m denotes the total number of the at least one target application; f(x_i1, x_i2, ..., x_ij, ..., x_im) is the prediction model, which outputs the prediction result of the i-th user; n denotes the total number of sample users in the sample user set; y_i denotes the actual result of the i-th user; and L(f(x_i1, x_i2, ..., x_ij, ..., x_im), y_i) is the loss function.
In one embodiment of the present invention, the apparatus 300 for processing behavioral data of a user using applications is further configured to:
adjust the value of the weight according to the ratio of the number of first class sample users, or of the number of second class sample users, to the total number of sample users in the sample user set.
Fig. 7 shows a block diagram of a user behavior prediction apparatus according to an embodiment of the present invention. The apparatus 400 includes a computing module 401 and a model prediction module 402.
The computing module 401 is configured to calculate the weight of each target application corresponding to a target user according to the duration for which the target user uses each target application of at least one target application.
The model prediction module 402 is configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; where the user overdue repayment prediction model is the user overdue repayment prediction model of Fig. 1, and the at least one target application is the at least one target application of Fig. 1.
Fig. 8 is a structural diagram showing an exemplary hardware architecture of a computing device. As shown in Fig. 8, the computing device 500 includes an input device 501, an input interface 502, a processor 503, a memory 504, an output interface 505 and an output device 506.
The input interface 502, the processor 503, the memory 504 and the output interface 505 are connected to one another through a bus 510; the input device 501 and the output device 506 are connected to the bus 510 through the input interface 502 and the output interface 505 respectively, and thereby to the other components of the computing device 500.
Specifically, the input device 501 receives input information from outside and transmits it to the processor 503 through the input interface 502; the processor 503 processes the input information based on computer-executable instructions stored in the memory 504 to generate output information, stores the output information temporarily or permanently in the memory 504, and then transmits it to the output device 506 through the output interface 505; the output device 506 outputs the output information to the outside of the computing device 500 for use by the user.
The computing device 500 can perform each step of the above method of the present application for processing behavioral data of a user using applications. Alternatively, the computing device 500 can perform each step of the above user behavior prediction method of the present application.
The processor 503 may be one or more central processing units (CPUs). When the processor 503 is a single CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory 504 may be, but is not limited to, one or more of random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), a hard disk, and the like. The memory 504 is used to store program code.
It can be understood that, in the embodiments of the present application, the functions of any or all of the modules provided in Fig. 6 or Fig. 7 can be implemented by the central processing unit 503 shown in Fig. 8.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product includes one or more computer program instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (SSD)).
The parts of this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
Claims (12)
1. A method for processing behavioral data of a user using applications, characterized by comprising:
obtaining application usage behavior data of all users in a sample user set, wherein the application usage behavior data comprises application installation data and application uninstallation data;
screening out at least one target application based on the difference between the application usage behavior data of first class sample users and the application usage behavior data of second class sample users in the sample user set, wherein the first class sample users comprise users who have made overdue repayments and the second class sample users comprise users who have not made overdue repayments;
calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each target application of the at least one target application corresponding to each user;
determining a user overdue repayment prediction model according to the weight of each target application corresponding to each user, wherein the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
2. The method according to claim 1, characterized in that screening out at least one target application based on the difference between the application usage behavior data of the first class sample users and the application usage behavior data of the second class sample users in the sample user set comprises:
calculating the difference between a first installation rate and a second installation rate, wherein the first installation rate is the installation rate of an application to be screened among the first class sample users, and the second installation rate is the installation rate of the application to be screened among the second class sample users;
if the difference between the first installation rate and the second installation rate is greater than a first threshold, taking the application to be screened as the target application.
3. The method according to claim 1, characterized in that screening out at least one target application based on the difference between the application usage behavior data of the first class sample users and the application usage behavior data of the second class sample users in the sample user set comprises:
performing a significance test on the difference between the first class sample users' usage behavior data for an application to be screened and the second class sample users' usage behavior data for the application to be screened;
if the significance test is passed, taking the application to be screened as the target application.
4. The method according to claim 1, characterized in that calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each target application of the at least one target application corresponding to each user comprises:
smoothing, by a binning method, the duration for which a single user uses a single target application, and taking the smoothed value as the weight of the target application corresponding to that user.
5. The method according to claim 1, characterized in that determining the user overdue repayment prediction model according to the weight of each target application corresponding to each user comprises:
inputting the weights of the at least one target application corresponding to each user into a pre-built prediction model, the prediction model outputting a prediction result for each user;
calculating the difference size between the prediction result of each user and the actual result of each user, wherein the prediction result of each user indicates the predicted probability that the user will make an overdue repayment, and the actual result of each user indicates whether the user made an overdue repayment;
if the difference size is greater than a second threshold, adjusting the coefficients in the prediction model and returning to the step of inputting the weights of the at least one target application corresponding to each user into the pre-built prediction model;
if the difference size is equal to or less than the second threshold, taking the prediction model at the time the difference size is equal to or less than the second threshold as the user overdue repayment prediction model.
6. The method according to claim 5, characterized in that calculating the difference size between the prediction result of each user and the actual result of each user comprises:
calculating the difference size Z by the following formula,
wherein i = 1, ..., n; j = 1, ..., m; x_ij denotes the weight of the j-th target application corresponding to the i-th user in the sample user set; m denotes the total number of the at least one target application; f(x_i1, x_i2, ..., x_ij, ..., x_im) is the prediction model, which outputs the prediction result of the i-th user; n denotes the total number of sample users in the sample user set; y_i denotes the actual result of the i-th user; and L(f(x_i1, x_i2, ..., x_ij, ..., x_im), y_i) is the loss function.
7. The method according to any one of claims 1 to 6, characterized in that, after calculating, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each target application of the at least one target application corresponding to each user, the method further comprises:
adjusting the value of the weight according to the ratio of the number of first class sample users, or of the number of second class sample users, to the total number of sample users in the sample user set.
8. A user behavior prediction method, characterized by comprising:
calculating the weight of each target application corresponding to a target user according to the duration for which the target user uses each target application of at least one target application;
inputting the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; wherein the user overdue repayment prediction model is the user overdue repayment prediction model according to any one of claims 1-7, and the at least one target application is the at least one target application according to any one of claims 1-7.
9. An apparatus for processing behavioral data of a user using applications, characterized by comprising:
an obtaining module, configured to obtain application usage behavior data of all users in a sample user set, wherein the application usage behavior data comprises application installation data and application uninstallation data;
a screening module, configured to screen out at least one target application based on the difference between the application usage behavior data of first class sample users and the application usage behavior data of second class sample users in the sample user set, wherein the first class sample users comprise users who have made overdue repayments and the second class sample users comprise users who have not made overdue repayments;
a computing module, configured to calculate, according to the duration for which each user in the sample user set uses the at least one target application, the weight of each target application of the at least one target application corresponding to each user;
a determining module, configured to determine a user overdue repayment prediction model according to the weight of each target application corresponding to each user, wherein the user overdue repayment prediction model is used to predict whether a target user will make an overdue repayment.
10. A user behavior prediction apparatus, characterized by comprising:
a computing module, configured to calculate the weight of each target application corresponding to a target user according to the duration for which the target user uses each target application of at least one target application;
a model prediction module, configured to input the weight of each target application corresponding to the target user into a preset user overdue repayment prediction model to obtain a prediction result of whether the target user will make an overdue repayment; wherein the user overdue repayment prediction model is the user overdue repayment prediction model according to any one of claims 1-7, and the at least one target application is the at least one target application according to any one of claims 1-7.
11. A computing device, characterized by comprising: a processor, a memory, and computer program instructions stored in the memory;
wherein the method according to any one of claims 1-7 is implemented when the computer program instructions are executed by the processor;
or,
the method according to claim 8 is implemented when the computer program instructions are executed by the processor.
12. A computer-readable storage medium having computer program instructions stored thereon, characterized in that
the method according to any one of claims 1-7 is implemented when the computer program instructions are executed by a processor;
or,
the method according to claim 8 is implemented when the computer program instructions are executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810931189.4A CN109214912A (en) | 2018-08-15 | 2018-08-15 | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109214912A true CN109214912A (en) | 2019-01-15 |
Family
ID=64988234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810931189.4A Pending CN109214912A (en) | 2018-08-15 | 2018-08-15 | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109214912A (en) |
2018-08-15: CN application CN201810931189.4A, published as CN109214912A (en); status: active, Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110246026A (en) * | 2019-05-21 | 2019-09-17 | 平安银行股份有限公司 | A kind of output combination setting method, device and the terminal device of data transfer |
CN110246026B (en) * | 2019-05-21 | 2023-06-27 | 平安银行股份有限公司 | Data transfer output combination setting method and device and terminal equipment |
CN111062518A (en) * | 2019-11-22 | 2020-04-24 | 成都铂锡金融信息技术有限公司 | Method, device and storage medium for processing hastening service based on artificial intelligence |
CN111062518B (en) * | 2019-11-22 | 2023-06-09 | 成都铂锡金融信息技术有限公司 | Method, device and storage medium for processing collect-promoting business based on artificial intelligence |
CN111915378A (en) * | 2020-08-17 | 2020-11-10 | 深圳墨世科技有限公司 | User attribute prediction method, device, computer equipment and storage medium |
CN113222258A (en) * | 2021-05-17 | 2021-08-06 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109214912A (en) | Processing method, behavior prediction method, apparatus, equipment and the medium of behavioral data | |
CN101493913A (en) | Method and system for assessing user credit in internet | |
CN107194743A (en) | A kind of network surveying questionnaire generation method and device | |
CN112270545A (en) | Financial risk prediction method and device based on migration sample screening and electronic equipment | |
Woods et al. | Towards integrating insurance data into information security investment decision making | |
CN107633030A (en) | Credit estimation method and device based on data model | |
CN111090833A (en) | Data processing method, system and related equipment | |
CN113313538A (en) | User consumption capacity prediction method and device, electronic equipment and storage medium | |
CN112328869A (en) | User loan willingness prediction method and device and computer system | |
CN113393316B (en) | Loan overall process accurate wind control and management system based on massive big data and core algorithm | |
CN115545886A (en) | Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium | |
CN111210332A (en) | Method and device for generating post-loan management strategy and electronic equipment | |
CN111179051A (en) | Financial target customer determination method and device and electronic equipment | |
CN110689425A (en) | Method and device for pricing quota based on income and electronic equipment | |
CN109102396A (en) | A kind of user credit ranking method, computer equipment and readable medium | |
WO2011149608A1 (en) | Identifying and using critical fields in quality management | |
CN111382909A (en) | Rejection inference method based on survival analysis model expansion bad sample and related equipment | |
CN108197740A (en) | Business failure Forecasting Methodology, electronic equipment and computer storage media | |
KR102336462B1 (en) | Apparatus and method of credit rating | |
CN110134464A (en) | Information processing method and device | |
CN112446777B (en) | Credit evaluation method, device, equipment and storage medium | |
CN113052512A (en) | Risk prediction method and device and electronic equipment | |
CN104252411B (en) | A kind of system pressure analysis method and equipment | |
El Emam | A primer on object-oriented measurement | |
CN112348584A (en) | Vehicle estimation method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190115 |