CN108537397A

CN108537397A - Internet credit assessment method and system

Info

Publication number: CN108537397A
Application number: CN201710117748.3A
Authority: CN
Inventors: 黎新 (Li Xin)
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-03-01
Filing date: 2017-03-01
Publication date: 2018-09-14
Also published as: WO2018157808A1

Abstract

Embodiments of the invention disclose an internet credit assessment method and system. After a training dataset is obtained, a weight is set for each training sample in the training dataset according to a preset strategy. A preset assessment model is then trained with the weighted training dataset to obtain a trained assessment model, and the internet credit of a user is assessed based on the trained assessment model. This scheme can greatly improve the rationality and accuracy of the assessment and improve the application effect.

Description

Internet credit assessment method and system

Technical field

The present invention relates to the field of communication technology, and in particular to an internet credit assessment method and system.

Background art

With the arrival of the big data era, internet credit reporting is applied ever more widely. Besides internet finance, it also covers other everyday scenarios such as ride-hailing, car rental and hotel reservation. How to ensure that internet credit assessment is accurate and fair has therefore become a growing concern.

In the prior art, users' behavioral data over a training period are typically collected as a training dataset. User features are extracted from it, a credit scoring model is built with machine learning algorithms such as decision trees and logistic regression, and the user's credit is assessed with that credit scoring model. The training dataset consists of defaulting and non-defaulting users and is split into a training set and a validation set: the training set is used to train the model, and the validation set is used to evaluate the resulting model. The evaluation criterion is that the prediction error on the validation set be as small as possible, where the prediction error is mainly the difference between the predicted default status (whether the user is predicted to default) and the actual default status.

In researching and practicing the prior art, the inventors found that existing internet credit assessment is not sufficiently reasonable and its accuracy is low, resulting in a poor application effect.

Summary of the invention

Embodiments of the present invention provide an internet credit assessment method and system that can improve the rationality and accuracy of the assessment and improve the application effect.

An embodiment of the present invention provides an internet credit assessment method, comprising:

acquiring multiple pieces of user data, the user data comprising users' attribute data, behavioral data and credit records;

selecting training samples from the user data to obtain a training dataset;

setting, according to a preset strategy, a weight for each training sample in the training dataset to obtain a weighted training dataset;

training a preset assessment model with the weighted training dataset to obtain a trained assessment model;

assessing a user's internet credit based on the trained assessment model.

An embodiment of the present invention further provides an internet credit assessment system, comprising:

an acquiring unit, configured to acquire multiple pieces of user data, the user data comprising users' attribute data, behavioral data and credit records;

a selecting unit, configured to select training samples from the user data to obtain a training dataset;

a setting unit, configured to set, according to a preset strategy, a weight for each training sample in the training dataset to obtain a weighted training dataset;

a training unit, configured to train a preset assessment model with the weighted training dataset to obtain a trained assessment model;

an assessment unit, configured to assess a user's internet credit based on the trained assessment model.

After obtaining the training dataset, the embodiments of the present invention set a weight for each training sample in the training dataset according to a preset strategy, train a preset assessment model with the weighted training dataset to obtain a trained assessment model, and assess users' internet credit based on the trained assessment model. Because the scheme weights each training sample according to a preset strategy before training, it can distinguish how much the default of each training sample matters. Compared with existing schemes that only consider whether a training sample defaulted, it can greatly improve the rationality and accuracy of the assessment and the application effect.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.

Fig. 1a is a framework diagram of the internet credit assessment method provided by an embodiment of the present invention;

Fig. 1b is a flowchart of the internet credit assessment method provided by an embodiment of the present invention;

Fig. 2 is another flowchart of the internet credit assessment method provided by an embodiment of the present invention;

Fig. 3a is a schematic structural diagram of the internet credit assessment system provided by an embodiment of the present invention;

Fig. 3b is another schematic structural diagram of the internet credit assessment system provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention.

Detailed description of embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Embodiments of the present invention provide an internet credit assessment method and system.

The internet credit assessment system may specifically be integrated in a device such as a server.

For example, with reference to Fig. 1 a, which can obtain multiple user data, such as the attribute number of user According to, behavioral data and credit record etc., then, training sample is selected from the user data, and be each according to preset strategy Weight is arranged in training sample, is its setting power based on analysis result for example, can analyze the income of each training sample Weight so that influence of the promise breaking of different training samples to overall income can be distinguished, etc., hereafter, can use this The training sample of a little Weights is trained default assessment models, and is levied to the internet of user based on assessment models after training Letter is assessed, to improve the reasonability and accuracy of assessment.

The assessment model can be established according to the needs of the application. For example, it can include a loss function for predicting users' default status and a loss function for predicting users' profit status, i.e., the first loss function and the second loss function described in the embodiments of the present invention.

Each is described in detail below. Note that the numbering of the following embodiments does not imply any order of preference.

Embodiment one

This embodiment is described from the perspective of the internet credit assessment system, which may specifically be integrated in a device such as a server, for example an evaluation server.

An internet credit assessment method includes: acquiring multiple pieces of user data; selecting training samples from the user data to obtain a training dataset; setting, according to a preset strategy, a weight for each training sample in the training dataset to obtain a weighted training dataset; training a preset assessment model with the weighted training dataset to obtain a trained assessment model; and assessing a user's internet credit based on the trained assessment model.

As shown in Fig. 1b, the specific flow of the internet credit assessment method may be as follows:

101. Acquire multiple pieces of user data.

The user data may include users' attribute data, behavioral data, credit records and other data. A user's attribute data may include user information registered on the platform or obtained from other channels, such as demographic attributes like gender, age, region and/or education; a user's behavioral data may include data generated by behaviors such as logging in, clicking, sending messages, shopping, paying and/or reading on the platform; and a user's credit record may include information such as the user's default records.

102. Select training samples from the user data to obtain a training dataset.

There are various ways to select the samples: for example, randomly, or according to the distribution of user income. Taking selection according to the income distribution as an example, the step of selecting training samples from the user data to obtain a training dataset may specifically be as follows:

(1) Analyze user income according to the user data.

Taking loans as an example, user income refers to the income a user brings to the lender (the provider of the loan, e.g., an institution such as a bank) after the lender lends to that user. It generally includes loan interest income and overdue penalty income, so user income can be computed by analyzing these two components. That is, the step of analyzing user income according to the user data may specifically include:

Determining the user's loan interest income according to the user data, determining the user's overdue penalty income according to the user data, and then computing the sum of the two to obtain the user income, expressed as:

User income = loan interest income + overdue penalty income.

The loan interest income can be computed as the application requires; for example, from the principal and the loan interest rate, as follows:

Loan interest income = r₁*M.

Here r₁ is the loan interest rate and M is the principal. The unit of the loan interest rate can be set as required, e.g., a daily, monthly or annual rate; for convenience, the embodiments of the present invention take r₁ as a monthly rate. The specific value of the loan interest rate can also be set as required and is not detailed here.

Overdue penalty interest is the penalty charged because a user repays late. Note, however, that a larger penalty does not mean a larger penalty income: the longer a user is overdue, the riskier the user (i.e., the worse the user's credit), and the more likely the user never repays, causing the lender a larger loss. Therefore, in the embodiments of the present invention, overdue penalty income is defined as a variable that changes over time: while the overdue time does not exceed a preset threshold, it is positive income; once the overdue time exceeds the preset threshold, it becomes negative income. That is, optionally, the step of determining the user's overdue penalty income according to the user data may specifically be as follows:

Determining the user's principal, overdue penalty rate and overdue time according to the user data;

If the overdue time is less than the preset threshold, taking the product of the overdue time, the overdue penalty rate and the principal as the overdue penalty income, expressed as: overdue penalty income = k*r₂*M;

If the overdue time exceeds the preset threshold, computing the difference between the overdue time and the preset threshold, and taking the negative of the product of this difference, the overdue penalty rate and the principal as the overdue penalty income, expressed as: overdue penalty income = -(k-m)*r₂*M.

Here r₂ is the overdue penalty rate, M is the principal, k is the overdue time and m is the preset threshold. The units of the overdue time and the overdue penalty rate can be set as required: for example, if the overdue penalty rate is a daily rate, the overdue time can be measured in days; if it is a monthly rate, the overdue time can be measured in months; and so on.

Denoting user income by Reward, it follows from the above that user income can be computed as:

If not overdue: Reward = r₁*M + 0 = r₁*M;

If the overdue time is less than the preset threshold (k < m): Reward = r₁*M + k*r₂*M;

If the overdue time reaches or exceeds the preset threshold (k >= m): Reward = r₁*M + (-(k-m)*r₂*M) = r₁*M - (k-m)*r₂*M.

The preset threshold m can be set as required by the application and is not detailed here.
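The piecewise income rule above can be sketched as one small Python function. This is a minimal illustration, assuming r₁, r₂, k and m are already expressed in consistent units (for example r₂ as a daily rate and k in days) and that an on-time repayment is passed as k = 0; the function and parameter names are hypothetical.

```python
def user_income(principal: float, loan_rate: float, penalty_rate: float,
                overdue: float, threshold: float) -> float:
    """Hypothetical helper: Reward = loan interest income + overdue penalty income.

    principal    -- M, the loan principal
    loan_rate    -- r1, the loan interest rate
    penalty_rate -- r2, the overdue penalty rate per unit of overdue time
    overdue      -- k, the overdue time (0 if repaid on time)
    threshold    -- m, the overdue time beyond which penalties become losses
    """
    if overdue < 0 or threshold < 0:
        raise ValueError("overdue time and threshold must be non-negative")
    interest_income = loan_rate * principal
    if overdue < threshold:
        # Within the threshold, penalty interest is counted as positive income.
        penalty_income = overdue * penalty_rate * principal
    else:
        # From the threshold on, each extra unit of overdue time counts as a loss.
        penalty_income = -(overdue - threshold) * penalty_rate * principal
    return interest_income + penalty_income
```

For instance, user_income(10000, 0.0001, 0.0001, 20, 10) evaluates to approximately -9, i.e., a loss for the lender. Note that the rule as stated makes the penalty income jump from k*r₂*M to 0 exactly at k = m.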

(2) Select training samples from the user data according to user income, such that the distribution of user income of the selected training samples is consistent with the distribution of user income of the user data, obtaining the training dataset.

When selecting training samples, users can be divided into "good users" and "bad users" according to user income. For example, users who have never been overdue, or whose number of overdue occurrences is below a preset number (e.g., 3), can be classified as "good users"; users whose number of overdue occurrences exceeds the preset number are classified as "bad users". Corresponding users can then be drawn from the "good users" and "bad users" according to a certain sampling ratio and added to the training dataset as training samples.

Here, "the distribution of user income of the selected training samples is consistent with the distribution of user income of the user data" means that the ratio of "good users" to "bad users" among the selected training samples matches that ratio among all acquired user data. For example, if the ratio of "good users" to "bad users" in all acquired user data is 3:2 and 1000 training samples are to be selected, 600 training samples can be drawn from the "good users" and 400 from the "bad users". The ratio among the selected training samples is then 600:400 = 3:2, the same as in all acquired user data, so the distribution of user income of the selected training samples can be regarded as consistent with that of the user data. Other cases follow by analogy and are not detailed here.
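The classification and proportional draw described above amount to stratified sampling, sketched below. The record field name "overdue_count", the fixed random seed and the function names are hypothetical; a count exactly equal to the preset number is treated as "bad", a boundary case the text leaves open.

```python
import random
from typing import Dict, List, Sequence


def label_user(overdue_count: int, max_overdue: int = 3) -> str:
    """Classify a user by number of overdue repayments ('good' below the limit)."""
    return "good" if overdue_count < max_overdue else "bad"


def stratified_sample(users: Sequence[dict], n: int, seed: int = 0) -> List[dict]:
    """Draw n users so that the good/bad ratio matches the whole population."""
    rng = random.Random(seed)
    groups: Dict[str, List[dict]] = {"good": [], "bad": []}
    for user in users:
        groups[label_user(user["overdue_count"])].append(user)
    # Allocate the sample proportionally to each group's share of the population.
    n_good = round(n * len(groups["good"]) / len(users))
    n_bad = n - n_good
    return rng.sample(groups["good"], n_good) + rng.sample(groups["bad"], n_bad)
```

With 3000 good and 2000 bad users and n = 1000, this draws 600 good and 400 bad users, reproducing the 3:2 example.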

103. Set a weight for each training sample in the training dataset according to a preset strategy to obtain a weighted training dataset.

There are various ways to set the weights; for example, according to the magnitude of user income, specifically:

Set a weight for each training sample in the training dataset according to the magnitude of user income, obtaining the weighted training dataset.

For example, let Reward(x) denote the user income of user x and suppose there are N training samples. The weight Weight(x) of each training sample x (i.e., user x) can then be:

Weight(x) = (Reward(x) - Min(Reward)) / (Max(Reward) - Min(Reward));

Here Min(Reward) is the minimum user income over all training samples and Max(Reward) is the maximum user income over all training samples.

That is, compute the difference between the current training sample's user income and the minimum user income to obtain a first value, compute the difference between the maximum and the minimum user income to obtain a second value, and take the quotient of the first value and the second value as the weight of the current training sample.
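The min-max weighting can be sketched directly. The function name is hypothetical, and the fallback for the case in which every sample has the same income (where the formula divides by zero) is an assumption of this sketch, not something the text specifies.

```python
from typing import List, Sequence


def minmax_weights(rewards: Sequence[float]) -> List[float]:
    """Weight(x) = (Reward(x) - Min(Reward)) / (Max(Reward) - Min(Reward))."""
    lo, hi = min(rewards), max(rewards)
    span = hi - lo  # the "second value"
    if span == 0:
        # Undefined in the formula; uniform weights are an assumed fallback.
        return [1.0 for _ in rewards]
    # (r - lo) is the "first value"; the quotient is the sample weight.
    return [(r - lo) / span for r in rewards]
```

Under this formula the lowest-income sample receives weight 0 and the highest-income sample weight 1; for incomes of 1, 9 and -9 yuan the weights are 10/18, 1 and 0.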

Alternatively, besides setting the weights of training samples according to the magnitude of user income, other factors, such as the user's credit record, can also be considered when setting the weights. That is, the step of setting a weight for each training sample in the training dataset according to a preset strategy to obtain a weighted training dataset may specifically be:

Setting a weight for each training sample in the training dataset according to the magnitude of user income and the user's credit record, obtaining the weighted training dataset.

The specific method depends on the needs of the application. For example, user income and the credit record can each be assigned a proportion, and the weight of the training sample then computed from these proportions with a preset algorithm. Details are not given here.

104. Train the preset assessment model with the weighted training dataset to obtain a trained assessment model.

The preset assessment model can be defined according to the needs of the application and stored in advance, then read directly from its storage location when needed. Alternatively, the system can establish the assessment model itself; that is, before the step of training the preset assessment model with the weighted training dataset to obtain a trained assessment model, the internet credit assessment method may further include:

Setting a first loss function and a second loss function, where the first loss function is a loss function for predicting users' default status and the second loss function is a loss function for predicting users' profit status, and establishing the assessment model according to the first loss function and the second loss function.

The first and second loss functions can be set according to the needs of the application. For example, user features can be mined from the user data and user labels obtained, and a logistic regression model then established based on the user features and user labels, as follows:

$$y(\theta, x) = h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

Here y is the user label indicating whether the user defaults (1 if the user defaults, otherwise 0) and is the dependent variable; x denotes the user features and is the independent variable; θ is the parameter vector, i.e., the weights of the user features x.

The logistic regression model is then trained. Training mainly optimizes an objective function J(θ) so that J(θ) moves progressively toward its maximum or minimum. Because the prediction target is whether the user defaults, J(θ) can be defined as the loss of the prediction error, for example using the mean squared error over the N training samples:

$$J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Here h_θ(x⁽ⁱ⁾) is the predicted value for the i-th training sample and y⁽ⁱ⁾ is its actual value. The training goal is to minimize the loss, i.e., to bring h_θ(x⁽ⁱ⁾) as close as possible to y⁽ⁱ⁾.

J(θ) is thus a loss function that predicts users' default status and can serve as the first loss function. The second loss function can be defined by multiplying each training sample's term in the first loss function by that sample's weight Weight(x). With N training samples, the first and second loss functions can be written as:

First loss function:

$$J_1(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Second loss function:

$$J_2(\theta) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Weight}(x^{(i)})\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

A structured objective function, "loss", is defined from the first and second loss functions above; the model corresponding to this objective function is the assessment model.

For example, the sum of the first loss function and the second loss function can be used as the objective function, i.e., loss = first loss function + second loss function.

Optionally, to control the relationship between the two loss functions (the first and second loss functions) more flexibly, a constant term can be set as a balance coefficient that controls the relative proportion of the first and second loss functions. That is, before the step of establishing the assessment model according to the loss function for predicting users' default status and the loss function for predicting users' profit status, the internet credit assessment method may further include:

Setting a balance coefficient, which controls the relative proportion of the first loss function and the second loss function.

In this case, the step of establishing the assessment model according to the first loss function and the second loss function includes: establishing the assessment model according to the first loss function, the second loss function and the balance coefficient.

For example, the product of the balance coefficient and the second loss function can be computed, and the sum of this product and the first loss function used as the objective function "loss":

$$\mathrm{loss} = J_1(\theta) + \gamma\,J_2(\theta)$$

Here J₁(θ) and J₂(θ) denote the first and second loss functions, and γ is the balance coefficient, a constant whose value can be set flexibly according to the needs of the application, for example according to changes in product and/or industry, so as to adjust the scoring strategy.
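As a worked consequence, assume both losses take the mean-squared-error form with a 1/N factor, the second one scaling each sample's squared error by wᵢ = Weight(x⁽ⁱ⁾). The balanced objective then collapses into a single per-sample-weighted squared error, and for the linear expression h_θ(x) = θ₀ + θ₁x₁ + … + θₙxₙ its gradient follows directly. The symbol x̃⁽ⁱ⁾ = (1, x₁⁽ⁱ⁾, …, xₙ⁽ⁱ⁾) is notation introduced here for the derivation.

```latex
\begin{aligned}
\mathrm{loss}(\theta) &= J_1(\theta) + \gamma\,J_2(\theta)
  = \frac{1}{N}\sum_{i=1}^{N}(1+\gamma w_i)\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
  \quad w_i = \mathrm{Weight}(x^{(i)}),\\
\frac{\partial\,\mathrm{loss}}{\partial\theta_j} &= \frac{2}{N}\sum_{i=1}^{N}(1+\gamma w_i)\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,\tilde{x}^{(i)}_j,
  \quad j = 0, 1, \dots, n.
\end{aligned}
```

With min-max weights 0 ≤ wᵢ ≤ 1 and assuming γ ≥ 0, every sample keeps a factor of at least 1, while the highest-income samples count up to (1 + γ) times as much.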

Once the objective function, i.e., the assessment model, is obtained, the preset assessment model can be trained with the weighted training dataset to obtain the trained assessment model. Training can use open-source machine learning tools, for example decision tree or logistic regression implementations; the training goal is to minimize the objective function, and training ends once the objective function reaches a certain threshold.
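A minimal pure-Python sketch of this training step follows. It assumes the linear expression h_θ(x) = θ₀ + θ₁x₁ + … + θₙxₙ exactly as written above (no sigmoid applied), the mean-squared-error form of both losses, and plain batch gradient descent in place of an unspecified open-source tool. The name train_weighted and its parameters (learning rate lr, tolerance tol) are hypothetical, and the stopping rule reads "the objective reaches a certain threshold" as the per-iteration decrease of the objective falling below tol.

```python
from typing import List, Sequence


def train_weighted(X: Sequence[Sequence[float]], y: Sequence[float],
                   weights: Sequence[float], gamma: float = 1.0,
                   lr: float = 0.1, tol: float = 1e-6,
                   max_iter: int = 10_000) -> List[float]:
    """Minimize loss = J1 + gamma * J2 for h(x) = theta0 + theta1*x1 + ... by gradient descent.

    J1 is the mean squared error; J2 is the same error with each sample
    multiplied by its weight, so together each sample gets 1 + gamma * w.
    Returns [theta0, theta1, ..., thetan].
    """
    n, d = len(X), len(X[0])
    theta = [0.0] * (d + 1)                    # theta[0] is the intercept
    coef = [1.0 + gamma * w for w in weights]  # combined per-sample factor
    prev = float("inf")
    for _ in range(max_iter):
        preds = [theta[0] + sum(t * v for t, v in zip(theta[1:], row)) for row in X]
        resid = [p - t for p, t in zip(preds, y)]
        loss = sum(c * r * r for c, r in zip(coef, resid)) / n
        if prev - loss < tol:                  # objective no longer improving
            break
        prev = loss
        # Gradient: (2/N) * sum_i c_i * r_i * x~_i (x~ has a leading 1).
        grad0 = 2.0 / n * sum(c * r for c, r in zip(coef, resid))
        grads = [2.0 / n * sum(c * r * row[j] for c, r, row in zip(coef, resid, X))
                 for j in range(d)]
        theta[0] -= lr * grad0
        theta[1:] = [t - lr * g for t, g in zip(theta[1:], grads)]
    return theta
```

In practice, a library estimator that accepts per-sample weights can play the same role by passing 1 + γ·Weight(x) as each sample's weight; the loop above only makes the objective explicit.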

Note that besides mean squared error, other forms can be used to define the above loss functions (the first and second loss functions), for example the 0-1 loss or the logarithmic loss. Details are not given here.

Note also that the user features described in the embodiments of the present invention can include basic features, such as demographic attribute features and basic behavioral features, as well as derived features, such as weekly/monthly behavioral features and/or behavior-sequence features. Basic behavioral features may include behaviors such as clicking, reading, forwarding, paying, shopping and/or bookmarking; weekly/monthly behavioral features can be obtained from statistics of behaviors such as clicking, reading and/or forwarding; and behavior-sequence features can be obtained from statistics of behaviors such as paying, shopping and/or bookmarking. Details are not given here.
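A derived weekly behavioral feature can be as simple as a per-week count of each behavior. The event format (a date plus a behavior name such as "click" or "read") and the function name are hypothetical, and ISO week numbering via datetime.date.isocalendar() is just one possible choice of week boundary.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def weekly_behavior_counts(
        events: Iterable[Tuple[date, str]]) -> Dict[Tuple[int, int, str], int]:
    """Count each behavior per ISO week: (iso_year, iso_week, behavior) -> count."""
    counts: Counter = Counter()
    for day, behavior in events:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1], behavior)] += 1
    return dict(counts)
```

Monthly counts follow the same pattern with (day.year, day.month) as the key.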

105. Assess users' internet credit based on the trained assessment model. For example, this may specifically be as follows:

Receive an internet credit assessment request, determine the target user to be assessed as indicated by the request, acquire the target user's user data, and assess the target user's internet credit with the trained assessment model according to that user data.

For example, the trained assessment model can compute on the target user's user data, and the result is converted into a score for reference. That is, the step of assessing the target user's internet credit with the trained assessment model according to the target user's user data may include:

Computing on the target user's user data with the trained assessment model to obtain an assessment probability value, and converting the assessment probability value into a score in a preset format according to a preset algorithm, obtaining the target user's internet credit score.

The preset algorithm can be set according to the needs of the application. For example, to convert the assessment probability value into an integer between 400 and 900, the preset algorithm may specifically be:

Score = 400 + 500P;

Here Score is the internet credit score and P is the assessment probability value, with P in the interval [0, 1].
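The conversion can be sketched as follows. Rejecting P outside [0, 1] and rounding to the nearest integer are assumptions made here so that the output is an integer in [400, 900]; the function name is hypothetical.

```python
def credit_score(p: float) -> int:
    """Map an assessment probability P in [0, 1] to Score = 400 + 500 * P."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie in [0, 1]")
    # Python's round() rounds exact halves to the nearest even integer.
    return round(400 + 500 * p)
```

For example, P = 0.5 maps to a score of 650.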

As can be seen from the above, after obtaining the training dataset, this embodiment sets a weight for each training sample in the training dataset according to a preset strategy, trains a preset assessment model with the weighted training dataset to obtain a trained assessment model, and assesses users' internet credit based on the trained assessment model. Because the scheme weights each training sample according to a preset strategy before training, it can distinguish how much the default of each training sample matters. Compared with existing schemes that only consider whether a training sample defaulted, it can greatly improve the rationality and accuracy of the assessment and the application effect.

Embodiment two

The method described in Embodiment one is further detailed below by way of example.

In this embodiment, the description takes as an example the case in which the internet credit assessment system is integrated in an evaluation server and the loss functions are defined by mean squared error.

As shown in Fig. 2, an internet credit assessment method may specifically proceed as follows:

201. The evaluation server acquires multiple pieces of user data.

For example, multiple pieces of user data can be collected from the internet or other channels and stored locally or in other storage devices, to be read by the evaluation server when needed; alternatively, the evaluation server can acquire the user data directly from the internet or other channels.

The user data may include users' attribute data, behavioral data, credit records and other data.

A user's attribute data may include user information registered on the platform or obtained from other channels, such as demographic attributes like gender, age, region and/or education.

A user's behavioral data may include data generated by behaviors such as logging in, clicking, sending messages, shopping, paying and/or reading on the platform.

A user's credit record may include information such as the user's default records.

202. The evaluation server analyzes user income according to the user data.

The way user income is computed depends on the needs of the application. Taking loans as an example, user income refers to the income a user brings to the lender after the lender lends to that user, and generally includes loan interest income and overdue penalty income. Therefore, the user's loan interest income and overdue penalty income can be determined according to the user data, and their sum computed to obtain the user income, expressed as:

User income = loan interest income + overdue penalty income.

Loan interest income depends on the principal and the loan interest rate. Overdue penalty income is the penalty income generated because a user repays late; it is a variable that changes over time: while the overdue time does not exceed the preset threshold, it is positive income; once the overdue time exceeds the preset threshold, it becomes negative income.

For example, being M, loan interest rate r with capital₁, penalty for delay interest rate is r₂, k is the overdue time, and m is the overdue time For predetermined threshold value, then user's income Reward is：

If not overdue：Reward=r₁* M+0=r₁*M；

If the overdue time is less than predetermined threshold value m (i.e. k<m)：Reward=r₁*M+k*r₂*M；

If the overdue time is more than predetermined threshold value m (i.e. k >=m)：Reward=r₁*M+(-(k-m)*r₂* M)=r₁*M-(k- m)*r₂*M。

Wherein, predetermined threshold value m can be configured, etc. according to the demand of practical application.

It should be noted that loan interest rate r₁Unit and value can be configured according to the demand of practical application, For example, can be by loan interest rate r₁It is set as provide a loan rate per diem, loan rate per month or loan rate, etc.；Similarly, overdue to penalize Money interest rate is r₂Can also accordingly it be arranged according to the demand of practical application with the unit and value of overdue time k, for example, If by penalty for delay interest rate r₂For rate per diem, then the unit of overdue time k can be set as " number of days ", if penalty for delay interest rate r₂For rate per month, then the unit of overdue time k can be set as " months ", and so on, etc..

For example, suppose the loan interest rate r₁ is a monthly rate, the overdue penalty interest rate r₂ is a daily rate, the overdue period k is measured in days, and the preset threshold m is 10 days. If the principal of user A is 10,000 yuan, r₁ is 0.01% per month, and r₂ is 0.01% per day, then the user income Reward of user A (the income that lending to user A brings to the lender) in different scenarios is as follows:

(1) If user A is not overdue:

Reward = r₁·M + 0 = r₁·M = 0.01% × 10000 = 1 yuan.

That is, if user A repays without any overdue period, the user income corresponding to user A is a positive income of 1 yuan, i.e., "earning 1 yuan".

(2) If the overdue period of user A is less than 10 days, for example 8 days:

Reward = r₁·M + k·r₂·M = 0.01% × 10000 + 8 × 0.01% × 10000 = 9 yuan.

That is, if user A is 8 days overdue, the user income corresponding to user A is a positive income of 9 yuan, i.e., "earning 9 yuan".

(3) If the overdue period of user A is more than 10 days, for example 20 days:

Reward = r₁·M + (−(k − m)·r₂·M) = r₁·M − (k − m)·r₂·M = 0.01% × 10000 − (20 − 10) × 0.01% × 10000 = −9 yuan.

That is, if user A is 20 days overdue, the user income corresponding to user A is a negative income of 9 yuan, i.e., "losing 9 yuan".
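As a cross-check, the piecewise user income formula and the three cases for user A can be reproduced with the short Python sketch below. The function name `user_reward` and its parameter names are illustrative assumptions, not part of the described system, and the rates are assumed to be already expressed per unit of the loan term and per unit of overdue period, as in the example above.

```python
def user_reward(principal, r1, r2, k, m):
    """Piecewise user income (Reward) for one loan, as a hedged sketch.

    principal -- loan principal M
    r1        -- loan interest rate, already scaled to the loan term
    r2        -- overdue penalty interest rate per unit of overdue period
    k         -- overdue period (0 means repaid on time)
    m         -- preset overdue threshold
    """
    interest_income = r1 * principal
    if k <= 0:
        penalty_income = 0.0                         # not overdue
    elif k < m:
        penalty_income = k * r2 * principal          # within threshold: positive income
    else:
        penalty_income = -(k - m) * r2 * principal   # beyond threshold: negative income
    return interest_income + penalty_income


if __name__ == "__main__":
    # User A: principal 10000 yuan, r1 = 0.01% per month, r2 = 0.01% per day, m = 10 days
    cases = [round(user_reward(10000, 0.0001, 0.0001, k, 10), 2) for k in (0, 8, 20)]
    print(cases)  # prints [1.0, 9.0, -9.0]
```

Because the sketch follows the formula in switching to the negative branch at k ≥ m, the income drops abruptly at the threshold (10 yuan at k = 9, 1 yuan at k = 10).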

203. The evaluation server selects training samples from the user data according to the user income, such that the distribution of user income among the selected training samples is consistent with that of the user data, thereby obtaining the training dataset.

That is, the ratio of "good users" to "bad users" among the selected training samples should be as close as possible (within a certain tolerance) to the ratio of "good users" to "bad users" in all of the acquired user data, so that the distribution of user income among the selected training samples matches that of the user data.

For example, if the ratio of "good users" to "bad users" in all acquired user data is 7:3 and 1,000 training samples are needed, then 700 training samples can be selected from the "good users" and 300 from the "bad users". The ratio of "good users" to "bad users" in the selected training samples is then 700:300 = 7:3, consistent with the ratio in all acquired user data; other cases follow by analogy.
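The ratio-preserving selection can be sketched in Python as a simple stratified sample. The sketch assumes each user record already carries a boolean good/bad label (the hypothetical `is_good` field); how users are labelled, and the tolerance allowed on the ratio, are left open by the text.

```python
import random


def select_training_samples(users, n, seed=0):
    """Draw n users so that the good/bad ratio matches the full user data.

    users -- list of (user_id, is_good) pairs; is_good is a hypothetical label
    n     -- number of training samples to draw
    """
    rng = random.Random(seed)
    good = [u for u in users if u[1]]
    bad = [u for u in users if not u[1]]
    # Allocate the n slots in proportion to the population ratio.
    n_good = round(n * len(good) / len(users))
    n_bad = n - n_good
    return rng.sample(good, n_good) + rng.sample(bad, n_bad)


if __name__ == "__main__":
    # 7000 good users and 3000 bad users, i.e. a 7:3 ratio
    population = [(i, i < 7000) for i in range(10000)]
    sample = select_training_samples(population, 1000)
    n_good = sum(1 for _, is_good in sample if is_good)
    print(n_good, len(sample) - n_good)  # prints 700 300
```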

204. The evaluation server sets a weight for each training sample in the training dataset according to the magnitude of the user income, obtaining a weighted training dataset.

For example, the evaluation server may compute the difference between the user income of the current training sample and the minimum user income value to obtain a first value, compute the difference between the maximum user income value and the minimum user income value to obtain a second value, and take the quotient of the first value and the second value as the weight of the current training sample. Expressed as a formula:

Weight(x) = (Reward(x) − Min(Reward)) / (Max(Reward) − Min(Reward));

Here, Weight(x) is the weight of user x, Reward(x) is the user income corresponding to user x, Min(Reward) is the minimum of the user incomes of all training samples (the minimum user income value), and Max(Reward) is the maximum of the user incomes of all training samples (the maximum user income value).

For example, again taking user A: if the user income of user A is 1 yuan, the minimum user income value is −15 yuan, and the maximum user income value is 10 yuan, then the weight of user A is:

Weight(x) = (1 − (−15)) / (10 − (−15)) = 16/25 = 0.64.
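The min-max weighting can be written directly from the formula. In the Python sketch below, the name `min_max_weights` is hypothetical, and the fallback for the degenerate case where all training samples have the same user income (which would make the denominator zero) is an assumption not covered by the text.

```python
def min_max_weights(rewards):
    """Weight(x) = (Reward(x) - Min(Reward)) / (Max(Reward) - Min(Reward))."""
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        # Assumed fallback: identical incomes give every sample the same weight.
        return [1.0 for _ in rewards]
    return [(r - lo) / (hi - lo) for r in rewards]


if __name__ == "__main__":
    # User A earns 1 yuan; the minimum is -15 yuan and the maximum is 10 yuan.
    print(min_max_weights([1, -15, 10]))  # prints [0.64, 0.0, 1.0]
```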

Optionally, in addition to setting the weights of the training samples according to the magnitude of the user income, other factors, such as the user's credit record, can also be taken into account when setting the weights; details are not repeated here.

205. The evaluation server sets a first loss function, a second loss function, and a balance coefficient.

The first loss function is a loss function for predicting the user's default status, the second loss function is a loss function for predicting the user's profit status, and the balance coefficient controls the relative weighting of the first and second loss functions.

The first loss function and the second loss function can be set according to the needs of the actual application, for example, as follows:

First loss function:

Second loss function:

Here, x⁽ⁱ⁾ denotes the user features of the i-th training sample, h_θ(x⁽ⁱ⁾) is the predicted value for the i-th training sample, y⁽ⁱ⁾ is the actual value of the i-th training sample, and Weight(x⁽ⁱ⁾) is the weight of the i-th training sample.

206. The evaluation server establishes the assessment model according to the first loss function, the second loss function, and the balance coefficient.

For example, the evaluation server may compute the product of the balance coefficient and the second loss function and take the sum of this product and the first loss function as the objective function "loss"; that is, loss = first loss function + balance coefficient × second loss function.

From the above objective function (the assessment model), when the prediction is correct, i.e., h_θ(x⁽ⁱ⁾) = y⁽ⁱ⁾, no loss is incurred. Taking lending as an example again: if a user is predicted to be a "good user", lending to the user is considered risk-free, and since the prediction is correct, lending to the user indeed carries no risk, so no loss is incurred. If the user is predicted to be a "bad user", no loan is made to the user, so again no loss is incurred.

It should be noted that this embodiment uses the mean square error to define the above loss functions (the first and second loss functions) only as an example. Besides the mean square error, the loss functions can be defined in other ways, such as a 0-1 loss function or a logarithmic loss function; details are not repeated here.
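The exact formulas for the two loss functions are not reproduced in this text. The Python sketch below assumes one mean-square-error form consistent with the symbols defined above: the first loss is a plain MSE over h_θ(x⁽ⁱ⁾) and y⁽ⁱ⁾, the second multiplies each squared error by Weight(x⁽ⁱ⁾), and the objective is loss = first loss + γ × second loss, with γ the balance coefficient. The 1/(2n) scaling and all function names are assumptions.

```python
def first_loss(preds, labels):
    """Assumed MSE form of the first (default-status) loss."""
    n = len(preds)
    return sum((h - y) ** 2 for h, y in zip(preds, labels)) / (2 * n)


def second_loss(preds, labels, weights):
    """Assumed income-weighted MSE form of the second (profit-status) loss."""
    n = len(preds)
    return sum(w * (h - y) ** 2 for h, y, w in zip(preds, labels, weights)) / (2 * n)


def objective(preds, labels, weights, gamma):
    """loss = first loss + gamma * second loss, gamma being the balance coefficient."""
    return first_loss(preds, labels) + gamma * second_loss(preds, labels, weights)


if __name__ == "__main__":
    # Correct predictions incur no loss, whatever the weights.
    print(objective([0.0, 1.0], [0.0, 1.0], [0.3, 0.9], gamma=2.0))  # prints 0.0
    # One wrong prediction on a sample with weight 0.5.
    print(objective([1.0, 0.0], [0.0, 0.0], [0.5, 1.0], gamma=2.0))  # prints 0.5
```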

207. The evaluation server receives an internet credit assessment request, which indicates the target user to be assessed.

For example, the evaluation server may receive an internet credit assessment request sent by another device, such as a terminal. The request carries the user identifier of the target user to be assessed, such as the user name and/or account number.

208. The evaluation server obtains the user data of the target user and, according to that user data, assesses the internet credit of the target user using the trained assessment model. For example, the trained assessment model may be applied to the user data of the target user to obtain an assessment probability value, which is then converted into a score in a preset format according to a preset algorithm to obtain the internet credit score of the target user.

The preset algorithm can be configured according to the needs of the actual application. For example, if the assessment probability value lies in the interval [0, 1] and needs to be converted into a score between 400 and 900, the preset algorithm may be:

Score = 400 + 500P;

where Score is the internet credit score and P is the assessment probability value.

For example, if P is 0.2, Score = 400 + 500 × 0.2 = 500 points.

As another example, if P is 0.8, Score = 400 + 500 × 0.8 = 800 points, and so on.
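The conversion from assessment probability to score is a one-line mapping. In this Python sketch the function name `credit_score` is hypothetical, the range check is an added safeguard, and rounding to the nearest whole point is an assumption, since 400 + 500P is not always a whole number.

```python
def credit_score(p, base=400, span=500):
    """Map an assessment probability P in [0, 1] to Score = base + span * P."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("assessment probability must lie in [0, 1]")
    # Rounding to a whole number of points is an assumption of this sketch.
    return round(base + span * p)


if __name__ == "__main__":
    print(credit_score(0.2), credit_score(0.8))  # prints 500 800
```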

After the internet credit score is obtained, the creditworthiness of the target user can be determined from the score and a corresponding process can be applied to the target user: for example, granting or refusing a loan to the user, or allowing or not allowing the user to exercise certain rights. The specific process may depend on the needs of the actual application and is not described in detail here.

As can be seen from the above, after obtaining the training dataset, this embodiment can set a weight for each training sample in the training dataset according to a preset strategy, train a preset assessment model using the weighted training dataset to obtain a trained assessment model, and assess the internet credit of users based on the trained assessment model. Because the scheme sets a weight for each training sample according to the preset strategy and trains the model accordingly, it can distinguish the impact of defaults by different training samples. Compared with existing assessment schemes that only consider whether a training sample defaults, the assessment is more reasonable and accurate.

In addition, because the preset assessment model may include a loss function for predicting the user's default status and a loss function for predicting the user's profit status (i.e., the first and second loss functions described in the embodiments of the present invention), the training result can achieve maximum overall income while keeping the default probability to a minimum, so that the impact of defaults by different training samples on the overall income can be distinguished. This not only greatly improves the reasonableness and accuracy of the assessment and the application effect, but also improves flexibility and operability.

Embodiment three

To better implement the above method, an embodiment of the present invention further provides an internet credit assessment system, which may be integrated into a device such as a server, for example, the evaluation server.

As shown in Figure 3a, the internet credit assessment system includes an acquiring unit 301, a selecting unit 302, a setting unit 303, a training unit 304, and an assessment unit 305, as follows:

(1) Acquiring unit 301

The acquiring unit 301 is configured to obtain multiple pieces of user data, the user data including the user's attribute data, behavior data, and credit record.

The user's attribute data may include user information registered on the platform or obtained through other channels, such as demographic attributes like gender, age, region, and/or education. The user's behavior data may include data generated by the user's actions on the platform, such as logging in, clicking, sending messages, shopping, paying, and/or reading. The user's credit record may include information such as the user's default records.

(2) Selecting unit 302

The selecting unit 302 is configured to select training samples from the user data to obtain a training dataset.

There are various ways to make the selection, for example, according to the income distribution of users. That is, the selecting unit 302 may include an analysis subunit and a selection subunit, as follows:

The analysis subunit is configured to analyze the user income according to the user data.

The selection subunit is configured to select training samples from the user data according to the user income, such that the distribution of user income among the selected training samples is consistent with that of the user data, thereby obtaining the training dataset.

Taking lending as an example, the user income is the return that a user brings to the lender (the loan provider, such as a bank or other institution) after the lender extends a loan to that user, and generally includes loan interest income and overdue penalty interest income. Therefore, the user income can be calculated by analyzing the loan interest income and the overdue penalty interest income, that is:

The analysis subunit may specifically be configured to determine the user's loan interest income according to the user data, determine the user's overdue penalty interest income according to the user data, and calculate the sum of the two to obtain the user income. Expressed as a formula:

User income = loan interest income + overdue penalty interest income.

The method for calculating the loan interest income may depend on the needs of the actual application; for example, it can be calculated from the principal and the loan interest rate. The overdue penalty interest is the penalty income arising because the user repays late, and can be calculated from the user's principal, the overdue penalty interest rate, and the overdue period, that is:

The analysis subunit may specifically be configured to determine the user's principal, overdue penalty interest rate, and overdue period according to the user data. If the overdue period is less than a preset threshold, the product of the overdue period, the overdue penalty interest rate, and the principal is taken as the overdue penalty interest income. If the overdue period exceeds the preset threshold, the difference between the overdue period and the preset threshold is calculated, and the negative of the product of this difference, the overdue penalty interest rate, and the principal is taken as the overdue penalty interest income. For details, refer to the foregoing method embodiments; they are not repeated here.

The preset threshold can be configured according to the needs of the actual application.

(3) Setting unit 303

The setting unit 303 is configured to set a weight for each training sample in the training dataset according to a preset strategy, obtaining a weighted training dataset.

The setting unit 303 may specifically be configured to set a weight for each training sample in the training dataset according to the magnitude of the user income, obtaining a weighted training dataset.

For example, the setting unit 303 may compute the difference between the user income of the current training sample and the minimum user income value to obtain a first value, compute the difference between the maximum user income value and the minimum user income value to obtain a second value, and take the quotient of the first value and the second value as the weight of the current training sample.

The minimum user income value is the minimum of the user incomes of all training samples in the training dataset, and the maximum user income value is the maximum of the user incomes of all training samples in the training dataset.

Optionally, in addition to setting the weights of these training samples according to the magnitude of the user income, other factors, such as the user's credit record, can also be taken into account when setting the weights, that is:

The setting unit 303 may specifically be configured to set a weight for each training sample in the training dataset according to the magnitude of the user income and the user's credit record, obtaining a weighted training dataset.

(4) Training unit 304

The training unit 304 is configured to train a preset assessment model using the weighted training dataset to obtain a trained assessment model.

The preset assessment model can be defined according to the needs of the actual application, stored in advance, and read directly from its storage location when needed. Alternatively, the assessment model can be established by the system itself; that is, as shown in Figure 3b, the internet credit assessment system may further include a setup unit 306 and an establishing unit 307, as follows:

The setup unit 306 may be configured to set a first loss function and a second loss function.

The first loss function is a loss function for predicting the user's default status, and the second loss function is a loss function for predicting the user's profit status. The first and second loss functions can be set according to the needs of the actual application, for example, as follows:

First loss function:

Second loss function:

The establishing unit 307 may be configured to establish the assessment model according to the first loss function and the second loss function.

For example, the establishing unit 307 may specifically use the first loss function and the second loss function as the objective function of the assessment model.

Optionally, to adjust the relationship between the two loss functions (i.e., the first loss function and the second loss function) more flexibly, a coefficient, such as a constant term, can also be set as a balance coefficient between the first and second loss functions to control their relative weighting, that is:

The setup unit 306 may further be configured to set the balance coefficient.

In this case, the establishing unit 307 may specifically be configured to establish the assessment model according to the first loss function, the second loss function, and the balance coefficient.

For example, the establishing unit 307 may compute the product of the balance coefficient and the second loss function and take the sum of this product and the first loss function as the objective function, i.e., loss = first loss function + γ × second loss function.

Here, "loss" is the objective function and γ is the balance coefficient, a constant term whose specific value can be set flexibly according to the needs of the actual application, for example according to changes in factors such as the product and/or industry, so as to adjust the scoring strategy.

After the establishing unit 307 obtains the objective function, i.e., the assessment model, the training unit 304 can train the preset assessment model using the weighted training dataset to obtain the trained assessment model. Training can be performed with open-source machine learning tools, for example using algorithms such as decision trees or logistic regression. Training terminates once the objective function reaches a certain threshold; the goal of training is to minimize the objective function.
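To make the training step concrete, the Python sketch below minimizes the objective by plain gradient descent on a logistic model h_θ(x) = sigmoid(θ·x), stopping once the objective falls below a threshold or an iteration cap is reached. This is an illustrative stand-in, not the open-source tooling the text refers to. It assumes mean-square-error forms for the two losses (first loss Σ(h − y)²/(2n), second loss ΣWeight·(h − y)²/(2n)), under which loss = first + γ × second is a per-sample weighted MSE with coefficient 1 + γ·Weight. The learning rate, threshold, iteration cap, and sample data are arbitrary.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(features, labels, weights, gamma=1.0, lr=0.5, max_iter=2000, tol=1e-4):
    """Gradient descent on loss = L1 + gamma * L2 for h_theta(x) = sigmoid(theta . x).

    Assumes L1 = sum((h - y)^2) / (2n) and L2 = sum(w * (h - y)^2) / (2n), so the
    objective equals sum(c * (h - y)^2) / (2n) with c = 1 + gamma * w.
    """
    n, d = len(features), len(features[0])
    theta = [0.0] * d
    coef = [1.0 + gamma * w for w in weights]
    history = []
    for _ in range(max_iter):
        preds = [sigmoid(sum(t * xj for t, xj in zip(theta, x))) for x in features]
        loss = sum(c * (h - y) ** 2 for c, h, y in zip(coef, preds, labels)) / (2 * n)
        history.append(loss)
        if loss < tol:  # terminate once the objective reaches the threshold
            break
        grad = [0.0] * d
        for x, c, h, y in zip(features, coef, preds, labels):
            # d/dtheta of c * (h - y)^2 / (2n) is c * (h - y) * h * (1 - h) * x / n
            g = c * (h - y) * h * (1.0 - h) / n
            for j in range(d):
                grad[j] += g * x[j]
        theta = [t - lr * gj for t, gj in zip(theta, grad)]
    return theta, history


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # bias term + one feature
    y = [0, 0, 1, 1]
    w = [0.2, 0.5, 1.0, 0.8]  # e.g. min-max income weights
    theta, history = train(X, y, w, gamma=1.0)
    print(f"objective {history[0]:.4f} -> {history[-1]:.4f}, theta = {theta}")
```

Folding γ into per-sample coefficients is only an algebraic convenience under the assumed MSE forms; with a 0-1 or logarithmic loss, which the text also allows, the gradient would differ.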

(5) Assessment unit 305

The assessment unit 305 is configured to assess the internet credit of users based on the trained assessment model.

For example, the assessment unit may include a receiving subunit, a data acquisition subunit, and an assessment subunit, as follows:

The receiving subunit is configured to receive an internet credit assessment request, which indicates the target user to be assessed.

The data acquisition subunit is configured to obtain the user data of the target user.

The assessment subunit is configured to assess the internet credit of the target user using the trained assessment model according to the user data of the target user.

For example, the assessment subunit may specifically be configured to apply the trained assessment model to the user data of the target user to obtain an assessment probability value, and convert the assessment probability value into a score in a preset format according to a preset algorithm to obtain the internet credit score of the target user. For example, the preset algorithm may be:

Score = 400 + 500P, where P is the assessment probability value.

In specific implementations, each of the above units may be implemented as an independent entity, or the units may be combined arbitrarily and implemented as one or several entities. For the specific implementation of each unit, refer to the foregoing method embodiments; details are not repeated here.

As can be seen from the above, in this embodiment, after the training dataset is obtained, the setting unit 303 can set a weight for each training sample in the training dataset according to a preset strategy; the training unit 304 then trains the preset assessment model using the weighted training dataset to obtain a trained assessment model; and the assessment unit 305 assesses the internet credit of users based on the trained assessment model. Because the scheme sets a weight for each training sample according to the preset strategy and trains the model accordingly, it can distinguish the impact of defaults by different training samples, making the assessment result more reasonable and accurate than existing schemes that only consider whether a training sample defaults.

In addition, because the preset assessment model may include a loss function for predicting the user's default status and a loss function for predicting the user's profit status, the training result can achieve maximum overall income while keeping the default probability to a minimum, so that the impact of defaults by different training samples on the overall income is distinguished. This not only greatly improves the reasonableness and accuracy of the assessment and the application effect, but also improves flexibility and operability.

Embodiment four

An embodiment of the present invention further provides a server, which can serve as the evaluation server of the embodiments of the present invention. Figure 4 shows a schematic structural diagram of the server involved in the embodiments of the present invention. Specifically:

The server may include components such as a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will understand that the server structure shown in Figure 4 does not limit the server, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Specifically:

The processor 401 is the control center of the server. It connects all parts of the server through various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the server as a whole. Optionally, the processor 401 may include one or more processing cores. Preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 401.

The memory 402 may be used to store software programs and modules, and the processor 401 performs various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data created through use of the server. In addition, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 402 may further include a memory controller to provide the processor 401 with access to the memory 402.

The server further includes a power supply 403 that supplies power to the components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 403 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

The server may further include an input unit 404, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

Although not shown, the server may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 401 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby implementing the following functions:

Obtain multiple pieces of user data; select training samples from the user data to obtain a training dataset; set a weight for each training sample in the training dataset according to a preset strategy to obtain a weighted training dataset; train a preset assessment model using the weighted training dataset to obtain a trained assessment model; and assess the internet credit of users based on the trained assessment model.

For example, the user income may be analyzed according to the user data; training samples may be selected from the user data according to the user income such that the distribution of user income among the selected training samples is consistent with that of the user data, obtaining the training dataset; then a weight may be set for each training sample in the training dataset according to the magnitude of the user income, obtaining the weighted training dataset; and so on.

The preset assessment model can be defined according to the needs of the actual application, stored in advance, and read directly from its storage location when needed. Alternatively, the application programs stored in the memory 402 may also implement the following functions:

Set a first loss function and a second loss function, and establish the assessment model according to the first loss function and the second loss function.

The first loss function is a loss function for predicting the user's default status, and the second loss function is a loss function for predicting the user's profit status.

For the specific implementation of each of the above operations, refer to the foregoing embodiments; details are not repeated here.

As can be seen from the above, after obtaining the training dataset, the server of this embodiment can set a weight for each training sample in the training dataset according to a preset strategy, train a preset assessment model using the weighted training dataset to obtain a trained assessment model, and assess the internet credit of users based on the trained assessment model. Because the scheme sets a weight for each training sample according to the preset strategy and trains the model accordingly, it can distinguish the impact of defaults by different training samples. Moreover, because the preset assessment model may include a loss function for predicting the user's default status and a loss function for predicting the user's profit status, the training result can achieve maximum overall income while keeping the default probability to a minimum. Overall, compared with existing schemes, this scheme not only greatly improves the reasonableness and accuracy of the assessment and the application effect, but also improves flexibility and operability.

A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

The internet credit assessment method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and scope of application based on the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. An internet credit assessment method, characterized by comprising:

selecting training samples from the user data to obtain a training dataset;

setting a weight for each training sample in the training dataset according to a preset strategy to obtain a weighted training dataset;

2. The method according to claim 1, characterized in that selecting training samples from the user data to obtain a training dataset comprises:

analyzing the user income according to the user data;

selecting training samples from the user data according to the user income such that the distribution of user income among the selected training samples is consistent with the distribution of user income of the user data, to obtain the training dataset.

3. The method according to claim 2, characterized in that analyzing the user income according to the user data comprises:

determining the user's loan interest income according to the user data;

determining the user's overdue penalty interest income according to the user data;

calculating the sum of the loan interest income and the overdue penalty interest income to obtain the user income.

4. The method according to claim 3, characterized in that determining the user's overdue penalty interest income according to the user data comprises:

if the overdue period is less than a preset threshold, taking the product of the overdue period, the overdue penalty interest rate, and the principal as the overdue penalty interest income;

if the overdue period exceeds the preset threshold, calculating the difference between the overdue period and the preset threshold, and taking the negative of the product of the difference, the overdue penalty interest rate, and the principal as the overdue penalty interest income.

5. The method according to claim 2, characterized in that setting a weight for each training sample in the training dataset according to a preset strategy to obtain a weighted training dataset comprises:

setting a weight for each training sample in the training dataset according to the magnitude of the user income to obtain the weighted training dataset; or

setting a weight for each training sample in the training dataset according to the magnitude of the user income and the user's credit record to obtain the weighted training dataset.

6. The method according to any one of claims 1 to 5, characterized in that, before training the preset assessment model using the weighted training dataset to obtain the trained assessment model, the method further comprises:

setting a first loss function and a second loss function, wherein the first loss function is a loss function for predicting the user's default status and the second loss function is a loss function for predicting the user's profit status;

establishing the assessment model according to the first loss function and the second loss function.

7. The method according to claim 6, wherein before establishing the assessment model according to the loss function for predicting the default status of the user and the loss function for predicting the profit status of the user, the method further comprises:

setting a balance coefficient, the balance coefficient being used to control the relative weight of the first loss function and the second loss function;

and establishing the assessment model according to the first loss function and the second loss function comprises: establishing the assessment model according to the first loss function, the second loss function, and the balance coefficient.
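Claims 6 and 7 specify only that a default-prediction loss and a profit-prediction loss are combined under a balance coefficient. The sketch below assumes, for illustration, a weighted binary cross-entropy for the first loss, a weighted squared error on user income for the second, and the combination L = L_default + balance × L_profit; none of these concrete forms is stated in the claims, and the function names are hypothetical. The per-sample weights are those produced by the preset weighting strategy.

```python
import math


def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """First loss: weighted binary cross-entropy on default labels (assumed form)."""
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip so math.log never sees 0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


def weighted_squared_error(target, pred, weights):
    """Second loss: weighted squared error on predicted user income (assumed form)."""
    return sum(w * (t - q) ** 2 for t, q, w in zip(target, pred, weights)) / sum(weights)


def combined_loss(y_default, p_default, income, income_pred, weights, balance):
    """Assumed objective: L = L_default + balance * L_profit."""
    return (weighted_log_loss(y_default, p_default, weights)
            + balance * weighted_squared_error(income, income_pred, weights))


# One defaulted user predicted at p = 0.5, true income 2.0 predicted as 0.0.
print(combined_loss([1], [0.5], [2.0], [0.0], [1.0], balance=0.5))
```

For the single-sample call above the result is ln 2 + 0.5 × 4 ≈ 2.693. Setting `balance` to 0 reduces the objective to a pure default classifier; raising it shifts the model toward predicting profitability.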

8. The method according to any one of claims 1 to 5, wherein assessing the internet credit reference of the user based on the trained assessment model comprises:

receiving an internet credit reference assessment request, the internet credit reference assessment request indicating a target user to be assessed;

obtaining the user data of the target user;

assessing the internet credit reference of the target user by means of the trained assessment model according to the user data of the target user.

9. The method according to claim 8, wherein assessing the internet credit reference of the target user by means of the trained assessment model according to the user data of the target user comprises:

computing on the user data of the target user with the trained assessment model to obtain an assessment probability value;

converting the assessment probability value into a score in a preset format according to a preset algorithm, to obtain the internet credit reference score of the target user.
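Claim 9 does not disclose the preset algorithm that turns the probability into a score. One common choice in credit scoring, swapped in here purely as an assumption, is log-odds scaling with a base score at given base odds and a fixed number of points to double the odds (PDO). The sketch assumes the model outputs a probability of default; the parameter values (600 points at 50:1 good-to-bad odds, 20 PDO) and all names are illustrative.

```python
import math


def probability_to_score(p_default: float, base_score: float = 600.0,
                         base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Map a default probability to a score via log-odds scaling (assumed algorithm)."""
    p = min(max(p_default, 1e-12), 1 - 1e-12)  # keep the log-odds finite
    factor = pdo / math.log(2)                  # points gained per doubling of odds
    offset = base_score - factor * math.log(base_odds)
    good_bad_odds = (1 - p) / p                 # odds of the user not defaulting
    return offset + factor * math.log(good_bad_odds)


def assess(user_data, model) -> float:
    """Claim 9 flow: trained model -> assessment probability value -> score."""
    return probability_to_score(model(user_data))


# A hypothetical model predicting a 1-in-51 default chance gives 50:1 odds, about 600 points;
# halving the risk to 1-in-101 (100:1 odds) adds one PDO, about 620 points.
print(round(probability_to_score(1 / 51)), round(probability_to_score(1 / 101)))
```

Lower default probabilities map to higher scores, and every doubling of the good-to-bad odds adds exactly `pdo` points, which makes the score scale easy to interpret.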

10. An internet credit reference assessment system, comprising:

an acquiring unit, configured to obtain multiple pieces of user data, the user data including the attribute data, behavioral data, and credit records of users;

a setting unit, configured to set a weight for each training sample in the training dataset according to a preset strategy, to obtain a weighted training dataset;

a training unit, configured to train a preset assessment model with the weighted training dataset, to obtain a trained assessment model;
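The units of claim 10 form a simple pipeline: acquire data, weight it, train on it. The sketch below wires those three units together as injectable callables so each can be replaced independently. The class name, the use of plain dicts for user records, and the toy trainer (which predicts the weighted default rate for everyone) are hypothetical stand-ins, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict                      # hypothetical: one user's data as a plain dict
Model = Callable[[Record], float]  # a trained model maps user data to a probability


@dataclass
class CreditAssessmentSystem:
    acquiring_unit: Callable[[], list[Record]]                    # obtains user data
    setting_unit: Callable[[list[Record]], list[float]]           # preset weighting strategy
    training_unit: Callable[[list[Record], list[float]], Model]   # trains the preset model

    def build_model(self) -> Model:
        samples = self.acquiring_unit()
        weights = self.setting_unit(samples)
        return self.training_unit(samples, weights)


def weighted_default_rate_trainer(samples: list[Record], weights: list[float]) -> Model:
    """Toy trainer: predicts the weighted default rate for every user."""
    rate = sum(w * s["default"] for s, w in zip(samples, weights)) / sum(weights)
    return lambda record: rate


system = CreditAssessmentSystem(
    acquiring_unit=lambda: [{"default": 1}, {"default": 0}],
    setting_unit=lambda samples: [3.0, 1.0],
    training_unit=weighted_default_rate_trainer,
)
model = system.build_model()
print(model({}))  # (3 * 1 + 1 * 0) / 4 = 0.75
```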

11. The system according to claim 10, wherein the selecting unit comprises an analysis subunit and a selecting subunit;

the analysis subunit is configured to analyze the user income according to the user data;

12. The system according to claim 11, wherein

the analysis subunit is specifically configured to determine the loan interest income of the user according to the user data, determine the overdue default interest income of the user according to the user data, and calculate the sum of the loan interest income and the overdue default interest income to obtain the user income.

13. The system according to claim 12, wherein the analysis subunit is specifically configured to:

14. The system according to claim 11, wherein

the setting unit is specifically configured to set a weight for each training sample in the training dataset according to the magnitude of the user income, to obtain the weighted training dataset; or

the setting unit is specifically configured to set a weight for each training sample in the training dataset according to the magnitude of the user income and the credit record of the user, to obtain the weighted training dataset.

15. The system according to any one of claims 10 to 14, further comprising a setup unit and an establishing unit;

the setup unit is configured to set a first loss function and a second loss function, the first loss function being a loss function for predicting the default status of the user, and the second loss function being a loss function for predicting the profit status of the user;

the establishing unit is configured to establish the assessment model according to the first loss function and the second loss function.

16. The system according to claim 15, wherein

the setup unit is further configured to set a balance coefficient;

the establishing unit is specifically configured to establish the assessment model according to the first loss function, the second loss function, and the balance coefficient.

17. The system according to any one of claims 10 to 14, wherein the assessment unit comprises a receiving subunit, a data acquisition subunit, and an assessment subunit;

the receiving subunit is configured to receive an internet credit reference assessment request, the internet credit reference assessment request indicating a target user to be assessed;

the data acquisition subunit is configured to obtain the user data of the target user;

18. The system according to claim 17, wherein

the assessment subunit is specifically configured to compute on the user data of the target user with the trained assessment model to obtain an assessment probability value, and to convert the assessment probability value into a score in a preset format according to a preset algorithm, to obtain the internet credit reference score of the target user.