CN109801112A - The method and apparatus for calculating user's score value - Google Patents


Info

Publication number
CN109801112A
Authority
CN
China
Prior art keywords
data
user
sample
data set
score value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910107542.1A
Other languages
Chinese (zh)
Other versions
CN109801112B (en)
Inventor
张元杰
程建波
吕军
王美青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Jingdong Technology Holding Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd
Priority to CN201910107542.1A
Publication of CN109801112A
Application granted
Publication of CN109801112B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for calculating a user score value, and relates to the field of computer technology. One specific embodiment of the method includes: obtaining a first data set and a second data set, wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value; sampling the first data set and the second data set using the bootstrap method to obtain sample data; performing a calculation on the sample data using a quantile regression algorithm to obtain an estimated value; and calculating a target user score value according to the estimated value and target user data. This embodiment improves the accuracy of user score value calculation.

Description

The method and apparatus for calculating user's score value
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for calculating a user score value.
Background art
Currently, in user behavior research, a common approach is to take a large number of normal-user payment amounts with their corresponding normal-user score values, together with a small number of abnormal-user payment amounts with their corresponding abnormal-user score values, as a data set; to perform a calculation on the data set using a traditional regression method to obtain an estimated value; and to multiply the estimated value by the target user's payment amount to obtain the target user score value.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problems:
First, because the data set contains a large number of normal-user payment amounts and only a small number of abnormal-user payment amounts, the data set is imbalanced. An estimated value calculated from it is biased toward normal users and is therefore inaccurate, so the target user score value is shifted and is also inaccurate.
Second, traditional regression analysis has the following problems. Firstly, the estimated value obtained with traditional regression analysis is affected by extreme values, so the estimated value is inaccurate and the target user score value is also inaccurate. Secondly, traditional regression analysis requires the residuals to follow a normal distribution, but in practice they largely do not; when the form of the distribution changes, the reliability of the calculation is hard to guarantee, the accuracy of the calculation is low, and the target user score value is also inaccurate. Thirdly, traditional regression analysis is mean regression: it only characterizes the central tendency of the conditional distribution and cannot comprehensively describe the full conditional distribution of the dependent variable.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for calculating a user score value, which can improve the accuracy of the estimated value and thereby the accuracy of the target user score value calculation.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for calculating a user score value is provided.
A method for calculating a user score value according to an embodiment of the present invention includes:
obtaining a first data set and a second data set, wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value;
sampling the first data set and the second data set using the bootstrap method to obtain sample data;
performing a calculation on the sample data using a quantile regression algorithm to obtain an estimated value; and
calculating a target user score value according to the estimated value and target user data.
In one embodiment, sampling the first data set and the second data set using the bootstrap method to obtain sample data includes:
extracting a first quantity of the first data from the first data set using the bootstrap method, and repeating the extraction multiple times;
extracting a second quantity of the second data from the second data set using the bootstrap method, and repeating the extraction multiple times; and
taking the set formed by the first quantity of first data and the second quantity of second data obtained in each extraction as the sample data of that extraction.
In one embodiment, the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity lies in the range [0.1, 10].
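The two ratio conditions can be checked mechanically before sampling. The following is a minimal Python sketch, not part of the patent text; the function name `check_sampling_ratios` and its parameter names are hypothetical.

```python
def check_sampling_ratios(n_first_total, n_second_total, n_first_draw, n_second_draw):
    """Return True when both ratio conditions of this embodiment hold.

    n_first_total / n_second_total: sizes of the first and second data sets.
    n_first_draw / n_second_draw: the first quantity and the second quantity.
    All names are illustrative assumptions.
    """
    if min(n_first_total, n_second_total, n_first_draw, n_second_draw) <= 0:
        raise ValueError("all counts must be positive")
    # The source data sets are strongly imbalanced (ratio greater than 10).
    imbalanced_source = n_first_total / n_second_total > 10
    # The drawn sample is roughly balanced (ratio within [0.1, 10]).
    balanced_sample = 0.1 <= n_first_draw / n_second_draw <= 10
    return imbalanced_source and balanced_sample


print(check_sampling_ratios(10000, 100, 1000, 1000))
print(check_sampling_ratios(10000, 100, 5000, 100))
```

The first call uses the sizes from the worked example in the detailed description; the second call draws 50 times more first data than second data and so fails the balance condition.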
In one embodiment, performing a calculation on the sample data using the quantile regression algorithm to obtain an estimated value includes:
performing a calculation on the sample data of each extraction using the quantile regression algorithm to obtain a reference value for each extraction;
adding the reference values of all extractions to obtain a total reference value; and
dividing the total reference value by the number of extractions to obtain the estimated value.
In one embodiment, calculating the target user score value according to the estimated value and the target user data includes:
multiplying the estimated value by the target user data to obtain the target user score value;
wherein the target user data includes any one of: the user IP, an attribute of the user terminal, the category of the product purchased by the user, the user purchase time, and the user payment amount.
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for calculating a user score value is provided.
An apparatus for calculating a user score value according to an embodiment of the present invention includes:
an acquiring unit, configured to obtain a first data set and a second data set, wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value;
a sampling unit, configured to sample the first data set and the second data set using the bootstrap method to obtain sample data;
an estimated value computing unit, configured to perform a calculation on the sample data using a quantile regression algorithm to obtain an estimated value; and
a target user score value computing unit, configured to calculate a target user score value according to the estimated value and target user data.
In one embodiment, the sampling unit includes:
a first subunit, configured to extract a first quantity of the first data from the first data set using the bootstrap method, and to repeat the extraction multiple times;
a second subunit, configured to extract a second quantity of the second data from the second data set using the bootstrap method, and to repeat the extraction multiple times; and
an aggregation unit, configured to take the set formed by the first quantity of first data and the second quantity of second data obtained in each extraction as the sample data of that extraction.
In one embodiment, the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity lies in the range [0.1, 10].
In one embodiment, the estimated value computing unit includes:
a reference value computing unit, configured to perform a calculation on the sample data of each extraction using the quantile regression algorithm to obtain a reference value for each extraction;
a summation unit, configured to add the reference values of all extractions to obtain a total reference value; and
an estimated value computation subunit, configured to divide the total reference value by the number of extractions to obtain the estimated value.
In one embodiment, the target user score value computing unit is specifically configured to:
multiply the estimated value by the target user data to obtain the target user score value;
wherein the target user data includes any one of: the user IP, an attribute of the user terminal, the category of the product purchased by the user, the user purchase time, and the user payment amount.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, an electronic device is provided.
An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for calculating a user score value provided by the embodiments of the present invention.
To achieve the above object, according to still another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium according to an embodiment of the present invention stores a computer program which, when executed by a processor, implements the method for calculating a user score value provided by the embodiments of the present invention.
One embodiment of the above invention has the following advantages or beneficial effects. The first data set and the second data set are obtained and sampled using the bootstrap method to obtain sample data. Because the bootstrap method is random, equiprobable resampling with replacement, the sample data is balanced, which improves the accuracy of the estimated value and of the target user score value calculation. The sample data is then processed with a quantile regression algorithm; compared with traditional regression analysis, the estimated value is more robust to extreme values and therefore more accurate, and the conditional distribution of the dependent variable is described more comprehensively, which further improves the accuracy of the target user score value calculation.
Further effects of the above non-conventional optional implementations are explained below in conjunction with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a method for calculating a user score value according to one embodiment of the present invention;
Fig. 2 is a schematic diagram of the main flow of a method for calculating a user score value according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of the main units of an apparatus for calculating a user score value according to one embodiment of the present invention;
Fig. 4 is a schematic diagram of the main units of an apparatus for calculating a user score value according to another embodiment of the present invention;
Fig. 5 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
It should be pointed out that, where no conflict arises, the embodiments of the present invention and the features in the embodiments can be combined with each other.
Currently, in user behavior research, a common approach is to take a large number of normal-user payment amounts with their corresponding normal-user score values, together with a small number of abnormal-user payment amounts with their corresponding abnormal-user score values, as a data set; to perform a calculation on the data set using a traditional regression method (mean regression) to obtain an estimated value; and to multiply the estimated value by the target user's payment amount to obtain the target user score value. The target user is then served according to the target user score value, for example by recommending products.
The prior art has the following problems:
First, because the data set contains a large number of normal-user payment amounts and only a small number of abnormal-user payment amounts, the data set is imbalanced. An estimated value calculated from it is biased toward normal users and is therefore inaccurate, so the target user score value is shifted and is also inaccurate.
Second, traditional regression analysis has the following problems:
Firstly, the estimated value obtained with traditional regression analysis is affected by extreme values (for example, heteroscedasticity, outliers, and high-leverage points), so the estimated value is inaccurate and the target user score value is also inaccurate.
Secondly, traditional regression analysis requires the residuals to follow a normal distribution, but in practice they largely do not. When the form of the distribution changes, the reliability of the calculation is hard to guarantee, the accuracy of the calculation is low, and the target user score value is also inaccurate.
Thirdly, traditional regression analysis is mean regression (it studies the conditional expectation of the dependent variable, that is, the influence of the independent variables on the conditional mean of the dependent variable, hence the name). It only characterizes the central tendency of the conditional distribution and cannot comprehensively describe the full conditional distribution of the dependent variable.
To solve the problems of the prior art, one embodiment of the present invention provides a method for calculating a user score value. The method may be executed by a server and, as shown in Fig. 1, includes:
Step S101: obtain a first data set and a second data set, wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value.
Step S102: sample the first data set and the second data set using the bootstrap method to obtain sample data.
In this step, if the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, then in a specific implementation a first quantity of first data may be extracted from the first data set, a second quantity of second data may be extracted from the second data set using the bootstrap method, and the set formed by the first quantity of first data and the second quantity of second data is taken as the sample data. The reason is that the second data set is small: without the bootstrap method it is not possible to obtain a number of second data that is balanced with the number of first data, and without that balance the sample data is imbalanced, the estimated value is inaccurate, and the accuracy of the target user score value calculation is poor. Alternatively, in a specific implementation, the first quantity of first data may also be extracted from the first data set using the bootstrap method, the second quantity of second data may be extracted from the second data set using the bootstrap method, and the set formed by the first quantity of first data and the second quantity of second data is taken as the sample data.
Step S103: perform a calculation on the sample data using a quantile regression algorithm to obtain an estimated value.
In this step, in a specific implementation, an objective function is obtained according to the quantile regression algorithm and a quantile value is set; the quantile value and the first data and second data in the sample data are substituted into the objective function to obtain the estimated value at that quantile. Different quantile values are set as needed to obtain estimated values at different quantiles. Calculating the target user score value from the estimated values at different quantiles makes the target user score value more accurate and more comprehensive.
Step S104: calculate a target user score value according to the estimated value and target user data.
In this step, in a specific implementation, the estimated value is multiplied by the target user data to obtain the target user score value. If estimated values at different quantiles are available, the estimated value at each quantile is multiplied by the target user data to obtain the target user score value at that quantile. Because target user score values are obtained at different quantiles, the conditional distribution of the dependent variable can be described comprehensively.
In this embodiment, the first data set and the second data set are obtained and sampled using the bootstrap method to obtain sample data. Because the bootstrap method is random, equiprobable resampling with replacement, the sample data is balanced, which improves the accuracy of the estimated value and of the target user score value calculation. The sample data is then processed with a quantile regression algorithm; compared with traditional regression analysis, the estimated value is more robust to extreme values and therefore more accurate, and the conditional distribution of the dependent variable is described more comprehensively, which further improves the accuracy of the target user score value calculation.
To solve the problems of the prior art, another embodiment of the present invention provides a method for calculating a user score value. The method may be executed by a server and, as shown in Fig. 2, includes:
Step S201: obtain a first data set and a second data set, wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value.
Step S202: extract a first quantity of the first data from the first data set using the bootstrap method, and repeat the extraction multiple times.
In this step, the bootstrap method (also called bootstrapping or bootstrap sampling, that is, uniform sampling with replacement from a given training set) is used to perform random, equiprobable resampling with replacement on the first data set, obtaining the first quantity of first data.
It should be noted that the first quantity of first data is extracted multiple times, yielding sample data for each extraction. This avoids the problem that a single poor sample leads to an inaccurate estimated value and an inaccurate target user score value calculation, and it ensures the stability of the target user score value calculation.
Step S203: extract a second quantity of the second data from the second data set using the bootstrap method, and repeat the extraction multiple times.
In this step, the bootstrap method is used to perform random, equiprobable resampling with replacement on the second data set, obtaining the second quantity of second data.
It should be noted that the second quantity of second data is extracted multiple times, yielding sample data for each extraction. This avoids the problem that a single poor sample leads to an inaccurate estimated value and an inaccurate target user score value calculation, and it ensures the stability of the target user score value calculation.
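Steps S202 and S203 rely on the same operation: a uniform draw with replacement, repeated several times. A minimal Python sketch of that operation follows; it is an illustration only, and the function name `bootstrap_draw`, the made-up records, and the fixed seed are assumptions rather than anything specified in the patent.

```python
import random


def bootstrap_draw(records, size, rng):
    """Draw `size` records uniformly at random, with replacement."""
    return rng.choices(records, k=size)


# Fixed seed only so that the sketch is repeatable (an assumption).
rng = random.Random(0)

# Made-up (sample user data, sample user score value) pairs.
first_data_set = [(amount, 0.1 * amount) for amount in range(1, 51)]

# Repeat the extraction several times; each round is an independent draw.
rounds = [bootstrap_draw(first_data_set, 20, rng) for _ in range(3)]
print([len(r) for r in rounds])
```

Because every draw is made with replacement, the same record can appear more than once within a round, and `size` may exceed the size of the data set.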
Step S204: take the set formed by the first quantity of first data and the second quantity of second data obtained in each extraction as the sample data of that extraction.
In this step, in a specific implementation, the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity lies in the range [0.1, 10].
It should be understood that, because the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, the first data set is much larger than the second data set. For example, data generated by normal users serves as the first data in the first data set, and data generated by abnormal users serves as the second data in the second data set (abnormal users are users who use improper means to seek illegitimate benefits, such as users who place fake orders or users who use stolen payment cards). As is well known, the number of normal users is very large and the number of abnormal users is very small by comparison, so the number of first data in the first data set is much larger than the number of second data in the second data set. The prior art directly takes the set of the first data in the first data set and the second data in the second data set as the data set and performs a calculation on it using a traditional regression method to obtain an estimated value. Because the prior-art data set contains a very large number of first data and very few second data, it is imbalanced; the estimated value is biased toward normal users, is inaccurate and of low validity, and the target user score value calculation is in turn inaccurate. By contrast, the embodiment of the present invention extracts a first quantity of first data from the first data set using the bootstrap method, extracts a second quantity of second data from the second data set using the bootstrap method, and takes the set formed by the first quantity of first data and the second quantity of second data as the sample data, with the ratio of the first quantity to the second quantity in the range [0.1, 10]. The number of first data and the number of second data in the sample data of the embodiment are therefore balanced, that is, the sample data is balanced. As a result, the estimated value is not biased toward either side (neither toward normal users nor toward abnormal users), the estimated value is accurate and of high validity, and the accuracy of the target user score value calculation is improved.
This step is illustrated below with a specific example. Assume that the first data set consists of data generated by 10000 normal users (the data generated by a normal user includes normal user data and a normal user score value), and the second data set consists of data generated by 100 abnormal users (the data generated by an abnormal user includes abnormal user data and an abnormal user score value). Using the bootstrap method, the data generated by 1000 normal users is extracted from the first data set, and the extraction is performed 5 times; using the bootstrap method, the data generated by 1000 abnormal users is extracted from the second data set, and the extraction is performed 5 times (it should be noted that, because the extraction uses the bootstrap method, the data generated by the 1000 abnormal users contains repeats). The set formed by the data generated by the 1000 normal users and the data generated by the 1000 abnormal users in each extraction is taken as the sample data of that extraction, so that 5 sample data are finally obtained.
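The sizes in this example can be reproduced with a short Python sketch. It is illustrative only: the records are placeholder tuples, the seed is arbitrary, and none of the variable names come from the patent.

```python
import random

rng = random.Random(42)  # arbitrary seed, used only for repeatability

# Placeholder records standing in for the data of each user.
normal = [("normal", i) for i in range(10000)]
abnormal = [("abnormal", i) for i in range(100)]

samples = []
for _ in range(5):  # 5 extractions
    drawn_normal = rng.choices(normal, k=1000)      # with replacement
    drawn_abnormal = rng.choices(abnormal, k=1000)  # with replacement
    samples.append(drawn_normal + drawn_abnormal)

for sample in samples:
    n_abnormal = sum(1 for label, _ in sample if label == "abnormal")
    distinct_abnormal = len({rec for rec in sample if rec[0] == "abnormal"})
    print(len(sample), n_abnormal, distinct_abnormal)
```

Each of the 5 sample data holds 2000 records, half from each data set. Since only 100 abnormal records exist, the 1000 abnormal draws in each sample necessarily contain repeats, which is the case the example points out.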
Step S205: perform a calculation on the sample data of each extraction using the quantile regression algorithm to obtain a reference value for each extraction.
In this step, the sample data of each extraction is processed with the quantile regression algorithm, so that the range over which the independent variable affects the dependent variable can be described robustly at any quantile, the conditional distribution of the dependent variable is described comprehensively, and the range of application scenarios is wider.
In addition, the expression of the quantile regression algorithm is:
$$Q_\tau(y \mid x) = \sum_{i=1}^{n} \beta_i(\tau)\, x_i$$
where $y$ represents the dependent variable, $\tau$ represents the quantile, $x$ is the independent variable, and $\beta_i$ is the estimated value ($i = 1, \dots, n$, where $n$ is the number of independent variables).
According to the quantile regression algorithm, the weighted average of the absolute values of the residuals is used as the objective function to be minimized. The expression of the objective function is therefore:
$$\hat{\beta}_j(\tau) = \arg\min_{\beta} \sum_{i} \rho_\tau\!\left(y^*_{ij} - x^*_{ij}\,\beta\right)$$
where $\tau$ represents the quantile, which can be set manually ($0 < \tau < 1$);
$\rho_\tau$ represents the weight applied to the residual at each quantile and is a known quantity; $y^*_{ij} - x^*_{ij}\,\beta$ represents the residual at each quantile;
$y^*$ is the dependent variable, $x^*$ is the independent variable, $i$ indexes the data in the sample data, and $j$ indexes the sampling round;
$\hat{\beta}_j(\tau)$ represents the estimated value at each quantile.
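The weighting of residuals in quantile regression is conventionally written as the check function $\rho_\tau(u) = u\,(\tau - \mathbf{1}[u < 0])$, which weights a non-negative residual by $\tau$ and a negative residual by $1 - \tau$. The Python sketch below evaluates that function and the resulting objective for a single independent variable; the function names and the single-variable, no-intercept form are assumptions made for illustration, not the patent's stated implementation.

```python
def check_loss(residual, tau):
    """rho_tau(u): tau * u when u >= 0, (tau - 1) * u when u < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def objective(beta, xs, ys, tau):
    """Sum of check losses of the residuals y - beta * x (one variable, no intercept)."""
    return sum(check_loss(y - beta * x, tau) for x, y in zip(xs, ys))


# At tau = 0.75 an under-prediction (positive residual) costs three times
# as much as an over-prediction of the same size.
print(check_loss(2.0, 0.75), check_loss(-2.0, 0.75))
```

With $\tau = 0.5$ both signs are weighted equally, and minimizing the objective gives a median-type fit.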
Continuing the example given for step S204, this step is illustrated below with a specific example:
The quantile is set to 0.5, and 0.5 together with each of the 5 sample data is substituted into the objective function, yielding 5 reference values at quantile 0.5. (Each sample data includes the first quantity of first data, in which the first sample user data is the independent variable and the corresponding first sample user score value is the dependent variable, and the second quantity of second data, in which the second sample user data is the independent variable and the corresponding second sample user score value is the dependent variable.)
The quantile is set to 0.75, and 0.75 together with each of the 5 sample data is substituted into the objective function, yielding 5 reference values at quantile 0.75.
It should be noted that, compared with traditional regression analysis, processing the sample data with the quantile regression algorithm makes the estimated value more robust to extreme values and therefore more accurate. Moreover, the quantile regression algorithm allows different quantiles to be set, so that estimated values at different quantiles are obtained and target user score values at different quantiles are calculated from them. This describes the conditional distribution of the dependent variable more comprehensively, and the target user score value calculation is therefore more accurate.
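To make the per-extraction calculation concrete, the following Python sketch fits a one-variable quantile regression without an intercept, which matches the later step of multiplying a single estimated value by the target user data. That model form is an assumption, as are the function names and the made-up numbers. The fit relies on the fact that the objective is convex and piecewise linear in the coefficient, so a minimum is always attained at one of the ratios y/x.

```python
def pinball(residual, tau):
    """Check loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_slope(xs, ys, tau):
    """Minimise sum(pinball(y - beta * x, tau)) over beta.

    The loss is convex and piecewise linear in beta with breakpoints at
    y / x, so it is enough to evaluate the loss at those candidates.
    """
    candidates = [y / x for x, y in zip(xs, ys) if x != 0]
    if not candidates:
        raise ValueError("need at least one non-zero independent value")

    def loss(beta):
        return sum(pinball(y - beta * x, tau) for x, y in zip(xs, ys))

    return min(candidates, key=loss)


# Made-up payment amounts and score values; the last pair is an extreme value.
xs = [100.0, 200.0, 300.0, 400.0, 500.0]
ys = [20.0, 40.0, 60.0, 80.0, 5000.0]
for tau in (0.5, 0.75):
    print(tau, fit_slope(xs, ys, tau))
```

In the method described here, one such fit would be run per extraction and per quantile, giving the reference values. The candidate search costs quadratic time in the sample size, so an established quantile regression routine would be the more practical choice for large samples.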
Step S206: add the reference values of all extractions to obtain a total reference value.
Continuing the example given for step S205, this step is illustrated below with a specific example:
The 5 reference values at quantile 0.5 are added to obtain the total reference value at quantile 0.5 (assume the calculated total reference value at quantile 0.5 is 1);
The 5 reference values at quantile 0.75 are added to obtain the total reference value at quantile 0.75 (assume the calculated total reference value at quantile 0.75 is 2).
Step S207: divide the total reference value by the number of extractions to obtain the estimated value.
In this step, in a specific implementation, the number of extractions refers to the number of times the first quantity of first data is extracted from the first data set. Because the first quantity of first data and the second quantity of second data are extracted the same number of times, the number of extractions also refers to the number of times the second quantity of second data is extracted from the second data set.
Continuing the example given for step S206, this step is illustrated below with a specific example:
The total reference value at quantile 0.5 (1) is divided by 5 to obtain the estimated value at quantile 0.5 (1/5 = 0.2);
The total reference value at quantile 0.75 (2) is divided by 5 to obtain the estimated value at quantile 0.75 (2/5 = 0.4).
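Steps S206 and S207 together amount to averaging the per-extraction reference values for each quantile. A minimal Python sketch follows; the individual reference values are invented so that their totals match the assumed totals of 1 and 2, and the function name is hypothetical.

```python
def estimated_value(reference_values):
    """Total reference value divided by the number of extractions."""
    if not reference_values:
        raise ValueError("at least one reference value is required")
    total_reference_value = sum(reference_values)
    return total_reference_value / len(reference_values)


# Invented per-extraction reference values (5 extractions per quantile).
reference_values_by_quantile = {
    0.5: [0.25, 0.125, 0.25, 0.125, 0.25],   # total 1
    0.75: [0.5, 0.25, 0.5, 0.25, 0.5],       # total 2
}

estimates = {tau: estimated_value(refs)
             for tau, refs in reference_values_by_quantile.items()}
print(estimates)
```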
In addition, it should be understood that, after the estimated value is obtained, it may also be tested in the following respects: basic assumptions, significance, goodness of fit, outliers, practical meaning, and so on. After the tests are passed, the estimated value is multiplied by the target user data to obtain the target user score value.
Step S208: multiply the estimated value by the target user data to obtain the target user score value, wherein the target user data includes any one of: the user IP, an attribute of the user terminal, the category of the product purchased by the user, the user purchase time, and the user payment amount.
In this step, it should be noted that the attribute of the user terminal may be whether the user terminal is a commonly used terminal, and the user purchase time may be classified as a daytime purchase (daytime is 8:00 to 18:00) or a night purchase (night is 0:00 to 8:00 or 18:00 to 24:00). In addition, analysis of historical big data shows that a change in any one of the user IP, the attribute of the user terminal, the category of the product purchased by the user, the user purchase time, and the user payment amount affects the target user score value. Using them as the target user data allows the target user score value to be calculated more accurately and the user to be analyzed more comprehensively.
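The daytime and night split of the user purchase time can be expressed as a small helper. In this Python sketch the function name is hypothetical, and treating the boundaries as half-open (8:00 counts as daytime, 18:00 counts as night) is an assumption, since the text does not say which side the boundary hours belong to.

```python
def purchase_period(hour):
    """Classify an hour of day (0-23) as 'daytime' or 'night'.

    Assumption: daytime is the half-open interval [8, 18); every other
    hour is night, covering 0:00 to 8:00 and 18:00 to 24:00.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in the range 0..23")
    return "daytime" if 8 <= hour < 18 else "night"


print(purchase_period(10), purchase_period(3), purchase_period(20))
```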
Continuing the example given for step S207, this step is illustrated below with a specific example:
Assuming the user payment amount is 1000, the target user score value is 200 (1000 × 0.2 = 200) at quantile 0.5 or 400 (1000 × 0.4 = 400) at quantile 0.75.
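The multiplication in step S208 can be applied per quantile in one pass. The Python sketch below uses the estimated values 0.2 and 0.4 and the payment amount 1000 from the running example; the function name and the dictionary layout are assumptions made for illustration.

```python
def target_user_scores(estimates_by_quantile, target_user_value):
    """Multiply the estimated value at each quantile by the target user data."""
    return {tau: estimate * target_user_value
            for tau, estimate in estimates_by_quantile.items()}


estimates_by_quantile = {0.5: 0.2, 0.75: 0.4}  # from the running example
payment_amount = 1000                          # target user data

scores = target_user_scores(estimates_by_quantile, payment_amount)
for tau, score in scores.items():
    print(f"quantile {tau}: score {score:.1f}")
```

Keeping one score per quantile, rather than a single number, is what lets the result reflect more than the central tendency of the score distribution.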
The method for calculating a user's score value has been described above with reference to Fig. 1 and Fig. 2; the device for calculating a user's score value is described below with reference to Fig. 3 and Fig. 4.
To solve the problems existing in the prior art, one embodiment of the present invention provides a device for calculating a user's score value. The device may be executed by a server. As shown in Fig. 3, the device includes:
Acquiring unit 301, configured to obtain a first data set and a second data set; wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value.
Sampling unit 302, configured to sample the first data set and the second data set using bootstrap to obtain sample data.
Estimated value computing unit 303, configured to calculate the sample data using a quantile estimate algorithm to obtain an estimated value.
Target user's score value computing unit 304, configured to calculate a target user's score value according to the estimated value and target user data.
Of the existing technology in order to solve the problems, such as, another embodiment of the present invention provides a kind of dresses for calculating user's score value It sets, which can be executed by server, as shown in figure 4, the device includes:
Acquiring unit 401, for obtaining the first data set and the second data set;Wherein, first data set includes more A first data, each first data include first sample user data and corresponding first sample user score value, and described second Data set includes multiple second data, and each second data include the second sample of users data and corresponding second sample of users point Value.
Sampling unit 402 is obtained for being sampled using bootstrap to first data set and second data set To sample data.
When it is implemented, sampling unit 402 includes:
First sub-unit 4021, for extracting the institute of the first quantity from first data set using bootstrap The first data are stated, and are extracted multiple.
Second sub-unit 4022, for extracting the second quantity from second data set using the bootstrap Second data, and extract multiple.
Aggregation units 4023, for by first data and second quantity of first quantity extracted every time Second data set as the sample data extracted every time.
In addition, in first data set in the first data bulk and second data set the second data bulk ratio Greater than 10, the ratio range of first quantity and second quantity is [0.1,10].
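The behaviour of sampling unit 402 and its sub-units can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the invention: the function name `bootstrap_draws`, the `seed` parameter, the representation of each data item as a `(user data, score value)` tuple, and the example data are all hypothetical. Bootstrap is taken here to mean equiprobable sampling with replacement, which `random.choices` provides.

```python
import random


def bootstrap_draws(first_data_set, second_data_set, first_quantity,
                    second_quantity, num_draws, seed=None):
    """Build num_draws sample sets by sampling each data set with replacement."""
    if first_quantity <= 0 or second_quantity <= 0:
        raise ValueError("both quantities must be positive")
    if not second_data_set or len(first_data_set) <= 10 * len(second_data_set):
        raise ValueError("first data set must be more than 10 times the second")
    if not 0.1 <= first_quantity / second_quantity <= 10:
        raise ValueError("first_quantity / second_quantity must lie in [0.1, 10]")
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        first_part = rng.choices(first_data_set, k=first_quantity)     # first sub-unit
        second_part = rng.choices(second_data_set, k=second_quantity)  # second sub-unit
        draws.append(first_part + second_part)                         # aggregation unit
    return draws


# Hypothetical data: (sample user data, sample user score value) pairs.
first = [(x, 0.2 * x) for x in range(1, 101)]  # large first data set
second = [(x, 0.5 * x) for x in range(1, 6)]   # small second data set
samples = bootstrap_draws(first, second, first_quantity=10,
                          second_quantity=10, num_draws=5, seed=0)
```

Drawing comparable quantities from a large and a small data set is what keeps each extracted sample balanced even though the two source data sets differ in size by more than a factor of 10.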
Estimated value computing unit 403, configured to calculate the sample data using a quantile estimate algorithm to obtain an estimated value.
In a specific implementation, estimated value computing unit 403 includes:
Reference value computing unit 4031, configured to calculate the sample data extracted each time using the quantile estimate algorithm to obtain the reference value of each extraction.
Summation unit 4032, configured to add the reference values of all extractions to obtain a total reference value.
Estimated value computation subunit 4033, configured to divide the total reference value by the number of extractions to obtain the estimated value.
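The three sub-units of estimated value computing unit 403 can be sketched in Python as follows. Several points are assumptions of this sketch and are not stated in the text: the quantile estimate algorithm is read as quantile regression; the model has a single coefficient and no intercept (score ≈ β × user data), chosen because the estimated value is later multiplied by the target user data; each record is a `(x, y)` pair with a positive numeric `x`; and the function names are hypothetical. Under these assumptions, minimizing the quantile (pinball) loss reduces to taking a weighted quantile of the ratios `y / x` with weights `x`.

```python
def fit_quantile_coefficient(sample, tau):
    """Fit beta in y ~ beta * x by minimizing the pinball loss at quantile tau.

    For x > 0 the minimizer is the weighted tau-quantile of the ratios y / x,
    where each ratio is weighted by its x.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    ratios = sorted((y / x, x) for x, y in sample if x > 0)
    if not ratios:
        raise ValueError("sample needs at least one record with x > 0")
    threshold = tau * sum(weight for _, weight in ratios)
    cumulative = 0.0
    for ratio, weight in ratios:
        cumulative += weight
        if cumulative >= threshold:
            return ratio
    return ratios[-1][0]  # guard against floating-point rounding


def estimate_from_draws(draws, tau):
    """Reference value per extraction, then total reference value / extraction count."""
    reference_values = [fit_quantile_coefficient(draw, tau) for draw in draws]
    total_reference_value = sum(reference_values)
    return total_reference_value / len(reference_values)
```

A production system would more likely call an established quantile regression routine from a statistics library; the closed form above only keeps the sketch self-contained.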
Target user's score value computing unit 404, configured to calculate a target user's score value according to the estimated value and target user data.
In a specific implementation, target user's score value computing unit 404 is specifically configured to:
Multiply the estimated value by the target user data to obtain the target user's score value; wherein the target user data includes any one of: the user IP, an attribute of the user terminal, the category of the product purchased by the user, the user's purchase time, and the user's payment amount.
Fig. 5 shows an exemplary system architecture 500 to which the method or the device for calculating a user's score value of the embodiments of the present invention may be applied.
As shown in Fig. 5, system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. Network 504 serves as the medium that provides communication links between terminal devices 501, 502, 503 and server 505. Network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use terminal devices 501, 502, 503 to interact with server 505 through network 504 in order to receive or send messages and the like. Various communication client applications may be installed on terminal devices 501, 502, 503, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (examples only).
Terminal devices 501, 502, 503 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
Server 505 may be a server that provides various services, for example a back-end management server (example only) that supports shopping websites browsed by users with terminal devices 501, 502, 503. The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results (for example target push information or product information; examples only) back to the terminal devices.
It should be noted that the method for calculating a user's score value provided by the embodiments of the present invention is generally executed by server 505; correspondingly, the device for calculating a user's score value is generally provided in server 505.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 5 are only illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Referring now to Fig. 6, it shows a schematic structural diagram of a computer system 600 suitable for implementing a terminal device of the embodiments of the present invention. The terminal device shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 6, computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 602 or a program loaded from storage section 608 into random access memory (RAM) 603. RAM 603 also stores various programs and data required for the operation of system 600. CPU 601, ROM 602, and RAM 603 are connected to one another through bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a loudspeaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. Communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on drive 610 as needed, so that a computer program read from it can be installed into storage section 608 as needed.
In particular, according to the embodiments disclosed in the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments disclosed in the present invention include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through communication section 609, and/or installed from removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device. Also in the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a unit, a program segment, or a part of code, and the unit, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor includes an acquiring unit, a sampling unit, an estimated value computing unit, and a target user's score value computing unit. The names of these units do not, in some cases, limit the units themselves; for example, the sampling unit may also be described as "a unit that samples the first data set and the second data set using bootstrap to obtain sample data".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a first data set and a second data set, wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value; sample the first data set and the second data set using bootstrap to obtain sample data; calculate the sample data using a quantile estimate algorithm to obtain an estimated value; and calculate a target user's score value according to the estimated value and target user data.
According to the technical solution of the embodiments of the present invention, the first data set and the second data set are obtained and sampled using bootstrap to obtain sample data. Because bootstrap is random, equiprobable resampling with replacement, the sample data is balanced, which improves the accuracy of the estimated value and therefore the accuracy of the target user's score value calculation. The sample data is calculated using a quantile estimate algorithm; compared with traditional regression analysis, the estimated value is more robust to extreme values, its accuracy is improved, and the conditional distribution of the dependent variable is described more completely, which further improves the accuracy of the target user's score value calculation.
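The robustness to extreme values mentioned above can be illustrated with the simplest quantile, the median, compared against the mean. This is only an analogy for the difference between quantile-based and mean-based estimation, not the regression used in the embodiments, and the score values below are hypothetical.

```python
import statistics

typical_scores = [10, 11, 12, 13, 14]    # hypothetical score values
with_extreme = typical_scores + [1000]   # the same values plus one extreme value

# How far each summary moves when the extreme value is added.
mean_shift = statistics.mean(with_extreme) - statistics.mean(typical_scores)
median_shift = statistics.median(with_extreme) - statistics.median(typical_scores)
```

Here the mean moves by more than 160 while the median moves by 0.5, which is the kind of stability the paragraph above attributes to quantile-based estimation.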
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for calculating a user's score value, characterized by comprising:
Obtaining a first data set and a second data set; wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value;
Sampling the first data set and the second data set using bootstrap to obtain sample data;
Calculating the sample data using a quantile estimate algorithm to obtain an estimated value;
Calculating a target user's score value according to the estimated value and target user data.
2. The method according to claim 1, characterized in that sampling the first data set and the second data set using bootstrap to obtain sample data comprises:
Extracting a first quantity of the first data from the first data set using bootstrap, the extraction being performed multiple times;
Extracting a second quantity of the second data from the second data set using the bootstrap, the extraction being performed multiple times;
Taking the set of the first quantity of first data and the second quantity of second data extracted each time as the sample data extracted each time.
3. The method according to claim 2, characterized in that the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity is in the range [0.1, 10].
4. The method according to claim 3, characterized in that calculating the sample data using a quantile estimate algorithm to obtain an estimated value comprises:
Calculating the sample data extracted each time using the quantile estimate algorithm to obtain the reference value of each extraction;
Adding the reference values of all extractions to obtain a total reference value;
Dividing the total reference value by the number of extractions to obtain the estimated value.
5. The method according to claim 1, characterized in that calculating a target user's score value according to the estimated value and target user data comprises:
Multiplying the estimated value by the target user data to obtain the target user's score value;
Wherein the target user data includes any one of: the user IP, an attribute of the user terminal, the category of the product purchased by the user, the user's purchase time, and the user's payment amount.
6. A device for calculating a user's score value, characterized by comprising:
An acquiring unit, configured to obtain a first data set and a second data set; wherein the first data set includes multiple first data, each first data including first sample user data and a corresponding first sample user score value, and the second data set includes multiple second data, each second data including second sample user data and a corresponding second sample user score value;
A sampling unit, configured to sample the first data set and the second data set using bootstrap to obtain sample data;
An estimated value computing unit, configured to calculate the sample data using a quantile estimate algorithm to obtain an estimated value;
A target user's score value computing unit, configured to calculate a target user's score value according to the estimated value and target user data.
7. The device according to claim 6, characterized in that the sampling unit includes:
A first sub-unit, configured to extract a first quantity of the first data from the first data set using bootstrap, and to perform the extraction multiple times;
A second sub-unit, configured to extract a second quantity of the second data from the second data set using the bootstrap, and to perform the extraction multiple times;
An aggregation unit, configured to take the set of the first quantity of first data and the second quantity of second data extracted each time as the sample data extracted each time.
8. The device according to claim 7, characterized in that the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity is in the range [0.1, 10].
9. The device according to claim 8, characterized in that the estimated value computing unit includes:
A reference value computing unit, configured to calculate the sample data extracted each time using the quantile estimate algorithm to obtain the reference value of each extraction;
A summation unit, configured to add the reference values of all extractions to obtain a total reference value;
An estimated value computation subunit, configured to divide the total reference value by the number of extractions to obtain the estimated value.
10. The device according to claim 6, characterized in that the target user's score value computing unit is specifically configured to:
Multiply the estimated value by the target user data to obtain the target user's score value;
Wherein the target user data includes any one of: the user IP, an attribute of the user terminal, the category of the product purchased by the user, the user's purchase time, and the user's payment amount.
11. An electronic device, characterized by comprising:
One or more processors;
A storage device, configured to store one or more programs,
Wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 5.
12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201910107542.1A 2019-02-02 2019-02-02 Method and device for calculating user score Active CN109801112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910107542.1A CN109801112B (en) 2019-02-02 2019-02-02 Method and device for calculating user score

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910107542.1A CN109801112B (en) 2019-02-02 2019-02-02 Method and device for calculating user score

Publications (2)

Publication Number Publication Date
CN109801112A true CN109801112A (en) 2019-05-24
CN109801112B CN109801112B (en) 2021-03-05

Family

ID=66561895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910107542.1A Active CN109801112B (en) 2019-02-02 2019-02-02 Method and device for calculating user score

Country Status (1)

Country Link
CN (1) CN109801112B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101764821A (en) * 2010-01-19 2010-06-30 南京邮电大学 Method for evaluating trust of user action in trusted network
CN105719033A (en) * 2014-12-02 2016-06-29 阿里巴巴集团控股有限公司 Method and device for identifying risk in object
CN108073629A (en) * 2016-11-16 2018-05-25 北京京东尚科信息技术有限公司 The method and device of purchasing model is identified by website visitation data
CN108446849A (en) * 2018-03-21 2018-08-24 携程旅游网络技术(上海)有限公司 The appraisal procedure and its system of credit line, storage medium, electronic equipment
CN108446351A (en) * 2018-03-08 2018-08-24 携程计算机技术(上海)有限公司 The hotel's screening technique and system based on user preference of OTA platforms
CN108573432A (en) * 2018-04-23 2018-09-25 重庆南米电子商务有限公司 Transaction supervisory systems and method for e-commerce

Also Published As

Publication number Publication date
CN109801112B (en) 2021-03-05

Similar Documents

Publication Publication Date Title
CN109460513A (en) Method and apparatus for generating clicking rate prediction model
CN107609890A (en) A kind of method and apparatus of order tracking
CN109961299A (en) The method and apparatus of data analysis
CN110371560A (en) Automatically the method and apparatus made an inventory
CN108776692A (en) Method and apparatus for handling information
CN113095893A (en) Method and device for determining sales of articles
CN109960650A (en) Application assessment method, apparatus, medium and electronic equipment based on big data
CN110019367A (en) A kind of method and apparatus of statistical data feature
WO2014110950A1 (en) Method and device for pushing information
CN110348921A (en) The method and apparatus that shops's article is chosen
CN109785072A (en) Method and apparatus for generating information
CN110020112A (en) Object Push method and its system
CN109002925A (en) Traffic prediction method and apparatus
CN110473043A (en) A kind of item recommendation method and device based on user behavior
CN109753424A (en) The method and apparatus of AB test
CN105574091B (en) Information-pushing method and device
CN110197317A (en) Target user determines method and device, electronic equipment and storage medium
CN109993566A (en) A kind of method and apparatus for predicting product objective data
CN108665312A (en) Method and apparatus for generating information
CN109840724A (en) Method and apparatus for output information
CN110472190A (en) The method and apparatus for filling ordered sequence
CN109801112A (en) The method and apparatus for calculating user's score value
CN110032283A (en) The method and apparatus that a kind of pair of associational word is ranked up
CN110827044A (en) Method and device for extracting user interest mode
CN109754199A (en) Information output method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Patentee after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Patentee before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Patentee after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Patentee before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.