CN109801112B

CN109801112B - Method and device for calculating user score

Info

Publication number: CN109801112B
Application number: CN201910107542.1A
Authority: CN
Inventors: 张元杰; 程建波; 吕军; 王美青
Original assignee: JD Digital Technology Holdings Co Ltd
Current assignee: JD Digital Technology Holdings Co Ltd; Jingdong Technology Holding Co Ltd
Priority date: 2019-02-02
Filing date: 2019-02-02
Publication date: 2021-03-05
Anticipated expiration: 2039-02-02
Also published as: CN109801112A

Abstract

The invention discloses a method and a device for calculating a user score, and relates to the technical field of computers. One embodiment of the method comprises: acquiring a first data set and a second data set, wherein the first data set comprises a plurality of first data, each first data comprising first sample user data and a corresponding first sample user score, and the second data set comprises a plurality of second data, each second data comprising second sample user data and a corresponding second sample user score; sampling the first data set and the second data set by a bootstrap method to obtain sample data; calculating the sample data by using a quantile regression algorithm to obtain an estimated value; and calculating the score of a target user according to the estimated value and the data of the target user. This embodiment improves the accuracy of the user score calculation.

Description

Method and device for calculating user score

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for calculating a user score.

Background

Currently, in user behavior research, a common method is to use a large number of normal user payment amounts with their corresponding normal user scores, together with a small number of abnormal user payment amounts with their corresponding abnormal user scores, as a data set; to calculate the data set by a traditional regression method to obtain an estimated value; and to multiply the estimated value by a target user's payment amount to obtain the target user score.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:

First, because the data set includes a large number of payment amounts of normal users and only a small number of payment amounts of abnormal users, the data set is imbalanced. When the calculation is performed on such a data set, the estimated value is biased toward the normal users and is therefore inaccurate, so the target user score is also biased and inaccurate.

Second, the traditional regression analysis method has the following problems. Firstly, the estimated value it produces is affected by extreme values, so the estimated value is inaccurate and the target user score is also inaccurate. Secondly, the traditional regression analysis method requires the residuals to follow a normal distribution, but in practice the residuals generally do not; because the distribution type differs, the reliability of the calculation is difficult to guarantee, the calculation accuracy is low, and the target user score is again inaccurate. Thirdly, the traditional regression analysis method is mean regression: it describes only the central tendency of the conditional distribution and cannot describe the full picture of the conditional distribution of the dependent variable.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for calculating a user score, which can improve accuracy of an estimated value and accuracy of calculating a target user score.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of calculating a user score.

The method for calculating the user score comprises the following steps:

acquiring a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score;

sampling the first data set and the second data set by a bootstrap method to obtain sample data;

calculating the sample data by using a quantile regression algorithm to obtain an estimated value;

and calculating the score of the target user according to the estimated value and the data of the target user.

In one embodiment, sampling the first data set and the second data set by a bootstrap method to obtain sample data includes:

extracting a first quantity of the first data from the first data set by the bootstrap method, the extraction being performed multiple times;

extracting a second quantity of the second data from the second data set by the bootstrap method, the extraction being performed multiple times;

and taking the set of the first quantity of the first data and the second quantity of the second data which are extracted each time as sample data which are extracted each time.

In one embodiment, the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity lies in the range [0.1, 10].

In one embodiment, calculating the sample data by using a quantile regression algorithm to obtain an estimated value includes:

calculating the sample data extracted each time by adopting a quantile regression algorithm to obtain a reference value extracted each time;

adding the reference values extracted each time to obtain a total reference value;

and dividing the total reference value by the number of times of extraction to obtain an estimated value.

In one embodiment, calculating a target user score based on the estimate and the target user data comprises:

multiplying the estimated value by the target user data to obtain a target user score;

wherein the target user data comprises: any one of the user IP, the attribute of the user terminal, the kind of the product purchased by the user, the user purchase time, and the user payment amount.

To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an apparatus for calculating a user score.

The device for calculating the user score of the embodiment of the invention comprises the following components:

an acquisition unit for acquiring a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score;

a sampling unit for sampling the first data set and the second data set by a bootstrap method to obtain sample data;

an estimated value calculating unit for calculating the sample data by using a quantile regression algorithm to obtain an estimated value;

and a target user score calculating unit for calculating the target user score according to the estimated value and the target user data.

In one embodiment, the sampling unit includes:

a first sampling subunit, configured to extract a first quantity of the first data from the first data set by the bootstrap method, the extraction being performed multiple times;

a second sampling subunit, configured to extract a second quantity of the second data from the second data set by the bootstrap method, the extraction being performed multiple times;

a set unit, configured to use a set of the first quantity of the first data and the second quantity of the second data extracted each time as sample data extracted each time.

In one embodiment, the estimate value calculation unit includes:

a reference value calculating unit, configured to calculate the sample data extracted each time by using a quantile regression algorithm to obtain a reference value for each extraction;

a summing unit, configured to add the reference values of all extractions to obtain a total reference value;

and an estimated value operation subunit, configured to divide the total reference value by the number of extractions to obtain an estimated value.

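In one embodiment, the target user score calculation unit is specifically configured to: multiply the estimated value by the target user data to obtain the target user score; wherein the target user data comprises any one of the user IP, the attribute of the user terminal, the kind of the product purchased by the user, the user purchase time, and the user payment amount.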

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus.

An electronic device of an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for calculating the user score provided by the embodiment of the invention.

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium.

A computer-readable medium of an embodiment of the present invention stores thereon a computer program, which, when executed by a processor, implements the method for calculating a user score provided by an embodiment of the present invention.

One embodiment of the above invention has the following advantages or benefits. The first data set and the second data set are acquired and sampled by the bootstrap method to obtain sample data. Because the bootstrap method resamples randomly, with equal probability, and with replacement, the sample data is balanced, which improves the accuracy of the estimated value and hence the accuracy of the target user score calculation. In addition, the sample data is calculated by a quantile regression algorithm; compared with the traditional regression analysis method, the estimated value is more robust to extreme values, which improves its accuracy, and the full picture of the conditional distribution of the dependent variable is described more comprehensively, which further improves the accuracy of the target user score calculation.

Further effects of the above optional implementations will be described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main flow of a method of calculating a user score according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the main flow of a method of calculating a user score according to another embodiment of the present invention;

FIG. 3 is a schematic diagram of the main elements of an apparatus for calculating a user score according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the main elements of an apparatus for calculating a user score according to another embodiment of the present invention;

FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

Currently, in user behavior research, a common approach is to use a large number of normal user payment amounts with their corresponding normal user scores, together with a small number of abnormal user payment amounts with their corresponding abnormal user scores, as a data set; to calculate the data set by a traditional regression method (mean regression) to obtain an estimated value; and to multiply the estimated value by a target user's payment amount to obtain the target user score. The target user is then served according to the target user score, for example by recommending products.

The prior art has the following problems:

First, because the data set includes a large number of payment amounts of normal users and only a small number of payment amounts of abnormal users, the data set is imbalanced. When the calculation is performed on this data set, the estimated value is biased toward the normal users and is inaccurate, so the target user score is also biased and inaccurate.

Second, the conventional regression analysis method has the following problems:

Firstly, the estimated value obtained by the traditional regression analysis method is affected by extreme values (such as variance, outliers, and high-leverage values), so the estimated value and the target user score are inaccurate.

Secondly, the traditional regression analysis method requires the residuals to follow a normal distribution, but in practice the residuals generally do not; because the distribution type differs, the reliability of the calculation is difficult to guarantee, the calculation accuracy is low, and the target user score is also inaccurate.

Thirdly, the traditional regression analysis method is mean regression (it studies the conditional expectation of the dependent variable, that is, the influence of the independent variables on the conditional mean of the dependent variable). It describes only the central tendency of the conditional distribution and cannot describe the full picture of the conditional distribution of the dependent variable.
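The sensitivity to extreme values noted above can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from the invention; the sketch only compares how the mean (the target of mean regression) and the median (the 0.5 quantile) react when a single extreme value is added.

```python
from statistics import mean, median

# Hypothetical user scores: nine typical values plus one extreme value.
typical_scores = [10, 11, 9, 10, 12, 10, 9, 11, 10]
with_extreme = typical_scores + [1000]

# The mean is pulled far away from the typical scores by the extreme value.
mean_shift = mean(with_extreme) - mean(typical_scores)

# The median (0.5 quantile) does not move for this data.
median_shift = median(with_extreme) - median(typical_scores)

print(f"mean shift: {mean_shift:.2f}, median shift: {median_shift:.2f}")
```

A quantile-based estimate stays close to the bulk of the data, which is the property the quantile regression step relies on.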

In order to solve the problems in the prior art, an embodiment of the present invention provides a method for calculating a user score, which may be performed by a server, as shown in fig. 1, and the method includes:

Step S101, acquiring a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score.

Step S102, sampling the first data set and the second data set by the bootstrap method to obtain sample data.

In this step, if the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, then in a specific implementation the first quantity of first data may be extracted from the first data set directly, the second quantity of second data may be extracted from the second data set by the bootstrap method, and the set of the first quantity of first data and the second quantity of second data is used as the sample data. The bootstrap method is needed for the second data set because that data set is small: without resampling with replacement, a second quantity that balances the first quantity cannot be obtained, which leads to imbalanced sample data, an inaccurate estimated value, and poor accuracy of the target user score calculation. Alternatively, in a specific implementation, the bootstrap method may be used for both data sets: the first quantity of first data is extracted from the first data set by the bootstrap method, the second quantity of second data is extracted from the second data set by the bootstrap method, and the set of the two is used as the sample data.

Step S103, calculating the sample data by using a quantile regression algorithm to obtain an estimated value.

In this step, in a specific implementation, an objective function is obtained according to the quantile regression algorithm, a quantile is set, and the quantile together with the first data and the second data in the sample data is substituted into the objective function to obtain the estimated value at that quantile. Different quantiles may be set as needed to obtain estimated values at different quantiles. Calculating the target user score from the estimated values at different quantiles makes the target user score more accurate and more comprehensive.

Step S104, calculating the target user score according to the estimated value and the target user data.

In this step, in a specific implementation, the estimated value is multiplied by the target user data to obtain the target user score. If estimated values at different quantiles are available, the estimated value at each quantile is multiplied by the target user data to obtain the target user scores at the different quantiles. Together, the target user scores at different quantiles describe the full picture of the conditional distribution of the dependent variable.

In this embodiment, the first data set and the second data set are acquired and sampled by the bootstrap method to obtain sample data. Because the bootstrap method resamples randomly, with equal probability, and with replacement, the sample data is balanced, which improves the accuracy of the estimated value and hence the accuracy of the target user score calculation. In addition, the sample data is calculated by a quantile regression algorithm; compared with the traditional regression analysis method, the estimated value is more robust to extreme values, which improves its accuracy, and the full picture of the conditional distribution of the dependent variable is described more comprehensively, which further improves the accuracy of the target user score calculation.

In order to solve the problems in the prior art, another embodiment of the present invention provides a method for calculating a user score, which may be performed by a server, as shown in fig. 2, and includes:

Step S201, acquiring a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score.

Step S202, extracting a first quantity of the first data from the first data set by the bootstrap method, the extraction being performed multiple times.

In this step, the bootstrap method (also called bootstrap sampling, that is, uniform sampling with replacement from a given training set) is used to resample the first data set randomly, with equal probability, and with replacement, to obtain the first quantity of first data.

It should be noted that extracting the first quantity of first data multiple times yields sample data for each extraction. This avoids the low accuracy of the estimated value, and of the target user score calculation, that a single poor sample would cause, and ensures the stability of the target user score calculation.

Step S203, extracting a second quantity of the second data from the second data set by the bootstrap method, the extraction being performed multiple times.

In this step, the bootstrap method is used to resample the second data set randomly, with equal probability, and with replacement, to obtain the second quantity of second data.

It should be noted that extracting the second quantity of second data multiple times yields sample data for each extraction. This avoids the low accuracy of the estimated value, and of the target user score calculation, that a single poor sample would cause, and ensures the stability of the target user score calculation.

Step S204, using a set of the first amount of the first data and the second amount of the second data extracted each time as sample data extracted each time.

In this step, in a specific implementation, the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and the ratio of the first quantity to the second quantity lies in the range [0.1, 10].

It should be understood that, because the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, the first data set contains far more data than the second data set. For example, data generated by normal users is used as the first data in the first data set, and data generated by abnormal users is used as the second data in the second data set (an abnormal user is a user who uses improper means to obtain improper benefits, such as a swipe user or a steal user). The number of normal users is very large and the number of abnormal users is very small by comparison, so the number of first data in the first data set is far greater than the number of second data in the second data set.

In the prior art, the first data in the first data set and the second data in the second data set are combined directly into a data set, and the data set is calculated by a traditional regression method to obtain an estimated value. Because the number of first data in that data set is huge and the number of second data is very small, the data set is imbalanced; the estimated value is therefore biased toward normal users, is inaccurate and of low validity, and the target user score calculation is inaccurate.

In the embodiment of the invention, the first quantity of first data is extracted from the first data set by the bootstrap method, the second quantity of second data is extracted from the second data set by the bootstrap method, the set of the two is used as the sample data, and the ratio of the first quantity to the second quantity lies in the range [0.1, 10]. The numbers of first data and second data in the sample data are therefore balanced, that is, the sample data is balanced. As a result, the estimated value is not biased toward either side (neither toward normal users nor toward abnormal users), its accuracy and validity are high, and the accuracy of the target user score calculation is improved.
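The two ratio conditions in this step apply to different things: one to the sizes of the source data sets, the other to the quantities drawn from them. A minimal sketch of both checks follows; the function names and default thresholds are illustrative assumptions, with only the values 10 and [0.1, 10] taken from the text.

```python
def is_imbalanced(first_count: int, second_count: int, threshold: float = 10.0) -> bool:
    """True when the first data set is more than `threshold` times larger than the second."""
    return first_count / second_count > threshold


def is_balanced_draw(first_quantity: int, second_quantity: int,
                     low: float = 0.1, high: float = 10.0) -> bool:
    """True when the ratio of drawn quantities lies in the closed range [low, high]."""
    return low <= first_quantity / second_quantity <= high


# Source data sets of 10000 and 100 records are imbalanced,
# while drawing 1000 records from each gives a balanced sample.
print(is_imbalanced(10000, 100), is_balanced_draw(1000, 1000))
```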

This step is illustrated below with a specific example. Assume that the first data set is the data generated by 10000 normal users (comprising normal user data and normal user scores) and the second data set is the data generated by 100 abnormal users (comprising abnormal user data and abnormal user scores). Data for 1000 normal users is extracted from the first data set by the bootstrap method, and the extraction is performed 5 times; data for 1000 abnormal users is extracted from the second data set by the bootstrap method, and the extraction is likewise performed 5 times (note that, because the bootstrap method samples with replacement, the 1000 abnormal-user data contain repeats). The set of the 1000 normal-user data and the 1000 abnormal-user data from each extraction is taken as the sample data of that extraction, giving 5 sample data in total.
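The example above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the invention: the record layout (a pair of user data and user score), the function name `bootstrap_samples`, and the fixed seed are all hypothetical. Sampling with replacement is done with the standard-library `random.Random.choices`.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # hypothetical layout: (user data, user score)


def bootstrap_samples(first_set: Sequence[Record], second_set: Sequence[Record],
                      first_quantity: int, second_quantity: int,
                      rounds: int, seed: int = 0) -> List[List[Record]]:
    """Draw `rounds` samples, each combining records resampled from both data sets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    samples = []
    for _ in range(rounds):
        # choices() draws uniformly at random with replacement.
        first_draw = rng.choices(first_set, k=first_quantity)
        second_draw = rng.choices(second_set, k=second_quantity)
        samples.append(first_draw + second_draw)
    return samples


# Placeholder records for 10000 normal users and 100 abnormal users.
normal = [(float(i), 1.0) for i in range(10000)]
abnormal = [(float(i), 0.0) for i in range(100)]

samples = bootstrap_samples(normal, abnormal, 1000, 1000, rounds=5)
print(len(samples), len(samples[0]))
```

Because only 100 distinct abnormal-user records exist, each draw of 1000 necessarily repeats some of them, as the example notes.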

Step S205, calculating the sample data extracted each time by using a quantile regression algorithm to obtain a reference value for each extraction.

In this step, calculating the sample data of each extraction by the quantile regression algorithm makes it possible to describe robustly, at any quantile, how the independent variable affects the dependent variable, and to describe the conditional distribution of the dependent variable comprehensively, so the range of application scenarios is broader.

In addition, the expression of the quantile regression algorithm is:

$$Q_y(\tau \mid x) = \sum_{i=1}^{n} \beta_i(\tau)\, x_i$$

where $Q_y(\tau \mid x)$ represents the dependent variable at quantile $\tau$, $\tau$ represents the quantile, $x$ is the independent variable, and $\beta_i$ are the estimated values ($i = 1, \dots, n$, where $n$ is the number of independent variables).

According to the quantile regression algorithm, the weighted average of the absolute values of the residuals is taken as the objective function to be minimized. Thus, the expression of the objective function is:

$$\hat{\beta}_j(\tau) = \arg\min_{\beta} \sum_{i} \rho_\tau\left(y^*_{ij} - x^*_{ij}\,\beta\right), \qquad \rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}[u < 0]\bigr)$$

where:

$\tau$ represents the quantile, which can be set manually ($0 < \tau < 1$);

$\rho_\tau$ represents the weights of the residuals at different quantiles and is a known quantity;

$y^*_{ij} - x^*_{ij}\,\beta$ represents the residual at different quantiles;

$y^*$ is the dependent variable, $x^*$ is the independent variable, $i$ indexes the data in the sample data, and $j$ indexes the sampling rounds;

$\hat{\beta}_j(\tau)$ represents the estimated values at different quantiles.
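The weighted-residual objective described above can be sketched as a small Python function. The sketch assumes the standard quantile regression weighting (weight τ for non-negative residuals and 1 − τ for negative ones) and a single independent variable without an intercept; the names `check_loss` and `objective` are illustrative only.

```python
from typing import Sequence


def check_loss(residual: float, tau: float) -> float:
    """Weighted absolute residual: weight tau if residual >= 0, else (1 - tau)."""
    return residual * tau if residual >= 0 else residual * (tau - 1.0)


def objective(beta: float, xs: Sequence[float], ys: Sequence[float], tau: float) -> float:
    """Sum of weighted absolute residuals for the model y ~ beta * x."""
    return sum(check_loss(y - beta * x, tau) for x, y in zip(xs, ys))


# Toy data: the third point lies well above the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 9.0]
print(objective(2.0, xs, ys, 0.5), objective(2.0, xs, ys, 0.75))
```

Raising τ penalizes points that lie above the fitted line more heavily, which is what moves the fit toward higher quantiles.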

Based on the example of step S204, the following further describes the step with a specific example:

setting the quantile to 0.5, and substituting 0.5 together with each of the 5 sample data into the objective function to obtain 5 reference values at quantile 0.5 (each sample data comprises the first quantity of first data, in which the first sample user data is the independent variable and the corresponding first sample user score is the dependent variable, and the second quantity of second data, in which the second sample user data is the independent variable and the corresponding second sample user score is the dependent variable);

setting the quantile to 0.75, and substituting 0.75 together with each of the 5 sample data into the objective function to obtain 5 reference values at quantile 0.75.
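A minimal sketch of computing one reference value per extraction and per quantile is given below. It assumes a single independent variable, no intercept, and nonzero values of that variable. Under those assumptions the objective is convex and piecewise linear in the coefficient, so a minimum lies at one of the ratios y / x and can be found by direct search. The function name and the toy data (two extractions instead of five) are hypothetical.

```python
from typing import Sequence


def fit_quantile_slope(xs: Sequence[float], ys: Sequence[float], tau: float) -> float:
    """Return the beta minimizing sum(rho_tau(y - beta * x)); every x must be nonzero."""

    def loss(beta: float) -> float:
        total = 0.0
        for x, y in zip(xs, ys):
            residual = y - beta * x
            # Weight tau for non-negative residuals, (1 - tau) for negative ones.
            total += residual * tau if residual >= 0 else residual * (tau - 1.0)
        return total

    # The loss is convex and piecewise linear in beta, so a minimum is
    # attained at one of the breakpoints y / x.
    candidates = sorted(y / x for x, y in zip(xs, ys))
    return min(candidates, key=loss)


# Toy sample data: one (xs, ys) pair per extraction.
samples = [
    ([1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0, 100.0]),
    ([1.0, 2.0, 4.0, 5.0, 10.0], [1.0, 2.0, 4.0, 5.0, 50.0]),
]

# One reference value per extraction, for each quantile of interest.
reference_values = {
    tau: [fit_quantile_slope(xs, ys, tau) for xs, ys in samples]
    for tau in (0.5, 0.75)
}
print(reference_values)
```

For several independent variables, a general-purpose quantile regression solver would be needed in place of this one-dimensional search.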

It should be noted that, compared with the traditional regression analysis method, calculating the sample data by the quantile regression algorithm gives an estimated value that is more robust to extreme values, which improves the accuracy of the estimated value. Moreover, the quantile regression algorithm allows different quantiles to be set, yielding estimated values at different quantiles, from which target user scores at different quantiles are calculated. This describes the full picture of the conditional distribution of the dependent variable more comprehensively, so the target user score calculation is more accurate.

Step S206, adding the reference values extracted each time to obtain a total reference value.

In this step, based on the example of step S205, the following describes this step with a specific example:

adding 5 reference values with quantiles of 0.5 to obtain a total reference value with quantiles of 0.5 (assuming that the total reference value with quantiles of 0.5 obtained by calculation is 1);

the 5 reference values with quantiles of 0.75 are added to obtain a total reference value with quantiles of 0.75 (assuming that the calculated total reference value with quantiles of 0.75 is 2).

Step S207, dividing the total reference value by the number of extractions to obtain an estimated value.

In this step, the number of extractions is the number of times the first quantity of first data is extracted from the first data set. Since the first quantity of first data and the second quantity of second data are extracted the same number of times, the number of extractions is equally the number of times the second quantity of second data is extracted from the second data set.

Based on the example of step S206, the following describes the step with a specific example:

dividing the total reference value (1) at quantile 0.5 by 5 to obtain the estimated value at quantile 0.5 (1/5 = 0.2);

dividing the total reference value (2) at quantile 0.75 by 5 to obtain the estimated value at quantile 0.75 (2/5 = 0.4).
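Steps S206 and S207 together amount to averaging the reference values over the extractions. A minimal sketch follows; the individual reference values are hypothetical and were chosen only so that their totals match the totals assumed in the example (1 and 2).

```python
from typing import Sequence


def estimate_from_references(reference_values: Sequence[float]) -> float:
    """Average the per-extraction reference values."""
    total_reference = sum(reference_values)          # step S206: total reference value
    return total_reference / len(reference_values)   # step S207: divide by extraction count


# Hypothetical reference values for 5 extractions at each quantile.
refs_q50 = [0.25, 0.125, 0.25, 0.25, 0.125]  # total is 1
refs_q75 = [0.5, 0.25, 0.5, 0.5, 0.25]       # total is 2

estimate_q50 = estimate_from_references(refs_q50)
estimate_q75 = estimate_from_references(refs_q75)
print(estimate_q50, estimate_q75)
```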

In addition, it should be understood that, after the estimated value is obtained, it may also be tested in the following respects: basic assumptions, significance, goodness of fit, outliers, practical meaning, and the like. After the test is passed, the estimated value is multiplied by the target user data to obtain the target user score.

Step S208, multiplying the estimated value by the target user data to obtain a target user score; wherein the target user data comprises: any one of the user IP, the attribute of the user terminal, the kind of the product purchased by the user, the user purchase time, and the user payment amount.

In this step, it should be noted that the attribute of the user terminal may be whether the terminal is one the user frequently uses; the user purchase time may be classified as a daytime purchase (daytime being 8:00 to 18:00) or a nighttime purchase (nighttime being 0:00 to 8:00 or 18:00 to 24:00). In addition, analysis of historical big data shows that a change in any one of the user IP, the attribute of the user terminal, the kind of product purchased by the user, the user purchase time, and the user payment amount affects the target user score; using these as the target user data therefore makes the target user score calculation more accurate and the analysis of the user more comprehensive.
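The day and night division of the purchase time can be sketched as a simple classifier. The text does not say which period the boundary hours belong to; the sketch assumes that 8:00 counts as day and 18:00 counts as night, and the function name is hypothetical.

```python
def purchase_period(hour: int) -> str:
    """Classify an hour of the day (0-23) as 'day' or 'night'."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0..23")
    # Assumption: day covers 8:00 up to, but not including, 18:00.
    return "day" if 8 <= hour < 18 else "night"


print(purchase_period(9), purchase_period(22))
```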

Based on the example of step S207, the following further describes the step with a specific example:

assuming that the user payment amount is 1000, the target user score is 200 (1000 × 0.2 = 200) at quantile 0.5 or 400 (1000 × 0.4 = 400) at quantile 0.75.
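The multiplication in step S208 can be sketched for several quantiles at once. The estimated values and the payment amount are those assumed in the running example; the dictionary layout is an illustrative choice, not part of the method.

```python
# Estimated values per quantile, taken from the example of step S207.
estimates = {0.5: 0.2, 0.75: 0.4}

# Target user data: here, the user payment amount assumed in the example.
payment_amount = 1000.0

# One target user score per quantile: estimated value times target user data.
scores = {tau: estimate * payment_amount for tau, estimate in estimates.items()}

for tau, score in scores.items():
    print(f"quantile {tau}: target user score {score:.1f}")
```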

The method of calculating the user score is described above in connection with fig. 1-2, and the apparatus for calculating the user score is described below in connection with fig. 3-4.

In order to solve the problems in the prior art, an embodiment of the present invention provides an apparatus for calculating a user score, the apparatus being executable by a server, as shown in fig. 3, and the apparatus including:

an acquiring unit 301 for acquiring a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score.

The sampling unit 302 is configured to sample the first data set and the second data set by using a self-service method to obtain sample data.

An estimated value calculating unit 303, configured to calculate the sample data by using a quantile regression algorithm to obtain an estimated value.

A target user score calculating unit 304, configured to calculate a target user score according to the estimated value and the target user data.

In order to solve the problems of the prior art, another embodiment of the present invention provides an apparatus for calculating a user score, the apparatus being executable by a server, as shown in fig. 4, and the apparatus comprising:

an obtaining unit 401, configured to obtain a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score.

A sampling unit 402, configured to sample the first data set and the second data set by using a self-service method to obtain sample data.

In a specific implementation, the sampling unit 402 includes:

A first sampling subunit 4021, configured to extract a first amount of the first data from the first data set by the self-service (bootstrap) method, and to repeat this extraction multiple times.

A second sampling subunit 4022, configured to extract a second amount of the second data from the second data set by the same method, and to repeat this extraction multiple times.

A set unit 4023, configured to take the set formed by the first amount of first data and the second amount of second data extracted in each round as the sample data of that round.

Additionally, the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, while the ratio of the first amount to the second amount (the amounts extracted in each round) lies in the range [0.1, 10].
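The behaviour of the sampling unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `Record` and `draw_samples` are hypothetical, each datum is reduced to a numeric pair, and the extraction is assumed to be with replacement, as is usual for the bootstrap.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (sample user data, sample user score)


def draw_samples(first_set: Sequence[Record],
                 second_set: Sequence[Record],
                 first_amount: int,
                 second_amount: int,
                 draws: int,
                 seed: int = 0) -> List[List[Record]]:
    """Return `draws` samples, each combining records from both data sets."""
    if first_amount <= 0 or second_amount <= 0:
        raise ValueError("both amounts must be positive")
    if not 0.1 <= first_amount / second_amount <= 10:
        raise ValueError("first_amount / second_amount must lie in [0.1, 10]")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    samples = []
    for _ in range(draws):
        # Assumption: draw with replacement from each data set separately.
        part1 = rng.choices(first_set, k=first_amount)   # subunit 4021
        part2 = rng.choices(second_set, k=second_amount)  # subunit 4022
        samples.append(part1 + part2)                     # set unit 4023
    return samples


# Illustrative data: 100 "normal" and 5 "abnormal" records (ratio 20 > 10).
normal = [(float(x), 0.2 * x) for x in range(1, 101)]
abnormal = [(float(x), 0.4 * x) for x in range(1, 6)]
samples = draw_samples(normal, abnormal, first_amount=20, second_amount=5, draws=3)
```

Because the per-round amounts are chosen independently of the sizes of the two data sets, the abnormal records make up a much larger share of each sample than of the raw data.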

An estimated value calculating unit 403, configured to calculate the sample data by using a quantile regression algorithm to obtain an estimated value.

In specific implementation, the estimated value calculating unit 403 includes:

a reference value calculating unit 4031, configured to calculate the sample data extracted each time by using a quantile regression algorithm, so as to obtain a reference value extracted each time.

A summing unit 4032, configured to sum the reference values extracted each time to obtain a total reference value.

An estimated value operator unit 4033, configured to divide the total reference value by the number of times of extraction to obtain an estimated value.
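The three subunits above can be sketched as follows. The text does not state the form of the regression model, so this sketch assumes the simplest one compatible with the later multiplication step: a one-parameter model score = beta × data with positive data values. For that model the quantile regression solution is the data-weighted quantile of the ratios score/data. The names `quantile_slope` and `estimated_value` and the default quantile 0.5 are hypothetical.

```python
from typing import Sequence, Tuple

Record = Tuple[float, float]  # (sample user data x, sample user score y)


def quantile_slope(sample: Sequence[Record], tau: float = 0.5) -> float:
    """Quantile regression for the assumed model y = beta * x.

    For x > 0 the pinball loss sum(rho_tau(y - beta * x)) equals
    sum(x * rho_tau(y / x - beta)), so a minimiser is the x-weighted
    tau-quantile of the ratios y / x.
    """
    pairs = sorted((y / x, x) for x, y in sample if x > 0)
    if not pairs:
        raise ValueError("sample needs at least one record with x > 0")
    total = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= tau * total:
            return ratio
    return pairs[-1][0]


def estimated_value(samples: Sequence[Sequence[Record]], tau: float = 0.5) -> float:
    """Average the reference values obtained from each extracted sample."""
    reference_values = [quantile_slope(s, tau) for s in samples]  # unit 4031
    total_reference = sum(reference_values)                       # unit 4032
    return total_reference / len(reference_values)                # unit 4033


# Two illustrative samples whose slopes are about 0.2 and 0.4.
demo = [
    [(1.0, 0.2), (2.0, 0.4), (3.0, 0.6)],
    [(1.0, 0.4), (2.0, 0.8), (4.0, 1.6)],
]
estimate = estimated_value(demo)
```

With these two samples the averaged estimate is about 0.3, the mean of the two per-sample reference values.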

A target user score calculating unit 404, configured to calculate a target user score according to the estimated value and the target user data.

In specific implementation, the target user score calculating unit 404 is specifically configured to:

multiply the estimated value by the target user data to obtain a target user score; wherein the target user data comprises: any one of the user IP, the attribute of the user terminal, the kind of the product purchased by the user, the user purchase time, and the user payment amount.

Fig. 5 illustrates an exemplary system architecture 500 to which the method of calculating a user score or the apparatus for calculating a user score of embodiments of the present invention may be applied.

As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 505 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.

It should be noted that the method for calculating the user score provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the apparatus for calculating the user score is generally disposed in the server 505.

It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a sampling unit, an estimated value calculation unit, and a target user score calculation unit. In some cases the names of these units do not limit the units themselves; for example, the sampling unit may also be described as a "unit that samples the first data set and the second data set by the self-service (bootstrap) method to obtain sample data".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring a first data set and a second data set; wherein the first data set comprises a plurality of first data, each first data comprising a first sample user data and a corresponding first sample user score, the second data set comprises a plurality of second data, each second data comprising a second sample user data and a corresponding second sample user score; sampling the first data set and the second data set by the self-service (bootstrap) method to obtain sample data; calculating the sample data by using a quantile regression algorithm to obtain an estimated value; and calculating the score of the target user according to the estimated value and the data of the target user.

According to the technical scheme of the embodiment of the invention, the first data set and the second data set are acquired and then sampled by the self-service (bootstrap) method to obtain sample data. The sample data is then calculated by a quantile regression algorithm. Compared with the traditional regression analysis method, the estimated value obtained in this way is more robust to extreme values and more accurate, and the conditional distribution of the dependent variable is described more comprehensively, so that the accuracy of the target user score calculation is improved.
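The robustness to extreme values mentioned above can be illustrated with a small comparison. The sketch makes several assumptions that are not in the original text: "traditional regression" is taken to be ordinary least squares, both methods fit the one-parameter model score = beta × data, the quantile used is the median, and the data are invented purely for illustration. The function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (user data x, user score y)


def least_squares_slope(data: List[Record]) -> float:
    """Ordinary least squares for y = beta * x (no intercept)."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def median_slope(data: List[Record]) -> float:
    """Median (tau = 0.5) regression for y = beta * x with x > 0.

    A minimiser of sum(|y - beta * x|) is the x-weighted median of y / x.
    """
    pairs = sorted((y / x, x) for x, y in data)
    half = 0.5 * sum(weight for _, weight in pairs)
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return ratio
    return pairs[-1][0]


# Ten well-behaved records with score = 0.2 * x, plus one extreme record.
data = [(float(x), 0.2 * x) for x in range(1, 11)] + [(10.0, 50.0)]
ols = least_squares_slope(data)
med = median_slope(data)
```

On this invented data the single extreme record pulls the least-squares slope up to about 1.19, while the median slope stays at about 0.2.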

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of calculating a user score, comprising:

acquiring a first data set and a second data set; the first data set is a data set generated by a normal user and comprises a plurality of first data, and each first data comprises first sample user data and a corresponding first sample user score; the second data set is a data set generated by an abnormal user and comprises a plurality of second data, and each second data comprises second sample user data and a corresponding second sample user score;

extracting a first amount of first data from the first data set and a second amount of second data from the second data set by a self-service method, combining the extracted data into one set, and extracting for multiple times to obtain sample data extracted each time;

calculating the sample data extracted each time by adopting a quantile regression algorithm to obtain a reference value extracted each time, adding the reference values extracted each time to obtain a total reference value, and dividing the total reference value by the extraction times to obtain an estimated value;

and calculating a target user score according to the estimation value and the target user data so as to recommend the article according to the target user score.

2. The method of claim 1, wherein the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and wherein the ratio of the first amount to the second amount lies in the range [0.1, 10].

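3. The method of claim 1, wherein calculating a target user score based on the estimated value and target user data comprises:

multiplying the estimated value by the target user data to obtain the target user score; wherein the target user data comprises: any one of the user IP, the attribute of the user terminal, the kind of the product purchased by the user, the user purchase time, and the user payment amount.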

4. An apparatus for calculating a user score, comprising:

an acquisition unit for acquiring a first data set and a second data set; the first data set is a data set generated by a normal user and comprises a plurality of first data, and each first data comprises first sample user data and a corresponding first sample user score; the second data set is a data set generated by an abnormal user and comprises a plurality of second data, and each second data comprises second sample user data and a corresponding second sample user score;

a sampling unit comprising:

the first sampling subunit is used for extracting a first amount of first data from the first data set by a self-service method for multiple times;

the second sampling subunit is used for extracting a second quantity of second data from the second data set by adopting the self-service method for multiple times;

a set unit, configured to use a set of the first quantity of first data and the second quantity of second data extracted each time as sample data extracted each time;

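an estimated value calculation unit including:

the reference value calculating unit is used for calculating the sample data extracted each time by adopting a quantile regression algorithm to obtain a reference value extracted each time;

the summing unit is used for adding the reference values extracted each time to obtain a total reference value;

the estimated value operator unit is used for dividing the total reference value by the extraction times to obtain an estimated value;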

and the target user score calculating unit is used for calculating a target user score according to the estimation value and the target user data so as to recommend the article according to the target user score.

5. The apparatus of claim 4, wherein the ratio of the number of first data in the first data set to the number of second data in the second data set is greater than 10, and wherein the ratio of the first amount to the second amount lies in the range [0.1, 10].

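6. The apparatus according to claim 4, wherein the target user score calculation unit is specifically configured to:

multiply the estimated value by the target user data to obtain the target user score; wherein the target user data comprises: any one of the user IP, the attribute of the user terminal, the kind of the product purchased by the user, the user purchase time, and the user payment amount.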

7. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-3.

8. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-3.