CN111325375B

CN111325375B - Data correction method and device, computer storage medium and electronic equipment

Info

Publication number: CN111325375B
Application number: CN201811528415.0A
Authority: CN
Inventors: 黄坤; 张文; 赫南; 刘罗文
Original assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2024-06-18
Anticipated expiration: 2038-12-13
Also published as: CN111325375A

Abstract

The present disclosure provides a data correction method, apparatus, computer readable medium and electronic device, the method comprising: acquiring a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value; performing order-preserving fitting according to the target value and the first predicted value to obtain a first fitting function, and obtaining a second predicted value according to the first fitting function, wherein the amplitude of a second offset between the distribution of the second predicted value and the distribution of the target value is smaller than that of the first offset; and carrying out smooth fitting according to the target value and the second predicted value to obtain a second fitting function, and obtaining a target predicted value according to the second fitting function, wherein the amplitude of a third offset between the distribution of the target predicted value and the distribution of the target value is smaller than that of the second offset. The method and the device can correct data offset and improve the accuracy of data prediction.

Description

Data correction method and device, computer storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computers, and in particular, to a data correction method, a data correction device, a computer readable storage medium, and an electronic apparatus.

Background

With the development of science and technology, the traditional mode of commodity transaction is gradually being replaced by electronic commerce. More and more advertisers choose to place advertisements on e-commerce platforms, and consumers click the links of advertised commodities according to their own needs to learn the commodity details and decide whether to purchase. To help advertisers formulate reasonable advertisement placement strategies and bid on advertisements in real time, it is often desirable to predict a user's click-through rate based on the characteristics of the advertisement and the user.

The click-through rate is the ratio of the number of clicks to the number of advertisement impressions. At present it is predicted mainly by a data model, but the click-through rate predicted by existing data models still deviates from the click-through rate actually fed back. Besides random noise, the deviation contains a pronounced offset that follows a certain rule, and offsets of different degrees but similar shapes appear for models of different structures and different complexities. The offset makes the predicted click-through rate inaccurate and affects the bidding strategy of advertisers, and the inaccuracy is amplified step by step as it propagates through the advertising system, thereby affecting other parts.

In view of the foregoing, there is a need in the art for developing a new data correction method and apparatus.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The disclosure aims to provide a data correction method, a data correction device, a computer readable storage medium and an electronic device, so as to reduce deviation between predicted data and real data at least to a certain extent and improve accuracy of data prediction.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to one aspect of the present disclosure, there is provided a data correction method, the method comprising:

acquiring a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value;

performing order-preserving fitting according to the target value and the first predicted value to obtain a first fitting function, and obtaining a second predicted value according to the first fitting function, wherein a second offset exists between the distribution of the second predicted value and the distribution of the target value, and the amplitude of the second offset is smaller than that of the first offset;

and performing smooth fitting according to the target value and the second predicted value to obtain a second fitting function, and obtaining a target predicted value according to the second fitting function, wherein a third offset exists between the distribution of the target predicted value and the distribution of the target value, and the amplitude of the third offset is smaller than that of the second offset.

In an exemplary embodiment of the disclosure, the performing order-preserving fitting according to the target value and the first predicted value to obtain a first fitting function includes:

taking the first predicted value as an input vector and the target value as a fitting target, and inputting them into a first regression model;

and performing piecewise positive sequence fitting on the target value and the first predicted value through the first regression model to obtain the first fitting function.

In an exemplary embodiment of the disclosure, the performing, by the first regression model, a piecewise positive sequence fit to the target value and the first predicted value to obtain the first fit function includes:

acquiring a plurality of coordinate points positioned in a two-dimensional coordinate system according to the target value and the first predicted value;

and performing piecewise positive sequence fitting according to the change trend of each coordinate point so as to obtain the first fitting function.

In an exemplary embodiment of the disclosure, the performing a piecewise positive sequence fitting according to a variation trend of each coordinate point to obtain the first fitting function includes:

comparing the ordinates of each pair of adjacent coordinate points in sequence, and determining whether the ordinate of the (N+1)th coordinate point is smaller than the ordinate of the Nth coordinate point;

when it is determined that the ordinate of the (N+1)th coordinate point is not smaller than the ordinate of the Nth coordinate point, connecting the Nth coordinate point with the (N+1)th coordinate point to obtain the first fitting function;

when it is determined that the ordinate of the (N+1)th coordinate point is smaller than the ordinate of the Nth coordinate point, acquiring an N'th coordinate point and an (N'+1)th coordinate point according to the ordinate of the Nth coordinate point and the ordinate of the (N+1)th coordinate point, comparing the ordinate of the (N'+1)th coordinate point with the ordinate of the (N+2)th coordinate point, repeating the above steps until the ordinate of the (N+n)th coordinate point is greater than or equal to the ordinate of the (N'+n-1)th coordinate point, and sequentially connecting the N'th coordinate point through the (N+n)th coordinate point to obtain the first fitting function;

wherein the (N'+n-1)th coordinate point has the same abscissa as the (N+n-1)th coordinate point, and N, N' and n are positive integers.

In an exemplary embodiment of the present disclosure, the acquiring the N'th coordinate point and the (N'+1)th coordinate point according to the ordinate of the Nth coordinate point and the ordinate of the (N+1)th coordinate point includes:

calculating an average value of the ordinate of the Nth coordinate point and the ordinate of the (N+1)th coordinate point, and taking the average value as the ordinate of both the N'th coordinate point and the (N'+1)th coordinate point.

In an exemplary embodiment of the disclosure, the obtaining a second predicted value according to the first fitting function includes:

And taking the first predicted value as an independent variable of the first fitting function to obtain the second predicted value corresponding to the first predicted value.

In an exemplary embodiment of the disclosure, the performing smooth fitting according to the target value and the second predicted value to obtain a second fitting function includes:

Taking the second predicted value as an input vector, and taking the target value as a fitting target to be input into a second regression model;

and carrying out smooth fitting on the target value and the second predicted value through the second regression model to obtain the second fitting function.

In an exemplary embodiment of the disclosure, the smoothing fitting the target value and the second predicted value by the second regression model to obtain the second fitting function includes:

acquiring a plurality of coordinate points to be smoothed, which are positioned in a two-dimensional coordinate system, according to the target value and the second predicted value;

Taking any coordinate point to be smoothed as a target coordinate point, and acquiring the coordinate point to be smoothed within a preset range from the target coordinate point;

and carrying out linear regression on the target coordinate point and the coordinate point to be smoothed in the preset range so as to obtain the second fitting function.

In an exemplary embodiment of the present disclosure, the obtaining, with any one of the coordinate points to be smoothed as a target coordinate point, a coordinate point to be smoothed within a preset range from the target coordinate point includes:

calculating the weight corresponding to each second predicted value;

Sequencing the second predicted values from large to small according to the weights corresponding to the second predicted values to form a sequence;

sequentially obtaining a preset number of second predicted values from the front end of the sequence, and taking coordinate points to be smoothed corresponding to the preset number of second predicted values as coordinate points to be smoothed in the preset range.

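In an exemplary embodiment of the present disclosure, the obtaining, with any one of the coordinate points to be smoothed as a target coordinate point, a coordinate point to be smoothed within a preset range from the target coordinate point includes:

calculating the weight corresponding to each second predicted value;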

comparing the weight corresponding to each second predicted value with a preset weight, and determining a coordinate point to be smoothed in the preset range according to a comparison result;

and if there is a second predicted value whose weight is greater than the preset weight, taking the coordinate point to be smoothed corresponding to that second predicted value as a coordinate point to be smoothed within the preset range.

In an exemplary embodiment of the disclosure, the calculating the weight corresponding to each of the second predicted values includes:

and calculating the weight corresponding to each second predicted value through a Gaussian kernel function based on the second predicted value corresponding to each coordinate point to be smoothed.
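The two ways of selecting the coordinate points to be smoothed within the preset range (a preset number of highest-weight points, or all points whose weight exceeds a preset weight) can be sketched as follows. This is only an illustrative sketch: the text names a Gaussian kernel but does not give its form, so the kernel expression, the `bandwidth` parameter, the function names, and the sample values are all assumptions.

```python
import math

def gaussian_weights(xs, x0, bandwidth):
    """Gaussian-kernel weight of each second predicted value relative to the target point x0.
    The kernel form and the bandwidth parameter are assumptions, not taken from the text."""
    return [math.exp(-((x - x0) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]

def select_top_k(xs, x0, bandwidth, k):
    """Sort by weight from large to small and keep the first k indices."""
    w = gaussian_weights(xs, x0, bandwidth)
    order = sorted(range(len(xs)), key=lambda i: w[i], reverse=True)
    return order[:k]

def select_by_threshold(xs, x0, bandwidth, min_weight):
    """Keep every index whose weight is greater than the preset weight."""
    w = gaussian_weights(xs, x0, bandwidth)
    return [i for i, wi in enumerate(w) if wi > min_weight]

# Hypothetical second predicted values; the target coordinate point is at 0.12.
xs = [0.10, 0.12, 0.15, 0.30, 0.50]
top3 = select_top_k(xs, x0=0.12, bandwidth=0.05, k=3)
near = select_by_threshold(xs, x0=0.12, bandwidth=0.05, min_weight=0.5)
```

Both strategies pick the points closest to the target coordinate point; the first fixes how many neighbours are used, while the second lets the count vary with the local density of the data.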

In an exemplary embodiment of the present disclosure, the performing linear regression on the target coordinate point and the coordinate point to be smoothed within the preset range to obtain the second fitting function includes:

Obtaining linear regression parameters according to the target coordinate points and coordinate points to be smoothed in the preset range;

And determining the second fitting function according to the linear regression parameters and the linear regression model.

In an exemplary embodiment of the present disclosure, obtaining the target prediction value according to the second fitting function includes:

And taking the second predicted value as an independent variable of the second fitting function to acquire the target predicted value corresponding to the second predicted value.

According to an aspect of the present disclosure, there is provided a data correction apparatus including:

the data acquisition module is used for acquiring a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value;

the order-preserving fitting module is used for carrying out order-preserving fitting according to the target value and the first predicted value to obtain a first fitting function and obtaining a second predicted value according to the first fitting function, wherein a second offset exists between the distribution of the second predicted value and the distribution of the target value, and the amplitude of the second offset is smaller than that of the first offset;

And the smooth fitting module is used for carrying out smooth fitting according to the target value and the second predicted value to obtain a second fitting function and obtaining a target predicted value according to the second fitting function, wherein a third offset exists between the distribution of the target predicted value and the distribution of the target value, and the amplitude of the third offset is smaller than that of the second offset.

According to one aspect of the present disclosure, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a data correction method as described above.

According to one aspect of the present disclosure, there is provided an electronic device including:

a processor; and

A memory for storing executable instructions of the processor;

Wherein the processor is configured to perform the data correction method as described above via execution of the executable instructions.

As can be seen from the above technical solutions, the data correction method and apparatus, the computer-readable storage medium, and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:

According to the data correction method, a target value and a first predicted value corresponding to the target value are firstly obtained, and then order-preserving fitting and smooth fitting are respectively carried out according to the first predicted value and the target value, so that a target predicted value with the minimum deviation from the target value is obtained. According to the data correction method, on one hand, the accuracy of data correction can be improved through twice fitting, and the accuracy of data prediction is further improved; on the other hand, the user can formulate a precise business strategy according to the corrected data, and the user experience is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.

FIG. 1 shows a flow diagram of a data correction method in an exemplary embodiment of the present disclosure;

FIG. 2 illustrates an exemplary diagram of an application scenario of a data correction method in an exemplary embodiment of the present disclosure;

FIG. 3 illustrates a flow diagram of a method of obtaining a first fitting function in an exemplary embodiment of the present disclosure;

FIG. 4 illustrates a flow diagram of obtaining a first fitting function by piecewise positive sequence fitting in an exemplary embodiment of the present disclosure;

FIG. 5 illustrates a regression function graph after piecewise order-preserving regression fitting in an exemplary embodiment of the present disclosure;

FIG. 6 shows a flow diagram of a method of obtaining a second fitting function in an exemplary embodiment of the present disclosure;

FIG. 7 is a flowchart illustrating a method of acquiring a coordinate point to be smoothed within a preset range from a target coordinate point in an exemplary embodiment of the present disclosure;

FIG. 8 is a flowchart illustrating another method of acquiring a coordinate point to be smoothed within a preset range from a target coordinate point in an exemplary embodiment of the present disclosure;

FIG. 9 is a schematic diagram showing the structure of a data correction device according to an exemplary embodiment of the present disclosure;

FIG. 10 illustrates a schematic diagram of a computer storage medium in an exemplary embodiment of the present disclosure;

FIG. 11 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.; the terms "first" and "second" and the like are used merely as labels, and are not intended to limit the number of their objects.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.

To eliminate the offset between the true value and the predicted value so that the predicted value output by the model reaches the ideal result, the offset is usually corrected by one of two methods: the first is Platt scaling and the second is order-preserving regression (isotonic regression). The two methods are described below, taking correction of the predicted click-through rate as an example.

(1) The core idea of Platt scaling is that the probability offset has the same amplitude and opposite directions on the 0 side and the 1 side and conforms to the shape of the sigmoid function. Platt scaling is therefore implemented by fitting a logistic regression with the predicted click-through rate (PCTR) as input and the real 1/0 label as output, namely PCTR' = sigmoid(w·PCTR + b), where PCTR' is the corrected predicted click-through rate, PCTR is the original predicted click-through rate, and w and b are correction parameters.
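A minimal sketch of fitting PCTR' = sigmoid(w·PCTR + b) is given below. The text does not say how w and b are estimated; plain gradient descent on the logistic loss is assumed here, and the learning rate, the number of epochs, and the sample data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(pctr, clicks, lr=0.5, epochs=2000):
    """Fit w and b in PCTR' = sigmoid(w * PCTR + b) by gradient descent on the log loss.
    The optimizer and its settings are assumptions for illustration."""
    w, b = 1.0, 0.0
    n = len(pctr)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(pctr, clicks):
            err = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical predicted click-through rates and real 1/0 click labels.
pctr = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
clicks = [0, 0, 1, 0, 1, 1]
w, b = fit_platt(pctr, clicks)
corrected = [sigmoid(w * x + b) for x in pctr]
```

Because only two parameters are fitted, the correction can represent only sigmoid-shaped offsets, which is the limitation the description goes on to discuss.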

(2) The core idea of order-preserving regression is that the predicted click-through rate (PCTR) and the true click-through rate (CTR) are always positively correlated, so it suffices to guarantee that the fitted function is a positively correlated (monotonically increasing) function, that is, the PCTR values are corrected without changing the AUC of the model. Specifically, PCTR is taken as input and CTR as the fitting target for a piecewise order-preserving linear regression, and the fitted value is taken as output. The fitting function in a given segment i satisfies PCTR' = w_i·PCTR + b_i, where w_i > 0, so that for all PCTR_1 > PCTR_2, PCTR_1' > PCTR_2'.
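The claim that an order-preserving correction leaves the AUC unchanged can be checked with a small sketch. The two-segment function below, its slopes and intercepts, and the sample scores and labels are hypothetical; the AUC is computed by direct pair counting.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def piecewise_increasing(x):
    """Hypothetical two-segment correction; both slopes w_i are positive."""
    if x < 0.5:
        return 0.4 * x          # w_1 = 0.4, b_1 = 0
    return 1.2 * x - 0.4        # w_2 = 1.2, b_2 = -0.4

# Hypothetical predicted click-through rates and real 1/0 click labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7]
labels = [0, 0, 1, 1, 0, 1]
corrected = [piecewise_increasing(s) for s in scores]

auc_before = auc(scores, labels)
auc_after = auc(corrected, labels)
```

Since the correction is strictly increasing, every pair of scores keeps its relative order, so `auc_before` and `auc_after` are the same value even though the individual scores change.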

To achieve the ideal prediction effect, PCTR = CTR, the correction function must be able to adapt to any offset relationship between PCTR and CTR, and must be a continuous function, as the CTR is. Platt scaling, as used in the related art, cannot ensure that every data offset conforms to the shape of the sigmoid function, and it lacks generality for asymmetric offsets. Order-preserving regression ensures that the fitted function is always positively correlated, but piecewise fitting can leave the segments not smoothly connected, that is, the function value is discontinuous at certain points, so the corrected PCTR cannot be guaranteed to be a continuous function. In addition, noise in the data always interferes with the fit: if relatively large noise is encountered during fitting, the slope of the function curve drops rapidly, and, given the characteristics of the algorithm, the fitted function then forms a plateau with almost zero slope. This affects continuity and also weakens the order-preserving effect.

Based on the problems existing in the related art, a data correction method is proposed in one embodiment of the present disclosure to perform an optimization process on the above problems. Referring specifically to fig. 1, the data correction method may be executed by a server, and at least includes the following steps:

step S110: acquiring a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value;

Step S120: performing order-preserving fitting according to the target value and the first predicted value to obtain a first fitting function, and obtaining a second predicted value according to the first fitting function, wherein a second offset exists between the distribution of the second predicted value and the distribution of the target value, and the amplitude of the second offset is smaller than that of the first offset;

Step S130: and carrying out smooth fitting according to the target value and the second predicted value to obtain a second fitting function, and obtaining a target predicted value according to the second fitting function, wherein a third offset exists between the distribution of the target predicted value and the distribution of the target value, and the amplitude of the third offset is smaller than that of the second offset.

According to the data correction method in the embodiment of the disclosure, on one hand, order-preserving fitting and smooth fitting can be sequentially carried out on the target value and the first predicted value, so that the continuity of an output function is ensured, and the accuracy of data prediction is improved; on the other hand, the user experience is improved by guiding the analysis of the user through the accurate predicted value.

In order to make the technical solution of the present disclosure clearer, next, taking correction of the predicted click rate as an example, each step of the data correction method in the present disclosure will be described in detail with reference to the configuration shown in fig. 2.

In step S110, a target value and a first predicted value corresponding to the target value are acquired, with a first offset between a distribution of the first predicted value and a distribution of the target value.

In an exemplary embodiment of the present disclosure, a developer may obtain data related to a target advertisement, such as the display amount and the click amount of the advertisement, from a data repository in the terminal device 210, and an advertiser may obtain the real click-through rate of the target advertisement from the click amount and the display amount. Before placing an advertisement, an advertiser needs to predict its click-through rate in order to bid reasonably, but predicting the click-through rate manually is inefficient and inaccurate. To improve the efficiency and accuracy of the prediction, the click-through rate may be predicted by a prediction model provided in the server 202, where the prediction model is trained on real data. The obtained data related to the target advertisement may be input into the prediction model to obtain a corresponding predicted click-through rate, and the predicted click-through rate output by the prediction model is recorded as the first predicted value. However, noise may exist when the prediction model processes the data, or the parameters of the prediction model itself may be insufficiently accurate, so the first predicted value may deviate from the target value. As a result, the distribution curve corresponding to the first predicted value is offset from the distribution curve corresponding to the target value, and this offset is denoted as the first offset.

In step S120, order-preserving fitting is performed according to the target value and the first predicted value to obtain a first fitting function, and a second predicted value is obtained according to the first fitting function, where a second offset exists between the distribution of the second predicted value and the distribution of the target value, and the magnitude of the second offset is smaller than the magnitude of the first offset.

In an exemplary embodiment of the present disclosure, there is an offset between the distribution of the target values and the distribution of the first predicted values, and the degree, direction, and shape of the offset are not always the same, so the corresponding function is not unique. Therefore, to correct the offset and improve the generality of the function, order-preserving fitting may be performed on the target value and the first predicted value.

In an exemplary embodiment of the present disclosure, when order-preserving fitting is performed on the target value and the first predicted value, the first predicted value may be used as the input vector and the target value as the fitting target, both input into a first regression model, and piecewise positive sequence fitting may be performed on the target value and the first predicted value by the first regression model to obtain the first fitting function. In an embodiment of the present disclosure, the first regression model may specifically be an order-preserving regression model, by which the first predicted value and the target value are subjected to piecewise positive sequence regression fitting to obtain the first fitting function.

Fig. 3 shows a flow diagram of a method of obtaining a first fitting function, as shown in fig. 3:

In step S301, a plurality of coordinate points located in a two-dimensional coordinate system are acquired based on the target value and the first predicted value.

In an exemplary embodiment of the present disclosure, the first predicted values may be taken as abscissa and the target values as ordinate, and the coordinate points corresponding to the respective first predicted values may be uniquely determined in the two-dimensional coordinate system from the plurality of sets of the first predicted values and the target values corresponding to the first predicted values.

In step S302, a piecewise positive sequence fitting is performed according to the variation trend of each coordinate point, so as to obtain a first fitting function.

In an exemplary embodiment of the present disclosure, the first predicted value and the target value are subjected to piecewise positive-sequence regression fitting, so that a function obtained after fitting is ensured to be a non-decreasing function. Fig. 4 shows a schematic flow chart of obtaining a first fitting function by piecewise positive sequence fitting, as shown in fig. 4:

In step S401, the ordinates of each pair of adjacent coordinate points are compared in sequence to determine whether the ordinate of the (N+1)th coordinate point is smaller than the ordinate of the Nth coordinate point; the ordinate of a coordinate point is the target value corresponding to the first predicted value;

In step S402, when it is determined that the ordinate of the (N+1)th coordinate point is not smaller than that of the Nth coordinate point, the Nth coordinate point is connected to the (N+1)th coordinate point to obtain the first fitting function; the graph formed by connecting the coordinate points is the function curve corresponding to the first fitting function. Because order-preserving regression is mainly piecewise order-preserving linear regression, the linear regression parameters can be determined from the coordinate values of the coordinate points in each segment, and the first fitting function is thereby determined;

In step S403, when it is determined that the ordinate of the (N+1)th coordinate point is smaller than that of the Nth coordinate point, an N'th coordinate point and an (N'+1)th coordinate point are acquired according to the Nth coordinate point and the (N+1)th coordinate point, the (N'+1)th coordinate point is then compared with the (N+2)th coordinate point, and the above steps are repeated until the ordinate of the (N+n)th coordinate point is greater than or equal to that of the (N'+n-1)th coordinate point; the N'th coordinate point through the (N+n)th coordinate point are then connected in sequence to obtain the first fitting function; wherein the (N'+n-1)th coordinate point has the same abscissa as the (N+n-1)th coordinate point, and N, N' and n are all positive integers.

Further, in step S403, the average of the ordinate of the Nth coordinate point and the ordinate of the (N+1)th coordinate point may be calculated and taken as the ordinate of both the N'th coordinate point and the (N'+1)th coordinate point. For example, given the data {9, 10, 14, 12}, the first group {9, 10} is selected first; the second value is larger than the first, so the positive sequence requirement is met. The second group {9, 10, 14} is then selected; the third value is larger than the second, so the requirement is again met. The third group {9, 10, 14, 12} is then selected; the fourth value is smaller than the third, so the requirement is not met, and the third and fourth values are averaged to give 13. The data after piecewise positive sequence regression fitting is therefore {9, 10, 13, 13}.
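The worked example {9, 10, 14, 12} can be reproduced with a short sketch. The code below uses the standard pool-adjacent-violators formulation, in which a violating value is merged with the preceding block and the block is replaced by its mean; this is an assumed reading of steps S401 to S403 that coincides with the example in the text, and the function name is hypothetical.

```python
def pool_adjacent_violators(ys):
    """Return a non-decreasing sequence by averaging adjacent blocks that violate the order.
    ys are the ordinates (target values), already sorted by abscissa (first predicted value)."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # Merge while the newest block's mean is smaller than the previous block's mean.
        while len(blocks) > 1 and blocks[-1][0] / blocks[-1][1] < blocks[-2][0] / blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block takes the block mean
    return fitted

print(pool_adjacent_violators([9, 10, 14, 12]))  # [9.0, 10.0, 13.0, 13.0]
```

Merging can cascade backwards: if a merged block's mean falls below the block before it, those blocks are merged as well, which is how the repeated comparison in step S403 keeps the result non-decreasing.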

In an exemplary embodiment of the present disclosure, FIG. 5 shows a regression function graph after piecewise order-preserving regression fitting. As shown in FIG. 5, the discrete points are the data to be fitted, the straight line is the regression curve formed by linear regression of the discrete points, and the discontinuous dotted line is the regression curve formed by piecewise order-preserving regression of the discrete points. As can be seen from FIG. 5, the regression function formed by piecewise order-preserving regression is not a continuous function, and its curve contains a plurality of plateaus.

In an exemplary embodiment of the present disclosure, after the first fitting function is obtained, the first predicted value may be substituted into the first fitting function as the argument to obtain the corresponding second predicted value. In other words, the ordinate of the point on the dotted line in fig. 5 whose abscissa is the first predicted value is the second predicted value.
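
As a non-limiting illustration, evaluating the first fitting function can be sketched as a lookup in a step function. The helper `make_step_function`, the sample abscissas, and the choice to return the fitted ordinate of the nearest knot at or below the query are all assumptions made for this sketch.

```python
from bisect import bisect_right


def make_step_function(xs, fitted_ys):
    """Build f(x) from knots; xs must be sorted ascending and aligned with fitted_ys."""
    def f(x):
        # Index of the last knot whose abscissa is <= x, clamped to the valid range.
        idx = bisect_right(xs, x) - 1
        idx = max(0, min(idx, len(xs) - 1))
        return fitted_ys[idx]
    return f


# Hypothetical first predicted values and their order-preserving fitted ordinates.
first_predicted = [0.1, 0.2, 0.3, 0.4]
fitted = [9.0, 10.0, 13.0, 13.0]

first_fit = make_step_function(first_predicted, fitted)
second_predicted = [first_fit(x) for x in first_predicted]
print(second_predicted)  # [9.0, 10.0, 13.0, 13.0]
```

Queries that fall between two knots take the value of the plateau to their left, which matches the stepped shape described for fig. 5.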

In an exemplary embodiment of the present disclosure, the second predicted value obtained after the piecewise order-preserving fitting is closer to the target value than the first predicted value, but may still deviate from the target value; that is, a second offset still exists between the distribution of the second predicted value and the distribution of the target value, but the magnitude of the second offset is smaller than the magnitude of the first offset.

In step S130, a smooth fitting is performed according to the target value and the second predicted value to obtain a second fitting function, and a target predicted value is obtained according to the second fitting function, where a third offset exists between the distribution of the target predicted value and the distribution of the target value, and the magnitude of the third offset is smaller than the magnitude of the second offset.

In an exemplary embodiment of the present disclosure, the first fitting function obtained through order-preserving regression is discontinuous, and plateaus inevitably occur in its curve. To improve the continuity of the function, the first fitting function may be smoothed; that is, smooth fitting may be performed on the target value and the second predicted value to obtain a second fitting function.

In an exemplary embodiment of the present disclosure, the second predicted value may be used as the input vector and the target value as the fitting target, and a second regression model may be used to smoothly fit the target value and the second predicted value to obtain the second fitting function. The second regression model may specifically be a locally weighted regression model (Loess regression); it may also be any other model that can smooth the function curve and ensure function continuity, which is not described further here.

In an exemplary embodiment of the present disclosure, fig. 6 shows a flow diagram of a method of obtaining a second fitting function, as shown in fig. 6:

In step S601, a plurality of coordinate points to be smoothed located in a two-dimensional coordinate system are obtained according to the target value and the second predicted value; the corresponding abscissa of the coordinate point to be smoothed in the two-dimensional coordinate system is a second predicted value, and the ordinate is a target value;

In step S602, any coordinate point to be smoothed is taken as a target coordinate point, and the coordinate points to be smoothed within a preset range from the target coordinate point are obtained;

In step S603, linear regression is performed on the target coordinate point and the coordinate points to be smoothed within the preset range to obtain the second fitting function.

Further, the coordinate points to be smoothed within the preset range from the target coordinate point may be obtained by a plurality of methods, and fig. 7 is a flowchart illustrating a method for obtaining the coordinate points to be smoothed within the preset range from the target coordinate point, as shown in fig. 7:

In step S701, a weight corresponding to each second predicted value is calculated.

In an exemplary embodiment of the present disclosure, the weight corresponding to each second predicted value may be calculated by a Gaussian kernel function, whose expression is shown in formula (1):

$$w_i = \exp\left(-\frac{(x_i - x_j)^2}{2\tau^2}\right) \qquad (1)$$

where w_i is the weight corresponding to the second predicted value, x_i is the second predicted value corresponding to any coordinate point to be smoothed other than the target coordinate point, x_j is the second predicted value corresponding to the target coordinate point, and τ is a parameter.

As can be seen from the Gaussian kernel function, the farther a second predicted value is from the target coordinate point, the smaller its weight; the closer it is to the target coordinate point, the greater its weight.
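
As a non-limiting illustration, the weight calculation can be sketched as follows, assuming the usual Gaussian kernel form w_i = exp(-(x_i - x_j)^2 / (2τ^2)). The function name `gaussian_weights` and the sample values are hypothetical.

```python
import math


def gaussian_weights(second_predicted, x_target, tau):
    """Weight of each second predicted value relative to the target abscissa x_target."""
    return [
        math.exp(-((x_i - x_target) ** 2) / (2.0 * tau ** 2))
        for x_i in second_predicted
    ]


# Hypothetical second predicted values; the target point sits at x = 1.0.
weights = gaussian_weights([1.0, 2.0, 4.0], x_target=1.0, tau=1.0)
```

Under this form a point at the target abscissa receives weight 1, and the weight decays toward 0 with distance; a larger τ makes the decay slower, so more distant points still contribute.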

In step S702, the second predicted values are sorted in descending order of their corresponding weights to form a sequence.

In the exemplary embodiment of the present disclosure, since a second predicted value closer to the target coordinate point has a greater weight, the second predicted values may be sorted by weight to form a sequence, and the coordinate points to be smoothed that correspond to the higher-weighted second predicted values may then be selected from the sequence and regressed together with the target coordinate point.

In step S703, a preset number of second predicted values are sequentially obtained from the front end of the sequence, and the coordinate points to be smoothed corresponding to the preset number of second predicted values are used as the coordinate points to be smoothed in the preset range.

In an exemplary embodiment of the disclosure, after the sequence of second predicted values sorted by weight is obtained, a preset number of second predicted values are taken in order from the front end of the sequence, and the corresponding coordinate points to be smoothed are obtained from these second predicted values, so that the coordinate points to be smoothed within the preset range can be determined.
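
As a non-limiting illustration, steps S701 to S703 can be sketched as a selection of the k highest-weighted points. The function name `select_top_k`, the Gaussian kernel form exp(-(x_i - x_j)^2 / (2τ^2)), and the sample values are assumptions made for this sketch.

```python
import math


def select_top_k(second_predicted, target_index, tau, k):
    """Return indices of the k points with the largest weights, excluding the target."""
    x_j = second_predicted[target_index]
    weighted = []
    for i, x_i in enumerate(second_predicted):
        if i == target_index:
            continue  # the target coordinate point itself is not a neighbour
        w = math.exp(-((x_i - x_j) ** 2) / (2.0 * tau ** 2))
        weighted.append((w, i))
    # Descending order of weight: the front of the sequence holds the nearest points.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in weighted[:k]]


neighbours = select_top_k([0.0, 1.0, 1.5, 5.0], target_index=1, tau=1.0, k=2)
print(neighbours)  # [2, 0]
```

Here the preset number is k = 2, so the two points nearest to the target abscissa 1.0 are kept and the distant point at 5.0 is dropped.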

Fig. 8 is a flowchart illustrating another method for acquiring a coordinate point to be smoothed within a preset range from a target coordinate point, as shown in fig. 8:

In step S801, a weight corresponding to each second predicted value is calculated; the method for calculating the weight corresponding to the second predicted value is the same as the method for calculating the weight in step S701, and will not be described here again.

In step S802, comparing the weight corresponding to each second predicted value with a preset weight, and determining a coordinate point to be smoothed within a preset range according to the comparison result; the preset weight may be set according to practical situations, for example, set to 0.8, 0.6, etc., which is not specifically limited in the present disclosure.

In step S803, if there is a target second predicted value whose weight is greater than the preset weight, the coordinate point to be smoothed corresponding to that target second predicted value is used as a coordinate point to be smoothed within the preset range.
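
As a non-limiting illustration, steps S801 to S803 can be sketched as a threshold test on the weights. The function name `select_by_threshold`, the Gaussian kernel form exp(-(x_i - x_j)^2 / (2τ^2)), and the sample values are assumptions made for this sketch.

```python
import math


def select_by_threshold(second_predicted, target_index, tau, preset_weight):
    """Return indices of points whose weight exceeds preset_weight, excluding the target."""
    x_j = second_predicted[target_index]
    selected = []
    for i, x_i in enumerate(second_predicted):
        if i == target_index:
            continue
        w = math.exp(-((x_i - x_j) ** 2) / (2.0 * tau ** 2))
        if w > preset_weight:
            selected.append(i)
    return selected


points = [0.0, 1.0, 1.5, 5.0]
print(select_by_threshold(points, target_index=1, tau=1.0, preset_weight=0.8))  # [2]
print(select_by_threshold(points, target_index=1, tau=1.0, preset_weight=0.6))  # [0, 2]
```

Unlike the fixed-count selection of fig. 7, the number of neighbours here varies with the local density of points: lowering the preset weight from 0.8 to 0.6 admits one more point.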

In an exemplary embodiment of the present disclosure, regression fitting is performed on the second predicted value mainly by a locally weighted regression model. The locally weighted regression algorithm is a non-parametric learning algorithm. For a general regression algorithm, the corresponding loss function is

$$J(\theta) = \sum_i \left(y_i - \theta^T x_i\right)^2$$

For the locally weighted regression algorithm, the corresponding loss function is

$$J(\theta) = \sum_i w_i \left(y_i - \theta^T x_i\right)^2$$

where w_i is the weight corresponding to each second predicted value, x_i is the second predicted value, and y_i is the corresponding target value. After the coordinate points to be smoothed within the preset range from the target coordinate point are determined, the second fitting function is obtained by performing linear regression on the target coordinate point and the coordinate points to be smoothed within the preset range. Specifically, the linear regression parameter θ may be determined according to the target coordinate point and the coordinate points to be smoothed within the preset range, where θ takes the value that minimizes the loss function; the second fitting function may then be determined according to the linear regression parameter and the linear regression model.
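
As a non-limiting illustration, determining θ can be sketched with the closed-form solution of a weighted least-squares line, assuming a weighted squared-error loss of the form Σ w_i (y_i - θ_0 - θ_1 x_i)^2 with one input variable and an intercept. The function name `weighted_linear_fit` and the sample values are hypothetical.

```python
def weighted_linear_fit(xs, ys, ws):
    """Return (intercept, slope) minimising sum of w * (y - intercept - slope * x)^2."""
    total_w = sum(ws)
    # Weighted means of the abscissas and ordinates.
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    # Weighted variance of x and weighted covariance of x and y.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    # If all abscissas coincide, fall back to a flat line through the weighted mean.
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical neighbourhood lying exactly on y = 2x + 1.
intercept, slope = weighted_linear_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [0.5, 1.0, 0.5])
```

The fallback for coinciding abscissas matters in this setting, because the plateaus left by order-preserving fitting can give several neighbours the same second predicted value.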

In an exemplary embodiment of the disclosure, the second predicted value may be substituted as an argument into the second fitting function to perform an operation, so as to obtain a target predicted value corresponding to the second predicted value, that is, an ordinate corresponding to a coordinate point corresponding to the second predicted value in the second fitting function curve is the target predicted value.

In the embodiment of the disclosure, smooth fitting of the first fitting function ensures the continuity of the function and avoids plateaus in the function curve, and the obtained target predicted value is closer to the target value; that is, a third offset exists between the distribution of the target predicted value and the distribution of the target value, and the amplitude of the third offset is smaller than that of the second offset.

The data correction method combines two algorithms, order-preserving regression and locally weighted regression. Order-preserving regression adapts well to the shape of the function, and locally weighted regression smooths the function curve while keeping its original shape, thereby removing the break points and plateaus of the piecewise function, effectively correcting data offset, and improving the accuracy of data prediction. Because the method adapts to any function shape and produces a smooth result, and thus avoids the respective limitations of classical methods, it can not only improve the accuracy of the predicted click rate but also correct most problems involving data offset.
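
As a non-limiting illustration, the two stages can be chained as in the sketch below. All function names are hypothetical. The sketch assumes pool-adjacent-violators for the order-preserving stage and a Gaussian-weighted local line for the smoothing stage, and it weights every point rather than restricting the regression to a preset range.

```python
import math


def isotonic(ys):
    """Order-preserving fit by pooling adjacent violators."""
    blocks = []
    for y in ys:
        blocks.append([float(y), 1])
        while (len(blocks) > 1
               and blocks[-1][0] / blocks[-1][1] < blocks[-2][0] / blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [t / c for t, c in blocks for _ in range(c)]


def local_linear(x0, xs, ys, tau):
    """Value at x0 of a weighted least-squares line with Gaussian weights centred on x0."""
    ws = [math.exp(-((x - x0) ** 2) / (2.0 * tau ** 2)) for x in xs]
    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return mean_y + slope * (x0 - mean_x)


def correct(first_predicted, target, tau):
    """Return (second predicted values, target predicted values) in the input order."""
    # Stage 1: order-preserving fit of the targets, ordered by first predicted value.
    order = sorted(range(len(first_predicted)), key=lambda i: first_predicted[i])
    fitted = isotonic([target[i] for i in order])
    second = [0.0] * len(first_predicted)
    for rank, i in enumerate(order):
        second[i] = fitted[rank]
    # Stage 2: smooth fit of the targets against the second predicted values.
    final = [local_linear(s, second, target, tau) for s in second]
    return second, final


second, final = correct([0.1, 0.2, 0.3, 0.4], [9, 10, 14, 12], tau=1.0)
print(second)  # [9.0, 10.0, 13.0, 13.0]
```

The parameter tau controls how strongly the second stage smooths: small values follow the stepped output of the first stage closely, while large values approach an ordinary linear regression.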

The following describes embodiments of the apparatus of the present disclosure, which may be used to perform the data correction method described above. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the data correction method described above.

Fig. 9 schematically illustrates a block diagram of a data correction device according to one embodiment of the present disclosure. As shown in fig. 9, the data correction device 900 includes at least:

a data acquisition module 901, configured to acquire a target value and a first predicted value corresponding to the target value, where a first offset exists between a distribution of the first predicted value and a distribution of the target value;

an order-preserving fitting module 902, configured to perform order-preserving fitting according to the target value and the first predicted value, so as to obtain a first fitting function, and obtain a second predicted value according to the first fitting function, where a second offset exists between a distribution of the second predicted value and a distribution of the target value, and an amplitude of the second offset is smaller than an amplitude of the first offset;

a smooth fitting module 903, configured to perform smooth fitting according to the target value and the second predicted value, so as to obtain a second fitting function, and obtain a target predicted value according to the second fitting function, where a third offset exists between the distribution of the target predicted value and the distribution of the target value, and an amplitude of the third offset is smaller than an amplitude of the second offset.

The specific details of each module in the above data correction device are described in detail in the corresponding data correction method, so that the details are not repeated here.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.

As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. Components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, and a bus 1030 that connects the various system components (including the storage unit 1020 and the processing unit 1010).

Wherein the storage unit stores program code that is executable by the processing unit 1010 such that the processing unit 1010 performs steps according to various exemplary embodiments of the present disclosure described in the section "detailed description of the invention" above. For example, the processing unit 1010 may perform step S110 as shown in fig. 1: acquiring a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value; step S120: performing order-preserving fitting according to the target value and the first predicted value to obtain a first fitting function, and obtaining a second predicted value according to the first fitting function, wherein a second offset exists between the distribution of the second predicted value and the distribution of the target value, and the amplitude of the second offset is smaller than that of the first offset; step S130: and carrying out smooth fitting according to the target value and the second predicted value to obtain a second fitting function, and obtaining a target predicted value according to the second fitting function, wherein a third offset exists between the distribution of the target predicted value and the distribution of the target value, and the amplitude of the third offset is smaller than that of the second offset.

The storage unit 1020 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 10201 and/or a cache memory unit 10202, and may further include a read-only memory (ROM) 10203.

The storage unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 1000 can also communicate with one or more external devices 1500 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1050. Also, electronic device 1000 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1060. As shown, the network adapter 1060 communicates with other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 1000, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer storage medium having stored thereon a program product capable of implementing the method described above in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.

Referring to fig. 11, a program product 1100 for implementing the above-described method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A data correction method, the method comprising:

acquiring a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value; the target value is the real click rate of the target object, and the first predicted value is the initial predicted click rate of the target object;

and carrying out smooth fitting according to the target value and the second predicted value to obtain a second fitting function, and obtaining a target predicted value according to the second fitting function, wherein a third offset exists between the distribution of the target predicted value and the distribution of the target value, the amplitude of the third offset is smaller than that of the second offset, and the target predicted value is the predicted click rate after data correction.

2. The data correction method according to claim 1, wherein said performing order-preserving fitting based on said target value and said first predicted value to obtain a first fitting function comprises:

3. The data correction method according to claim 2, wherein the performing piecewise positive sequence fitting on the target value and the first predicted value by the first regression model to obtain the first fitting function includes:

4. The data correction method according to claim 3, wherein the performing a piecewise positive sequence fitting according to the trend of the change in each coordinate point to obtain the first fitting function includes:

5. The data correction method according to claim 4, wherein the acquiring the N'-th coordinate point and the (N'+1)-th coordinate point from the ordinate of the N-th coordinate point and the ordinate of the (N+1)-th coordinate point includes:

6. The method of claim 5, wherein the obtaining a second predicted value according to the first fitting function comprises:

7. The data correction method according to claim 1, wherein said performing smooth fitting based on said target value and said second predicted value to obtain a second fitting function comprises:

8. The data correction method according to claim 7, wherein said smoothing fitting the target value and the second predicted value by the second regression model to obtain the second fitting function comprises:

9. The method for correcting data according to claim 8, wherein the obtaining the coordinate point to be smoothed within a predetermined range from the target coordinate point by using any one of the coordinate points to be smoothed as the target coordinate point includes:

calculating the weight corresponding to each second predicted value;

10. The method for correcting data according to claim 8, wherein the obtaining the coordinate point to be smoothed within a predetermined range from the target coordinate point by using any one of the coordinate points to be smoothed as the target coordinate point includes:

calculating the weight corresponding to each second predicted value;

11. The data correction method according to claim 9 or 10, wherein the calculating the weight corresponding to each of the second predicted values includes:

12. The method for correcting data according to claim 11, wherein the performing linear regression on the target coordinate point and the coordinate point to be smoothed within the preset range to obtain the second fitting function includes:

13. The data correction method according to claim 11, wherein obtaining a target predicted value from the second fitting function includes:

14. A data correction device, comprising:

a data acquisition module, configured to acquire a target value and a first predicted value corresponding to the target value, wherein a first offset exists between the distribution of the first predicted value and the distribution of the target value; the target value is the real click rate of the target object, and the first predicted value is the initial predicted click rate of the target object;

and a smooth fitting module, configured to perform smooth fitting according to the target value and the second predicted value to obtain a second fitting function, and obtain a target predicted value according to the second fitting function, wherein a third offset exists between the distribution of the target predicted value and the distribution of the target value, the amplitude of the third offset is smaller than that of the second offset, and the target predicted value is the predicted click rate after data correction.

15. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data correction method of any of claims 1 to 13.

16. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the data correction method of any one of claims 1 to 13 via execution of the executable instructions.