CN117290691A

CN117290691A - Unbiased Top-K evaluation method, device and equipment based on a debiased recommendation model

Info

Publication number: CN117290691A
Application number: CN202311551487.8A
Authority: CN
Inventors: 冯福利; 王城冰; 石文焘; 张及之; 王文杰; 何向南
Original assignee: Data Space Research Institute
Current assignee: Data Space Research Institute
Priority date: 2023-11-21
Filing date: 2023-11-21
Publication date: 2023-12-26

Abstract

The application relates to an unbiased Top-K evaluation method, device and equipment based on a debiased recommendation model. The method comprises the following steps: obtaining a target full exposure entity set, and predicting a target user's evaluation of each sample entity in the target full exposure entity set through a debiased recommendation model under test, to obtain the prediction score of each sample entity in the target full exposure entity set; and acquiring a random exposure entity set and the target user's true score for each sample entity in the random exposure entity set, wherein the target full exposure entity set comprises the sample entities in the random exposure entity set. The method and the device improve the accuracy with which the unbiased recommendation performance of the debiased recommendation model under test is evaluated on the target full exposure entity set, and address the problem that unbiased Top-K evaluation methods in the related art have low test accuracy when the K value is small.

Description

Unbiased Top-K evaluation method, device and equipment based on a debiased recommendation model

Technical Field

The application relates to the technical field of model evaluation, and in particular to an unbiased Top-K evaluation method, device and equipment based on a debiased recommendation model.

Background

In industrial recommendation systems, recommendation models suffer from various biases, such as selection bias and popularity bias, which can make the recommended items seriously inconsistent with users' actual preferences and produce undesirable revenue results in the platform's online service. For example, these biases may amplify the long-tail effect, reducing user satisfaction and trust. Ideally, users' actual preferences can be measured by their actual feedback (e.g., ratings) on all items on the recommendation platform. Therefore, the gold standard for evaluating a debiasing method is to use a full exposure dataset (in which all items are exposed to the user) to evaluate the Top-K recommendation performance of the model.

In reality, a full exposure dataset is very difficult to acquire, so researchers widely use the conventional evaluation method, i.e., evaluating a debiasing method by calculating Top-K indexes on a random exposure dataset. The random exposure dataset contains only users' ratings of randomly selected items, i.e., it is a very sparse random sample of the items in the full exposure dataset.

Top-K evaluation on the random exposure dataset (e.g., Recall@5) correlates only weakly with Top-K indexes at smaller K values on the full exposure dataset (e.g., Recall@30), but correlates strongly with Top-K indexes at larger K values (e.g., Recall@1000). As a result, the performance ranking of models on the random exposure dataset is inconsistent with their ranking on the full exposure dataset when K is small, and this inconsistency indicates that conventional evaluation on random exposure datasets is not convincing. In practice, a user can usually browse only a limited number of items in the recommendation list, so obtaining Recall@K at smaller K values on the full exposure dataset is critical for evaluating the performance of a debiased model.

In view of the above analysis, we have further found that evaluation indexes on the random exposure dataset cannot be used to judge whether a debiasing method is truly effective in alleviating bias. In conventional unbiased testing, the performance improvement credited to a debiasing method may instead come from the method optimizing Top-K indexes at larger K values on the full exposure data, which in turn yields high performance on the random exposure dataset without true debiasing. For example, a hard negative sampling method, which in theory optimizes Top-K indexes at larger K values on the full exposure data, outperforms a representative debiasing method in the conventional unbiased test. Thus, an unbiased estimate of Top-K performance at smaller K values on the full exposure data is crucial.

No effective solution has yet been proposed for the problem that unbiased Top-K evaluation methods in the related art have low test accuracy when the K value is small.

Disclosure of Invention

The embodiments provide an unbiased Top-K evaluation method, device and equipment based on a debiased recommendation model, so as to solve the problem that unbiased Top-K evaluation methods in the related art have low test accuracy when the K value is small.

In a first aspect, the invention provides an unbiased Top-K evaluation method based on a debiased recommendation model, the method comprising:

obtaining a target full exposure entity set, and predicting the evaluation of a target user on each sample entity in the target full exposure entity set through a debiased recommendation model under test to obtain the prediction score of each sample entity in the target full exposure entity set;

acquiring a random exposure entity set and a true score of the target user on each sample entity in the random exposure entity set, wherein the target full exposure entity set comprises the sample entities in the random exposure entity set;

determining the ratio between positive sample entities with the true score higher than a preset score and all positive sample entities in the random exposure entity set, and determining the recall rate evaluation index of the debiased recommendation model under test for the target user according to the ratio;

and determining the unbiased recommendation performance index of the debiased recommendation model under test according to the recall rate evaluation indexes of the debiased recommendation model under test for a plurality of target users.

In some of these embodiments, the obtaining a set of randomly exposed entities and the target user's true score for each sample entity in the set of randomly exposed entities comprises:

and randomly sampling the target full exposure entity set, determining a random exposure entity set, and determining the true score of each sample entity in the random exposure entity set according to the evaluation of the target user on each sample entity in the random exposure entity set.

In some of these embodiments, the method further comprises, prior to randomly sampling the target set of full exposure entities:

and sequencing all sample entities in the target full-exposure entity set according to the prediction score of each sample entity in the target full-exposure entity set.

In some of these embodiments, before determining the ratio between the positive sample entity with the true score higher than the preset score and all positive sample entities in the set of random exposure entities, the method further includes:

and removing negative sample entities in the random exposure entity set according to the true scores of the target users on the sample entities in the random exposure entity set.

In some of these embodiments, the method further comprises:

and acquiring a preset K value, and determining the predicted score of the Kth sample entity in the target full exposure entity set as the preset score.

In some of these embodiments, the K value is less than or equal to 50.

In some embodiments, the determining, according to the ratio, a recall rate evaluation index of the debiased recommendation model under test for the target user includes:

and determining the ratio as the recall rate evaluation index of the debiased recommendation model under test for the target user.

In some embodiments, the determining, according to the recall rate evaluation indexes of the debiased recommendation model under test for a plurality of target users, an unbiased recommendation performance index of the debiased recommendation model under test includes:

and calculating the average value of the recall rate evaluation indexes of the debiased recommendation model under test for the plurality of target users, and determining the unbiased recommendation performance of the debiased recommendation model under test according to the average value.

In a second aspect, the present invention provides an unbiased Top-K evaluation apparatus based on a debiased recommendation model, including:

the prediction module is used for obtaining a target full exposure entity set, predicting the evaluation of a target user on each sample entity in the target full exposure entity set through a debiased recommendation model under test, and obtaining the prediction score of each sample entity in the target full exposure entity set;

the acquisition module is used for acquiring a random exposure entity set and the true score of the target user on each sample entity in the random exposure entity set, and the target full exposure entity set comprises the sample entities in the random exposure entity set;

the evaluation module is used for determining the ratio between the positive sample entities with the true score higher than the preset score and all the positive sample entities in the random exposure entity set, and determining the recall rate evaluation index of the debiased recommendation model under test for the target user according to the ratio;

the determining module is used for determining the unbiased recommendation performance index of the debiased recommendation model under test according to the recall rate evaluation indexes of the debiased recommendation model under test for a plurality of target users.

In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the unbiased Top-K evaluation method based on a debiased recommendation model according to the first aspect when executing the computer program.

In a fourth aspect, the present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the unbiased Top-K evaluation method based on a debiased recommendation model as described in the first aspect above.

Compared with the related art, the unbiased Top-K evaluation method based on a debiased recommendation model provided by the invention acquires the random exposure entity set based on the target full exposure entity set. It then makes an unbiased estimate of the recall rate evaluation index of the debiased recommendation model under test for the target user on the target full exposure entity set, according to the ratio between the positive sample entities with the true score higher than the preset score and all the positive sample entities in the random exposure entity set. This improves the accuracy with which the unbiased recommendation performance of the debiased recommendation model under test is evaluated on the target full exposure entity set, and solves the problem of low test accuracy of unbiased Top-K evaluation methods in the related art. Especially when the K value is small, the evaluation method provided by the invention evaluates the unbiased recommendation performance of the debiased recommendation model more accurately than existing evaluation methods.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a block diagram of a terminal hardware structure for executing the unbiased Top-K evaluation method based on a debiased recommendation model provided in the present invention;

FIG. 2 is a flow chart of the unbiased Top-K evaluation method based on a debiased recommendation model of the present invention;

FIG. 3 is a schematic diagram of the unbiased Top-K evaluation scheme based on a debiased recommendation model of the present invention;

FIG. 4 is a schematic diagram of predicting recall evaluation metrics for a target set of fully-exposed entities from recall evaluation metrics for a set of randomly exposed entities in one embodiment;

FIG. 5 is a schematic diagram of the rankings of different models under the conventional evaluation method, by the Recall@K index on the full exposure dataset and by the Recall@5 index on the random exposure dataset;

FIG. 6 is a comparison graph of the unbiased recommendation performance index measured by the method and the true Recall@K performance of the AutoDebias model at different K values;

FIG. 7 is a comparison graph of the Kendall rank correlation coefficients of the various recall rate evaluation indexes at different K values;

FIG. 8 is a schematic diagram of the Kendall rank correlation coefficients between Recall@K and the unbiased recommendation performance index measured by the method at different K values;

FIG. 9 is a block diagram of an unbiased Top-K evaluation apparatus based on a debiased recommendation model according to the present invention.

Detailed Description

For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.

Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application are not intended to be limiting in number, but rather are singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used in the present application, are intended to cover a non-exclusive inclusion; for example, a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this application, merely distinguish similar objects and do not represent a particular ordering of objects.

The method embodiments provided in the present invention may be performed in a terminal, a computer or a similar computing device. Taking a terminal as an example, fig. 1 is a block diagram of a terminal hardware structure for executing the unbiased Top-K evaluation method based on a debiased recommendation model provided in the present invention. As shown in fig. 1, the terminal may include one or more (only one is shown in fig. 1) processors 120 and a memory 140 for storing data, wherein the processors 120 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA. The terminal may further include a transmission device 160 for a communication function and an input-output device 180. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.

The memory 140 may be used to store a computer program, for example, a software program and modules of application software, such as a computer program corresponding to the unbiased Top-K evaluation method based on a debiased recommendation model in the present invention, and the processor 120 performs various functional applications and data processing by running the computer program stored in the memory 140, that is, implements the above-described method. Memory 140 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 140 may further include memory located remotely from processor 120, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 160 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 160 includes a network adapter (Network Interface Controller, simply referred to as NIC) that may be connected to other network devices via a base station to communicate with the internet. In one example, the transmission device 160 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.

The present invention provides an unbiased Top-K evaluation method based on a debiased recommendation model. Fig. 2 is a flowchart of the unbiased Top-K evaluation method based on a debiased recommendation model according to the present invention. As shown in fig. 2, the flow includes the following steps:

Step S201, a target full exposure entity set is obtained, the evaluation of a target user on each sample entity in the target full exposure entity set is predicted through a debiased recommendation model under test, and the prediction score of each sample entity in the target full exposure entity set is obtained.

Step S202, obtaining a random exposure entity set and a true score of the target user for each sample entity in the random exposure entity set, wherein the target full exposure entity set comprises the sample entities in the random exposure entity set.

Step S203, determining the ratio between the positive sample entities with the true score higher than the preset score and all the positive sample entities in the random exposure entity set, and determining the recall rate evaluation index of the debiased recommendation model under test for the target user according to the ratio.

Step S204, determining the unbiased recommendation performance index of the debiased recommendation model under test according to the recall rate evaluation indexes of the debiased recommendation model under test for a plurality of target users.

FIG. 3 is a schematic diagram of the unbiased Top-K evaluation scheme (UT-Eval scheme) based on a debiased recommendation model according to the present invention. As shown in FIG. 3, for a target full exposure entity set, the recall rate evaluation index on the target full exposure entity set (Recall@K) can be measured by the ratio of the target user's positive sample entities with a true score higher than a preset score to all of the target user's positive sample entities. The preset score may be a predetermined score value, or may be the prediction score of one of the sample entities in the target full exposure entity set. Since the random exposure entity set is a subset of the target full exposure entity set, both the positive and the negative sample entities are randomly missing in the random exposure entity set. Thus, the proportion of positive sample entities with a true score higher than the preset score in the random exposure entity set can be used as an unbiased estimate of Recall@K.

For example, a target full exposure entity set (full exposure data) containing a total of 12 sample entities is obtained first, and the evaluation of the target user on each sample entity in the target full exposure entity set is predicted through the debiased recommendation model under test, so as to obtain the prediction score of each sample entity. Then, a random exposure entity set (random exposure data) containing 4 sample entities is obtained based on the target full exposure entity set. In the random exposure entity set, the number of positive sample entities with the true score higher than the preset score is 1 and the number of all positive sample entities is 2, so the ratio between the positive sample entities with the true score higher than the preset score and all positive sample entities is 1/2. The recall rate evaluation index of the debiased recommendation model under test for the target user is determined from this ratio. Recall rate evaluation indexes of the debiased recommendation model under test for different target users are obtained in the same way, and the unbiased recommendation performance of the debiased recommendation model under test is then determined.
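The 12-entity example can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the entity ids, score values, `positive_threshold` and the choice K = 3 are hypothetical, and the sketch assumes that a positive sample entity counts as lying above the preset score when the model's prediction score for it is at least the preset score (the text above phrases this comparison in terms of the true score).

```python
def recall_on_random_exposure(pred_scores, random_true_scores,
                              preset_score, positive_threshold):
    """Ratio of positives above the preset score to all positives (one user).

    pred_scores: entity id -> prediction score, for the full exposure set.
    random_true_scores: entity id -> true score, for the random exposure set.
    Assumptions (not from the source): an entity is positive when its true
    score >= positive_threshold, and it is counted as a hit when its
    prediction score >= preset_score.
    """
    positives = [e for e, s in random_true_scores.items()
                 if s >= positive_threshold]
    if not positives:
        return None  # the ratio is undefined without observed positives
    hits = [e for e in positives if pred_scores[e] >= preset_score]
    return len(hits) / len(positives)


# Hypothetical full exposure set: 12 entities with prediction scores 12..1.
pred_scores = {f"e{i}": 12 - i for i in range(12)}
# Hypothetical random exposure set: 4 of the 12 entities with true scores.
random_true_scores = {"e1": 5, "e4": 1, "e7": 4, "e10": 2}
# Preset score taken as the prediction score of the 3rd-ranked entity (K = 3).
preset_score = pred_scores["e2"]

ratio = recall_on_random_exposure(pred_scores, random_true_scores,
                                  preset_score, positive_threshold=4)
print(ratio)
```

With these hypothetical values there are two positive sample entities in the random exposure set (`e1`, `e7`) and one of them (`e1`) lies above the preset score, matching the 1-out-of-2 count in the example.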

In practical scenarios, it is difficult to obtain the target user's true scores for all sample entities in the target full exposure entity set. To estimate the unbiased recommendation performance index (Top-K) of the debiased recommendation model under test on the target full exposure entity set, the invention obtains a random exposure entity set from the target full exposure entity set and then, according to the ratio between the positive sample entities with the true score higher than the preset score and all the positive sample entities in the random exposure entity set, makes an unbiased prediction of the recall rate evaluation index of the debiased recommendation model under test for the target user on the target full exposure entity set. This improves the measurement accuracy of the recall rate index of the debiased recommendation model under test on the target full exposure entity set, i.e., the accuracy with which its unbiased recommendation performance is evaluated on the target full exposure entity set, and solves the problem that unbiased Top-K evaluation methods in the related art have low test accuracy, particularly when the K value is small.

In some of these embodiments, step S202, obtaining the set of randomly exposed entities and the true score of the target user for each sample entity in the set of randomly exposed entities includes: and randomly sampling the target full-exposure entity set, determining the random exposure entity set, and determining the true score of each sample entity in the random exposure entity set according to the evaluation of the target user on each sample entity in the random exposure entity set. Illustratively, in fig. 3, there are 12 sample entities in the target full exposure entity set, and the target full exposure entity set is randomly sampled to extract 4 sample entities, which constitute a random exposure entity set. The true scores of the 4 sample entities are then determined based on the target user's evaluation of the 4 sample entities.
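A minimal Python sketch of this sampling step is given below. The function name `draw_random_exposure`, the callback `collect_true_score` (standing in for however the target user's evaluation is actually collected) and the `seed` parameter are assumptions introduced for illustration only.

```python
import random


def draw_random_exposure(full_entity_ids, sample_size, collect_true_score,
                         seed=None):
    """Uniformly sample entities and record the user's true score for each.

    collect_true_score is a hypothetical callback that returns the target
    user's true score for one entity id.
    """
    rng = random.Random(seed)
    # random.Random.sample draws without replacement, so each entity
    # appears at most once in the random exposure set.
    sampled_ids = rng.sample(list(full_entity_ids), sample_size)
    return {entity_id: collect_true_score(entity_id)
            for entity_id in sampled_ids}


# Usage with 12 hypothetical entities and a placeholder scoring callback.
full_entity_ids = [f"e{i}" for i in range(12)]
random_exposure = draw_random_exposure(full_entity_ids, 4,
                                       collect_true_score=lambda e: 3,
                                       seed=0)
```

Sampling uniformly without replacement is what makes the random exposure set a random subset of the full exposure set, which the unbiasedness argument relies on.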

Further, the sample entities in the target full exposure entity set may not be ordered prior to randomly sampling the target full exposure entity set. To more quickly count the number of positive sample entities in the subsequent process, in some embodiments, prior to randomly sampling the target set of full exposure entities, the method further comprises: and sequencing all sample entities in the target full exposure entity set according to the prediction score of each sample entity in the target full exposure entity set. After the sample entities in the target full exposure entity set are sequenced, the random exposure entity set is determined, so that the number of positive sample entities with the true scores higher than the preset scores in the random exposure entity set can be determined conveniently.

In some embodiments, step S203, before determining the ratio between the positive sample entity with the true score higher than the preset score and all the positive sample entities in the set of random exposure entities, further includes: and removing negative sample entities in the random exposure entity set according to the true scores of the target users on the sample entities in the random exposure entity set.

Fig. 4 is a schematic diagram of predicting the recall rate evaluation index of the target full exposure entity set from the recall rate evaluation index of the random exposure entity set according to the present embodiment. In fig. 4, Rank_u represents the target full exposure entity set, Rank_u^+ represents the set of all positive sample entities in the target full exposure entity set, and a further set (referred to below as the positive-only random exposure set) represents the random exposure entity set after the negative sample entities are removed. Positive sample entities are represented by an icon with a "+" mark and negative sample entities by an icon with a "-" mark. K_id represents the id of the K-th sample entity in the target full exposure entity set, N represents the number of all positive sample entities in the target full exposure entity set, M represents the number of positive sample entities with true scores higher than the preset score in the target full exposure entity set, n represents the number of all positive sample entities in the random exposure entity set, and m represents the number of positive sample entities with true scores higher than the preset score in the random exposure entity set. As shown in fig. 4, since both the Rank_u^+ set and the positive-only random exposure set contain only positive sample entities, the positive-only random exposure set can in theory be regarded as a random sample of the Rank_u^+ set. In the target full exposure entity set, positive sample entities exist objectively but cannot be completely known, i.e., the recall rate evaluation index of the target full exposure entity set cannot be obtained directly. Therefore, the recall rate evaluation index of the target full exposure entity set is estimated using the recall rate evaluation index of the random exposure entity set, whose positive sample entities are completely known.

In this embodiment, the negative sample entities in the random exposure entity set are all removed, which makes it easier to count the ratio between the positive sample entities with the true score higher than the preset score and all the positive sample entities in the random exposure entity set.
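The unbiasedness claim can be checked on a small case. Write N and M for the counts of all positives and of positives above the preset score in the full exposure set, and n and m for the corresponding counts in the random exposure set. If the n observed positives are a uniform random subset of the N positives, m follows a hypergeometric distribution with mean nM/N, so the ratio m/n has expectation M/N. The sketch below verifies this by exact enumeration; the function name and the example sizes are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_sample_ratio(num_positives, num_hits, sample_size):
    """Exact mean of m/n over all equally likely size-n subsets of positives."""
    if not 1 <= sample_size <= num_positives:
        raise ValueError("sample_size must be between 1 and num_positives")
    # Mark the first num_hits positives as lying above the preset score.
    is_hit = [index < num_hits for index in range(num_positives)]
    subsets = list(combinations(range(num_positives), sample_size))
    total = sum(Fraction(sum(is_hit[i] for i in subset), sample_size)
                for subset in subsets)
    return total / len(subsets)


# N = 6 positives, M = 2 above the preset score, n = 3 sampled positives.
print(expected_sample_ratio(6, 2, 3))
```

The enumeration conditions on a fixed number n of observed positives; a user with no observed positives has no defined ratio and would have to be handled separately.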

The preset score may be a preset score value. In some embodiments, the method further comprises: acquiring a preset K value, and determining the prediction score of the K-th sample entity in the target full exposure entity set as the preset score. Compared with setting the preset score directly, using the prediction score of one sample entity in the target full exposure entity set as the preset score makes the preset score better suited to the current test scenario and the data setting more realistic. Fig. 5 is a schematic diagram of the rankings of different models under the conventional evaluation method, by the Recall@K index on the full exposure dataset and by the Recall@5 index on the random exposure dataset. As can be seen from fig. 5, in the conventional evaluation method, when the K value is large, the evaluation result is highly consistent with the real result, and when the K value is small, the evaluation result differs greatly from the real result. In this embodiment, the K value is less than or equal to 50. Even when the K value is small, the evaluation result obtained by the evaluation method provided by the invention maintains high accuracy.
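Deriving the preset score from a preset K value can be sketched as follows. The function name and the example scores are hypothetical, and ties between prediction scores are not treated specially.

```python
def preset_score_from_k(pred_scores, k):
    """Return the prediction score of the K-th ranked entity (1-based)."""
    if not 1 <= k <= len(pred_scores):
        raise ValueError("k must be between 1 and the number of entities")
    # Rank all entities of the full exposure set by prediction score,
    # highest first, and read off the score at position K.
    ranked_scores = sorted(pred_scores.values(), reverse=True)
    return ranked_scores[k - 1]


# Hypothetical prediction scores for 12 entities.
example_scores = {f"e{i}": 12 - i for i in range(12)}
print(preset_score_from_k(example_scores, 3))
```

A full sort is the simplest choice for a sketch; for large entity sets and small K, a partial selection such as `heapq.nlargest` would avoid sorting every score.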

In some embodiments, step S203, determining, according to the ratio, the recall rate evaluation index of the debiased recommendation model under test for the target user includes: determining the ratio as the recall rate evaluation index of the debiased recommendation model under test for the target user. The above provides only one scheme for determining the recall rate evaluation index from the ratio; in actual application, the ratio need not be used directly as the recall rate evaluation index.

In some embodiments, step S204, determining, according to the recall rate evaluation indexes of the debiased recommendation model under test for the plurality of target users, the unbiased recommendation performance index of the debiased recommendation model under test includes: calculating the average value of the recall rate evaluation indexes of the debiased recommendation model under test for the plurality of target users, and determining the unbiased recommendation performance of the debiased recommendation model under test according to the average value. The above provides only one scheme for determining the unbiased recommendation performance index of the debiased recommendation model under test; in actual application, other statistics (such as the median) can also be used as the basis for determining it.
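The aggregation across target users can be sketched as follows. The function name, the `how` parameter, and the convention that a user without any observed positive sample entity carries `None` and is skipped are assumptions made for illustration.

```python
from statistics import mean, median


def aggregate_recall(per_user_recall, how="mean"):
    """Combine per-user recall indexes into one model-level index."""
    # Skip users whose ratio is undefined (assumed to be stored as None).
    values = [r for r in per_user_recall.values() if r is not None]
    if not values:
        raise ValueError("no user has a defined recall index")
    if how == "mean":
        return mean(values)
    if how == "median":
        return median(values)
    raise ValueError(f"unknown aggregation: {how}")


# Hypothetical per-user recall indexes.
per_user_recall = {"u1": 0.5, "u2": 1.0, "u3": None, "u4": 0.0}
print(aggregate_recall(per_user_recall))
print(aggregate_recall(per_user_recall, how="median"))
```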

In summary, the present invention obtains a random exposure entity set based on a target full exposure entity set. It then makes an unbiased prediction of the recall rate evaluation index of the debiased recommendation model under test for the target user on the target full exposure entity set, according to the ratio between the positive sample entities with the true score higher than the preset score and all the positive sample entities in the random exposure entity set. This improves the measurement accuracy of the recall rate index of the debiased recommendation model under test on the target full exposure entity set and solves the problem of low test accuracy of unbiased Top-K evaluation methods in the related art. The improvement in the accuracy of the evaluated unbiased recommendation performance index is particularly pronounced in scenarios with a small K value.

In order to verify the accuracy with which the method evaluates the unbiased recommendation performance index of the debiased recommendation model under test at different K values, the AutoDebias model was trained on the KuaiRec dataset, and the unbiased recommendation performance index measured by the method was compared with the true Recall@K performance of the AutoDebias model at different K values. FIG. 6 shows this comparison, in which curve a is the measured value and curve b is the true Recall@K. As can be seen from FIG. 6, the invention can more accurately estimate the unbiased recommendation performance index of the debiased recommendation model under test on the full exposure dataset at different K values.

Further, to study the differences among the recall evaluation index obtained on the random exposure dataset by the evaluation method of the invention, the recall evaluation index (Recall@K) obtained on the random exposure dataset by the conventional evaluation method, and the recall evaluation index (Recall@K) on the target full exposure dataset, several hyperparameter combinations were randomly selected and the AutoDebias model was trained with each of them on the KuaiRec dataset. The Kendall rank correlation coefficient between each index obtained on the random exposure dataset and the Recall@K on the target full exposure dataset was then calculated to show their rank consistency. FIG. 7 compares the Kendall rank correlation coefficients of the respective recall evaluation indexes at different K values. As can be seen in FIG. 7, when K is small (e.g., K = 30), the recall indexes obtained by the conventional evaluation method correlate relatively poorly with the Recall@K on the target full exposure dataset, with a coefficient below 0.9 in each case. In contrast, the Kendall correlation coefficient between the index estimated by the invention and the Recall@K on the target full exposure dataset is greater than 0.9. Therefore, the evaluation method of the invention gives a more accurate and truthful evaluation of the unbiased recommendation performance of the debiased recommendation model under test, and a debiased recommendation model selected by the evaluation method of the invention is more likely to perform well on the full exposure dataset.
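The rank-consistency check described above can be sketched as follows. This is an illustrative implementation of the Kendall tau-a coefficient (tied pairs count as neither concordant nor discordant); the source does not state which Kendall variant was used, so the choice of tau-a is an assumption. All numbers are made up for four hypothetical hyperparameter combinations and are not results from the experiments.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equally long lists of index values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both indexes order the pair the same way
        elif sign < 0:
            discordant += 1   # the two indexes disagree on the pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up values, one per hypothetical hyperparameter combination.
true_recall = [0.30, 0.29, 0.36, 0.20]   # index on the full exposure dataset
estimated = [0.31, 0.28, 0.35, 0.22]     # index ranking the models identically
conventional = [0.50, 0.54, 0.52, 0.40]  # index swapping some of the models

print(kendall_tau(estimated, true_recall))
print(kendall_tau(conventional, true_recall))
```

A coefficient of 1 means the two indexes rank every pair of models identically, so a model selected with one index would also be selected with the other.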

FIG. 8 shows the Kendall rank correlation coefficient between the Recall@K on the target full exposure dataset and the index estimated by the invention when the two are computed with different K values. In FIG. 8, curve c is the Kendall rank correlation coefficient when K = 20, curve d is the Kendall rank correlation coefficient when K = 50, and curve e is the Kendall rank correlation coefficient when K = 200. It can be seen in FIG. 8 that the correlation coefficient is highest only when the two K values are the same. Therefore, in practical applications, the Recall@K for a given K can be estimated, and the optimal debiased recommendation model can be conveniently selected according to actual requirements; that is, the evaluation method provided by the invention is highly flexible.

In actual use, the method can be applied to large-scale e-commerce, short-video and news recommendation platforms to make an unbiased estimate of the real performance of debiased models on the full exposure dataset. Once the K value is determined according to the recommendation characteristics of the platform, the evaluation method provided by the invention can select the optimal debiased recommendation model, thereby improving user satisfaction and bringing the platform higher revenue. For example, the unbiased recommendation performance indexes of different recommendation models (the Bias, IPS, DR, AutoDebias and NegS models) were evaluated using the present method (UT-Eval scheme) and the conventional evaluation method (Traditional scheme), and the results are shown in Table 1:

Table 1 Unbiased recommendation performance index evaluation results for different recommendation models

In Table 1, the NegS model is a base model (MF or LightGCN) optimized with a hard negative sampling strategy combined with BCE loss; its optimization takes a larger Recall@K on the target full exposure dataset as the objective.

As can be seen from Table 1, when the debiasing effect of the models is verified, some debiasing methods perform even worse than the biased MF, so further debiasing techniques need to be developed. With a LightGCN backbone network, the Recall@K values of the AutoDebias model and the NegS model are 0.5224 and 0.5394 respectively, higher than the 0.5044 of the Bias model. Under this index, the two models would be considered to have a debiasing effect. However, measured by a different index, the two models perform significantly worse than the Bias model. This discrepancy indicates that the traditional evaluation method is unreliable for selecting a debiased model. Under the UT-Eval scheme proposed by the invention, the performance of the NegS model is not as good as expected, which means that the evaluation method proposed by the invention has an advantage in selecting a truly effective debiasing method. The ranking results for larger and smaller K values are inconsistent. Different K values correspond to different actual demands, which means that, in practical application scenarios, different debiased models should be developed to pursue a higher Recall@K under different K values.

It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.

The invention also provides an unbiased Top-K evaluation device based on an unbiased recommendation model. The device is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. While the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.

Fig. 9 is a block diagram of an unbiased Top-K evaluation device based on an unbiased recommendation model according to the present invention. As shown in Fig. 9, the device includes:

a prediction module 901, configured to obtain a target full exposure entity set, and to predict, by means of the debiased recommendation model under test, the evaluation of a target user on each sample entity in the target full exposure entity set, so as to obtain a prediction score of each sample entity in the target full exposure entity set;

an obtaining module 902, configured to obtain a random exposure entity set and a true score of the target user for each sample entity in the random exposure entity set, where the target full exposure entity set includes the sample entities in the random exposure entity set;

an evaluation module 903, configured to determine, in the random exposure entity set, a ratio between the positive sample entities whose true score is higher than a preset score and all positive sample entities, and to determine, according to the ratio, a recall evaluation index of the debiased recommendation model under test for the target user;

a determining module 904, configured to determine, according to the recall evaluation indexes of the debiased recommendation model under test for a plurality of target users, an unbiased recommendation performance index of the debiased recommendation model under test.

In the device, a random exposure entity set is acquired based on the target full exposure entity set. An unbiased estimate of the recall evaluation index of the debiased recommendation model under test on the target full exposure entity set for the target user is then made according to the ratio between the positive sample entities whose true score is higher than the preset score in the random exposure entity set and all positive sample entities. This improves the measurement accuracy of the recall index of the debiased recommendation model under test on the target full exposure entity set, that is, the accuracy of the unbiased recommendation performance evaluation of the debiased recommendation model under test on the target full exposure entity set, and solves the problem in the related art that unbiased Top-K evaluation methods have low test accuracy, particularly when the K value is small.

The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.

There is also provided in the invention an electronic device comprising a memory in which a computer program is stored and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.

Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in one embodiment, the processor may be arranged to perform the following steps by means of a computer program:

S1, acquiring a target full exposure entity set, and predicting, by means of the debiased recommendation model under test, the evaluation of a target user on each sample entity in the target full exposure entity set, to obtain the prediction score of each sample entity in the target full exposure entity set.

S2, acquiring a random exposure entity set and the true score of the target user for each sample entity in the random exposure entity set, wherein the target full exposure entity set includes the sample entities in the random exposure entity set.

S3, determining, in the random exposure entity set, the ratio between the positive sample entities whose true score is higher than the preset score and all the positive sample entities, and determining the recall evaluation index of the debiased recommendation model under test for the target user according to the ratio.

S4, determining the unbiased recommendation performance index of the debiased recommendation model under test according to the recall evaluation indexes of the debiased recommendation model under test for a plurality of target users.
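Steps S1 to S4 can be illustrated with a minimal sketch under stated assumptions. It assumes that the numerator of the ratio counts the positive sample entities of the random exposure entity set that fall within the Top-K of the ranking produced by the prediction scores over the full exposure entity set; this reading, the skipping of users without positive sample entities, and all names (`user_recall_at_k`, `model_index`, `preset_score`) and data are hypothetical and are not taken from the embodiments.

```python
from statistics import mean


def user_recall_at_k(pred_scores, true_scores, k, preset_score):
    """Estimate one user's recall index from the random exposure entity set.

    pred_scores: entity -> prediction score over the full exposure set (S1).
    true_scores: entity -> true score over the random exposure set (S2).
    Returns None when the user has no positive sample entity.
    """
    if not set(true_scores) <= set(pred_scores):
        raise ValueError("random exposure set must lie inside the full set")
    # Assumption: rank the full exposure set by prediction score, keep top K.
    ranked = sorted(pred_scores, key=pred_scores.get, reverse=True)
    top_k = set(ranked[:k])
    # S3: positives are entities whose true score exceeds the preset score.
    positives = [e for e, s in true_scores.items() if s > preset_score]
    if not positives:
        return None
    hits = sum(1 for e in positives if e in top_k)
    return hits / len(positives)


def model_index(users, k, preset_score):
    """S4: average the per-user indexes, skipping users without positives."""
    values = [user_recall_at_k(p, t, k, preset_score) for p, t in users]
    values = [v for v in values if v is not None]
    if not values:
        raise ValueError("no user has a positive sample entity")
    return mean(values)


# Made-up data: (prediction scores, true scores) for two target users.
users = [
    ({"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2, "e": 0.1},
     {"a": 5, "d": 4, "e": 1}),
    ({"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.3, "e": 0.2},
     {"b": 5, "c": 2}),
]
print(model_index(users, k=2, preset_score=3))
```

In this toy data the first user has two positive entities of which one is ranked in the Top-2, and the second user has one positive entity which is ranked in the Top-2, so the per-user indexes are 0.5 and 1.0.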

It should be noted that, the specific examples of the present electronic device may refer to examples described in the embodiments and the optional implementations of the method, and are not described in detail in this embodiment.

In addition, in combination with the unbiased Top-K evaluation method based on an unbiased recommendation model provided by the invention, a storage medium may also be provided for implementation. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the unbiased Top-K evaluation methods based on an unbiased recommendation model in the embodiments described above.

It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present application, are within the scope of the present application in light of the embodiments provided herein.

It is evident that the drawings are only examples or embodiments of the present application, from which a person skilled in the art can also adapt the present application to other similar situations without inventive effort. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as an admission of insufficient detail.

The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.

Claims

1. An unbiased Top-K evaluation method based on an unbiased recommendation model, characterized by comprising the following steps:

2. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 1, characterized in that acquiring the random exposure entity set and the true score of the target user for each sample entity in the random exposure entity set comprises:

3. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 2, characterized in that, before randomly sampling the target full exposure entity set, the method further comprises:

4. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 1, characterized in that, before determining, in the random exposure entity set, the ratio between the positive sample entities whose true score is higher than the preset score and all positive sample entities, the method further comprises:

5. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 1, characterized in that the method further comprises:

6. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 5, characterized in that the K value is less than or equal to 50.

7. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 1, characterized in that determining, according to the ratio, the recall evaluation index of the debiased recommendation model under test for the target user comprises:

8. The unbiased Top-K evaluation method based on an unbiased recommendation model according to claim 1, characterized in that determining, according to the recall evaluation indexes of the debiased recommendation model under test for a plurality of target users, the unbiased recommendation performance index of the debiased recommendation model under test comprises:

9. An unbiased Top-K evaluation device based on an unbiased recommendation model, characterized by comprising:

10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the unbiased Top-K evaluation method based on an unbiased recommendation model as claimed in any one of claims 1 to 8.