CN111460384B

CN111460384B - Policy evaluation method, device and equipment

Info

Publication number: CN111460384B
Application number: CN202010247385.7A
Authority: CN
Inventors: 贾晋康; 陈冠霖; 李世雷; 孙玉坤; 王轶凡; 张钋; 朱弘哲; 王雪颖
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2023-09-08
Anticipated expiration: 2040-03-31
Also published as: CN111460384A

Abstract

The application discloses a policy evaluation method, device and equipment, and relates to the technical field of big data. The specific implementation scheme is as follows: determining, according to a first online resource recommendation list, a target offline resource recommendation list corresponding to a policy to be evaluated, where the first online resource recommendation list is a resource list obtained after the policy to be evaluated goes online; determining user characteristics of each of the sampled users according to a target user knowledge base, where the target user knowledge base is determined according to first user feedback behaviors that all online users of the policy to be evaluated exhibited before they started using that policy; determining a benefit evaluation index corresponding to the policy to be evaluated according to the user characteristics of each user and each resource in the target offline resource recommendation list; and sending the benefit evaluation index to a terminal device. The embodiments of the application can thus improve the accuracy with which the benefit evaluation index of the policy to be evaluated is predicted.

Description

Policy evaluation method, device and equipment

Technical Field

The application relates to the field of data processing, in particular to the technical field of big data.

Background

At present, Internet information flow (feed) products are increasingly widespread. Such products usually recommend related content to users by means of a recommendation algorithm or recommendation policy, and how to evaluate the effect of that algorithm or policy has become a focus of attention.

Currently, a recommendation policy is evaluated through an online controlled small-traffic experiment (also called an A/B test small-traffic evaluation experiment): a certain proportion of users (traffic) is selected at random and divided into two groups, group A and group B. Group A users use the information flow product as it was before the recommendation policy went online, and group B users use the product after the policy went online. The feedback behavior data generated by both groups are stored, and the effect of launching the recommendation policy is evaluated based on the feedback behavior data of the two groups.
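The A/B split described above can be illustrated with a small sketch. It assumes deterministic hash-based bucketing of user IDs and uses the number of clicks as the compared metric; the function names, the salt string and the 10% traffic ratio are hypothetical choices for illustration and are not taken from the patent.

```python
import hashlib
from statistics import mean


def assign_group(user_id, traffic_ratio=0.1, salt="exp-001"):
    """Return "A", "B", or None if the user is not in the experiment (hypothetical helper)."""
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 10000  # pseudo-uniform value in [0, 1)
    if bucket >= traffic_ratio:
        return None  # user keeps normal traffic, outside the small-traffic experiment
    # Split the experiment traffic evenly: A = before the policy, B = after the policy.
    return "A" if bucket < traffic_ratio / 2 else "B"


def compare_groups(feedback, groups):
    """feedback: user_id -> clicks; groups: user_id -> "A" or "B"."""
    a = [feedback[u] for u, g in groups.items() if g == "A"]
    b = [feedback[u] for u, g in groups.items() if g == "B"]
    return {"A_mean": mean(a), "B_mean": mean(b), "lift": mean(b) - mean(a)}
```

Hashing rather than drawing random numbers keeps each user in the same group across sessions, which matters because such an experiment typically runs for several days.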

However, as information flow products grow more complex and the number of optimization algorithms or policies keeps increasing, policy adjustments become increasingly fine-grained. As a result, the traffic turnover cycle of online small-traffic experiments grows longer, which greatly reduces research and development efficiency.

Disclosure of Invention

The embodiments of the application provide a policy evaluation method, device and equipment, which can improve the accuracy of predicting the benefit evaluation index corresponding to a policy to be evaluated.

In a first aspect, an embodiment of the present application provides a method for evaluating a policy, where the method includes:

determining a target offline resource recommendation list corresponding to a policy to be evaluated according to a first online resource recommendation list corresponding to sampled users, where the first online resource recommendation list is a resource list obtained after the policy to be evaluated goes online, and the user distribution of the sampled users is consistent with the user distribution of online users;

determining user characteristics of each of the sampled users according to a target user knowledge base, where the target user knowledge base is determined according to first user feedback behaviors that all online users of the policy to be evaluated exhibited before they started using the policy;

determining a benefit evaluation index corresponding to the policy to be evaluated according to the user characteristics of each user and each resource in the target offline resource recommendation list, where the benefit evaluation index is used to evaluate user feedback behaviors on the resources in the target offline resource recommendation list;

and sending the benefit evaluation index to a terminal device.

According to the embodiments of the application, on the one hand, the user characteristics and the target offline resource recommendation list corresponding to the policy to be evaluated can be combined in an offline state to evaluate the policy quickly, without running a real online small-traffic experiment directly. This shortens the online traffic turnover cycle, keeps improving offline investigation efficiency, and reduces research and development cost; moreover, because a new policy goes online only after it has been fully evaluated, user experience and user loyalty can be improved and user churn avoided. On the other hand, the target offline resource recommendation list is obtained from the real first online resource recommendation list recommended to the sampled users after the policy to be evaluated went online, which reduces as far as possible the difference between the real resource recommendation list of online users and the list obtained in the offline environment; and the target user knowledge base is determined from the real first user feedback behaviors that all online users of the policy exhibited before using it. Consequently, when an evaluation result is inaccurate, its cause can be located accurately. In yet another aspect, once the target offline resource recommendation list and the target user knowledge base have been obtained in this manner, the accuracy of evaluation can be improved when a newly launched policy is evaluated online.

In a second aspect, an embodiment of the present application further provides a policy evaluation device, where the device includes:

a processing module, configured to determine a target offline resource recommendation list corresponding to a policy to be evaluated according to a first online resource recommendation list corresponding to sampled users, where the first online resource recommendation list is a resource list obtained after the policy to be evaluated goes online, and the user distribution of the sampled users is consistent with the user distribution of online users;

the processing module is further configured to determine user characteristics of each of the sampled users according to a target user knowledge base, where the target user knowledge base is determined according to first user feedback behaviors that all online users of the policy to be evaluated exhibited before they started using the policy;

the processing module is further configured to determine a benefit evaluation index corresponding to the policy to be evaluated according to the user characteristics of each user and each resource in the target offline resource recommendation list, where the benefit evaluation index is used to evaluate user feedback behaviors on the resources in the target offline resource recommendation list;

and a sending module, configured to send the benefit evaluation index to a terminal device.

In a third aspect, an embodiment of the present application further provides an electronic device, where the electronic device may include:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of evaluating policies described in any one of the possible implementations of the first aspect.

In a fourth aspect, embodiments of the present application further provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the policy evaluation method described in any one of the possible implementations of the first aspect.

One embodiment of the above application has the following advantages or benefits: on the one hand, a policy can be evaluated quickly in an offline state by combining user characteristics with the target offline resource recommendation list, without a direct online small-traffic experiment, which shortens the online traffic turnover cycle, improves offline investigation efficiency, reduces research and development cost, and, because new policies go online only after full evaluation, protects user experience and loyalty and avoids user churn. On the other hand, because the target offline resource recommendation list and the target user knowledge base are both derived from real online data, the gap between online and offline recommendation lists is reduced as far as possible and the cause of an inaccurate evaluation result can be located accurately. In yet another aspect, with these two components in place, the accuracy of evaluation can be improved when a newly launched policy is evaluated online.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are provided for a better understanding of the present application and do not constitute a limitation on it. In the drawings:

FIG. 1 is a schematic diagram of an architecture of a policy evaluation method according to an embodiment of the present application;

FIG. 2 is a flowchart of a policy evaluation method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of determining a benefit evaluation index;

FIG. 4 is a schematic diagram showing the display of a benefit evaluation index;

FIG. 5 is a flowchart of a policy evaluation method according to a second embodiment of the present application;

FIG. 6 is a schematic diagram of determining a target offline resource recommendation list;

FIG. 7 is a schematic diagram of determining a target user knowledge base;

FIG. 8 is a schematic structural diagram of a policy evaluation device according to an embodiment of the present application;

FIG. 9 is a block diagram of an electronic device for implementing a policy evaluation method according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the embodiments of the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone, where A and B may be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the associated objects before and after it.

Before the solution of the present application is described, the relevant terms are explained:

Information flow product: information flow (feed) products can be classified by attribute into social products, news products, video products, music products, and so on.

Policy (strategy): an algorithm used to recommend content in an information flow product;

User distribution: the distribution of users by age, gender, and region;

Benefit evaluation index: an index measuring users' feedback behaviors on an information flow product, where the feedback behaviors include the number of clicks, dwell time, favoriting, liking, commenting, and the like;

Guardrail index: a defensive index used to evaluate the ranking and recall of resources produced when the policy to be evaluated is applied to an information flow product, and to judge whether the effect of the policy meets expectations. "Guardrail index" is a generic term covering all indexes unrelated to feedback behaviors, such as the display share of each resource type, the share of image-text resources, the share of video resources, the distribution of display styles, and source queues, where the resource types may include image-text, short video, small video, and the like.

User knowledge base: a library storing user characteristics, including user profiles, user browsing behavior characteristics, user click behavior characteristics, and the like.

Offline investigation environment: an environment in which, in an offline state, the resource characteristics of a plurality of resources and the user characteristics of a plurality of users can be obtained.

Resource: including text resources, image resources, video resources, audio resources, and the like.
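One guardrail index named above, the display share of each resource type, can be computed as in the following sketch. It assumes each recommended item is a dict with a "type" key, and the (low, high) bounds passed to `guardrail_ok` are hypothetical; the patent does not specify how guardrail thresholds are set.

```python
from collections import Counter


def resource_type_share(recommended):
    """Share of each resource type (e.g. "image_text", "short_video") in a recommended list."""
    counts = Counter(item["type"] for item in recommended)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rtype: n / total for rtype, n in counts.items()}


def guardrail_ok(shares, bounds):
    """Check that every bounded type's share lies in [low, high].

    `bounds` maps resource type -> (low, high); the thresholds are hypothetical.
    A type missing from `shares` is treated as having share 0.0.
    """
    return all(low <= shares.get(rtype, 0.0) <= high for rtype, (low, high) in bounds.items())
```

Because guardrail indexes do not depend on user feedback, they can be computed directly from the offline recommendation list, before any feedback is predicted.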

In the early development stage of information flow products, online traffic can accommodate a certain number of small-traffic experiments. As information flow products become more complex, the number of optimization algorithms (also called policies) for them gradually increases, and a large number of small-traffic experiments have to queue for a long time before going online because no online traffic can be allocated to them, which lowers research and development efficiency. Furthermore, as information flow products evolve, policy adjustments become increasingly fine-grained, and each small-traffic experiment needs to test the differences among several experimental groups (corresponding to policy parameter tuning scenarios), so the already strained online traffic becomes ever more congested. In addition, reaching a high-confidence conclusion requires a long observation period for each online experiment (typically on the order of days, e.g., 3 days or more), which keeps lengthening the online traffic turnover cycle. Finally, putting a large number of insufficiently verified policies directly online for small-traffic evaluation has a certain negative impact on the experience of online users, degrading their experience, reducing their loyalty, and causing user churn.

To address this problem, a user knowledge base can be constructed by analyzing and modeling users' historical data, that is, by learning and describing features of user profiles. Before a new policy goes online, its offline prediction result, namely an offline resource recommendation list, is simulated in an offline investigation environment; the user knowledge base is then used to predict users' feedback behaviors (e.g., clicks) on each resource in that list under the policy, and after the feedback behaviors of many users have been predicted, the effect the policy can be expected to achieve after going online (e.g., positive benefit, flat benefit, or negative benefit) is evaluated. This way of quickly evaluating a new policy shortens the online traffic turnover cycle, keeps improving offline investigation efficiency and reduces research and development cost; moreover, because a policy goes online only after being fully evaluated, user experience and loyalty can be improved and user churn avoided. However, the evaluation result obtained in this way depends on a user knowledge base constructed offline and on an offline resource recommendation list simulated in the offline investigation environment. This not only makes the evaluation result of the policy to be evaluated inaccurate, but also makes it impossible to locate the cause when the result is inaccurate.

Based on the above, the basic idea of the present application is as follows. The offline code and deployment environment are adjusted using the first online resource recommendation list of the sampled users, obtained after the policy to be evaluated goes online, so that the initially determined offline resource recommendation list is continuously refined into a target offline resource recommendation list for the policy; this makes the list recommended in the offline investigation environment gradually converge to the first online resource recommendation list actually recommended to the sampled users. In addition, the initial user knowledge base constructed from users' historical behavior data can be adjusted according to the first user feedback behaviors (e.g., clicks) that all online users of the policy exhibited before using it, yielding a target user knowledge base. Because the offline list is aligned with the real online list and the target user knowledge base is built from real feedback behaviors, the accuracy of evaluation results for the policy to be evaluated can be improved, newly launched policies can also be evaluated online, and when an evaluation result is inaccurate, its cause can be located accurately.

The method for evaluating the strategy provided by the application is described below by means of a specific embodiment.

FIG. 1 is a schematic diagram of an architecture of a policy evaluation method according to an embodiment of the present application. As shown in FIG. 1, the policy evaluation system includes a policy evaluation module 110 and an offline policy module 120. The offline policy module 120 is configured to generate, in an offline state, a target offline resource recommendation list corresponding to a policy to be evaluated according to a first online resource recommendation list corresponding to sampled users, where the first online resource recommendation list is a resource list obtained after the policy to be evaluated goes online. The policy evaluation module 110 is configured to evaluate user feedback behaviors on the resources of the target offline resource recommendation list, where the user feedback behaviors on a resource include one or more of the user's clicks, dwell time, favoriting, liking, or commenting on the resource.

The policy evaluation module 110 includes a target user knowledge base 112 and a sampling engine 114. The target user knowledge base 112 stores a plurality of user features, including user basic features, user interest features, user click resource features, and the like, where the user basic features may be, for example, gender, age, occupation, and so on. The target user knowledge base 112 is determined according to first user feedback behaviors that all online users of the policy to be evaluated exhibited before they used the policy, where the first user feedback behaviors include one or more of those users' clicks, dwell time, favoriting, liking, or commenting on resources during that period. The sampling engine 114 is configured to perform sampling simulation over all online users of the policy to be evaluated, so as to obtain sampled users whose user distribution is consistent with that of the online users.

Further, the policy evaluation module 110 obtains the user characteristics of each user from the target user knowledge base 112 based on the user identification of each user in the sampled users, and determines a benefit evaluation index corresponding to the policy to be evaluated based on the user characteristics of each user and the target offline resource recommendation list generated by the offline policy module 120, where the benefit evaluation index is used to evaluate the user feedback behavior of the resources of the target offline resource recommendation list.

It can be understood that the policy evaluation method provided by the embodiments of the present application can be applied to scenarios in which the benefit evaluation index of a policy is predicted before the policy goes online, and in particular to predicting the effect of a newly launched policy.

The policy evaluation method provided by the present application is described in detail below through specific embodiments. It should be understood that the following embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments.

Fig. 2 is a flow chart of a policy evaluation method according to an embodiment of the present application, where the policy evaluation method may be performed by a software and/or hardware device, for example, the hardware device may be a policy evaluation device, and the policy evaluation device may be disposed in a server. For example, referring to fig. 2, the method for evaluating the policy may include:

S201, determining a target offline resource recommendation list corresponding to the strategy to be evaluated according to a first online resource recommendation list corresponding to the sampling user.

The first online resource recommendation list is a resource list obtained after the strategy to be evaluated is online, and the user distribution of the sampling users is consistent with the user distribution of online users.

In this step, the sampling engine samples the online users with a preset algorithm to obtain the sampled users, ensuring that the user distribution of the sampled users is consistent with that of the online users. For example, the sampling engine samples the online users according to the distribution of their age, gender and region, so that the sampled users match the real online users in these distributions.

It should be noted that, although the user distribution is described using age, gender and region distributions as examples, those skilled in the art should understand that it may also include other feature distributions, such as occupation and hobby distributions, which also fall within the scope of the present application.
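The patent does not disclose the sampling engine's "preset algorithm". Proportional stratified sampling is one standard way to keep the sampled users' age, gender and region distribution consistent with the online population, and the sketch below uses it under that assumption. The field names `age_band`, `gender` and `region` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(users, rate, keys=("age_band", "gender", "region"), seed=0):
    """Sample users so that each (age_band, gender, region) stratum keeps its share.

    `users` is a list of dicts and `rate` is the sampling fraction in (0, 1].
    Very small strata may round down to zero sampled users.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    strata = defaultdict(list)
    for user in users:
        strata[tuple(user[k] for k in keys)].append(user)
    sample = []
    for members in strata.values():
        n = round(len(members) * rate)  # same fraction drawn from every stratum
        sample.extend(rng.sample(members, n))
    return sample
```

Additional distribution dimensions, such as occupation, can be added through `keys`, at the cost of more and smaller strata.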

The first online resource recommendation list is a real resource list obtained after the sampling user uses the policy to be evaluated, and because the user distribution of the sampling user is consistent with the user distribution of all online users using the policy to be evaluated, the real behaviors of all online users using the policy to be evaluated can be represented by the first online resource recommendation list corresponding to the sampling user.

In the offline investigation environment, according to the first online resource recommendation list, the original offline resource recommendation list can be continuously adjusted to generate a target offline resource recommendation list corresponding to the strategy to be evaluated. The original offline resource recommendation list is adjusted through the real first online resource recommendation list, so that a target offline resource recommendation list is obtained, the difference between the first online resource recommendation list corresponding to the sampling user and the target offline resource recommendation list generated in the offline investigation environment can be reduced as much as possible, and the accuracy of the evaluation of the strategy to be evaluated in the offline environment is ensured.
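The patent does not say how the difference between the offline and online lists is measured during this adjustment. The sketch below assumes top-k overlap averaged over sampled users as that measure, to be recomputed after each change to the offline code or deployment environment. `overlap_at_k` and `list_consistency` are hypothetical names.

```python
from statistics import mean


def overlap_at_k(offline, online, k):
    """Fraction of top-k positions whose resources appear in both top-k lists.

    Assumes both ranked lists contain at least k resource ids.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(offline[:k]) & set(online[:k])) / k


def list_consistency(offline_lists, online_lists, k=10):
    """Average overlap@k over users present in both mappings (user_id -> ranked ids)."""
    users = offline_lists.keys() & online_lists.keys()
    if not users:
        raise ValueError("no common users to compare")
    return mean(overlap_at_k(offline_lists[u], online_lists[u], k) for u in users)
```

A value close to 1.0 means the offline investigation environment reproduces what the sampled users actually saw online; the stopping threshold is left to the practitioner.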

It should be noted that the policy to be evaluated may be a collaborative filtering model, a logistic regression model, or another suitable recommendation model, such as a deep learning model or a gradient boosting decision tree model; the present application imposes no particular limitation on this.

S202, determining user characteristics of each of the sampled users according to a target user knowledge base, where the target user knowledge base is determined according to first user feedback behaviors that all online users of the policy to be evaluated exhibited before they started using the policy.

Further, based on the user identification of each user in the sampled users, user characteristics of each user are obtained from the target user knowledge base, and the user characteristics may include, for example: one or more of a user basic feature, a user interest feature, or a user click resource feature, where the user basic feature may be, for example, gender, age, and the like.

In addition, the original user knowledge base can be continuously adjusted, using the first user feedback behaviors on resources that all online users of the policy to be evaluated exhibited before using it, to generate the target user knowledge base. It should be noted that the target user knowledge base is generated by analyzing and modeling users' historical data and is decoupled from the current policy to be evaluated. For example, the target user knowledge base can be generated from first user feedback behaviors collected in the first week of March and then used to evaluate policies that go online after the first week of March.

Adjusting the original user knowledge base with real first user feedback behaviors to obtain the target user knowledge base ensures, as far as possible, the accuracy of the constructed knowledge base, and thereby the accuracy of evaluating the policy in an offline environment.
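The time-window decoupling in the March example can be sketched as follows. This assumes a feedback log of `(user_id, day, category, clicks, dwell_seconds)` records, and the year 2020 in the example is an arbitrary choice. Only click and dwell totals plus per-category click counts are kept as features, a simplification of the user profiles the patent describes. All names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_user_knowledge_base(feedback_log, window_start, window_end):
    """Aggregate per-user features from feedback inside [window_start, window_end]."""
    kb = {}
    for user_id, day, category, clicks, dwell in feedback_log:
        if not (window_start <= day <= window_end):
            continue  # feedback outside the window is ignored
        if user_id not in kb:
            kb[user_id] = {"clicks": 0, "dwell_seconds": 0, "interests": defaultdict(int)}
        entry = kb[user_id]
        entry["clicks"] += clicks
        entry["dwell_seconds"] += dwell
        entry["interests"][category] += clicks  # clicks per content category


    return kb


def is_decoupled(window_end, policy_launch_day):
    """True if the knowledge base only uses feedback from before the policy went online."""
    return policy_launch_day > window_end


# Example: knowledge base from the first week of March, evaluating a policy launched later.
kb = build_user_knowledge_base(
    [("u1", date(2020, 3, 2), "news", 3, 120), ("u1", date(2020, 3, 12), "video", 5, 300)],
    date(2020, 3, 1),
    date(2020, 3, 7),
)
```

Only the record from March 2 enters the knowledge base. A policy launched on March 10 is decoupled from it, while one launched on March 5 is not.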

S203, determining a benefit evaluation index corresponding to the strategy to be evaluated according to the user characteristics of each user and each resource in the target offline resource recommendation list, wherein the benefit evaluation index is used for evaluating the user feedback behaviors of the resources in the target offline resource recommendation list.

In this step, based on the user characteristics of each user and each resource in the target offline resource recommendation list, the user feedback behavior of each user on the resources in the target offline resource recommendation list, such as the number of resource clicks, the stay time, and the like, is predicted. And determining a benefit evaluation index corresponding to the strategy to be evaluated based on the predicted user feedback behavior, wherein the benefit evaluation index is used for evaluating the user feedback behavior of the resources of the target offline resource recommendation list.
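The patent does not specify the prediction model. The sketch below stands in a toy logistic click model over a single interest-affinity feature. Its weights `bias` and `w_interest` and the constant average dwell time per click are hypothetical, and a real system would use the full user characteristics (gender, age, interests) from the knowledge base.

```python
import math


def click_probability(user, resource, bias=-1.0, w_interest=2.0):
    """Toy logistic click model; `bias` and `w_interest` are hypothetical weights.

    user["interests"] maps a content category to an affinity in [0, 1].
    """
    affinity = user["interests"].get(resource["category"], 0.0)
    return 1.0 / (1.0 + math.exp(-(bias + w_interest * affinity)))


def predict_feedback(users, resources, avg_dwell_seconds=30.0):
    """Expected clicks and dwell time of each user over the offline recommendation list."""
    predictions = {}
    for user_id, user in users.items():
        expected_clicks = sum(click_probability(user, r) for r in resources)
        predictions[user_id] = {
            "clicks": expected_clicks,
            # assumption: every click yields the same average dwell time
            "dwell_seconds": expected_clicks * avg_dwell_seconds,
        }
    return predictions
```

For example, a user with affinity 0.5 for "news" gets a click probability of exactly 0.5 on a single news resource, and therefore an expected dwell time of 15 seconds.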

For example, if the target offline resource recommendation list includes resource 1, resource 2, resource 3 and resource 4, the server may predict the number of clicks and/or the dwell time of each user on resources 1 to 4 based on that user's characteristics, such as gender, age or interest features, and then obtain the benefit evaluation index of the policy to be evaluated by totaling the clicks and/or dwell time over all users. For example, if the total number of clicks is greater than a first preset value and the total dwell time is greater than a second preset value, the policy can be estimated to yield a positive benefit after going online; if only one of the two totals exceeds its preset value, the policy can be estimated to yield a flat benefit; and if the total number of clicks is not greater than the first preset value and the total dwell time is not greater than the second preset value, the policy can be estimated to yield a negative benefit. Of course, these evaluation results are merely examples, and other benefit evaluation indexes may be used to evaluate user feedback behaviors on the resources in the target offline resource recommendation list, for example whether users favorite, like, or comment on a given resource.
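The totaling and three-way decision in this example can be written as a short sketch. It assumes per-user predictions are given as a mapping from user ID to `{"clicks": ..., "dwell_seconds": ...}`, and the threshold values in any call are placeholders for the first and second preset values, which the patent does not quantify.

```python
def aggregate(predictions):
    """Total predicted clicks and dwell time over all sampled users."""
    total_clicks = sum(p["clicks"] for p in predictions.values())
    total_dwell = sum(p["dwell_seconds"] for p in predictions.values())
    return total_clicks, total_dwell


def classify_benefit(total_clicks, total_dwell, click_threshold, dwell_threshold):
    """Positive if both totals exceed their preset values, flat if exactly one does, else negative."""
    above_clicks = total_clicks > click_threshold
    above_dwell = total_dwell > dwell_threshold
    if above_clicks and above_dwell:
        return "positive"
    if above_clicks or above_dwell:
        return "flat"  # only reached when exactly one total exceeds its preset value
    return "negative"
```

Checking the "both" case first ensures that the "either" case only covers the situation in which exactly one threshold is exceeded, so the three outcomes never overlap.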

FIG. 3 is a schematic diagram of determining a revenue evaluation index, as shown in FIG. 3, after determining a target offline resource recommendation list and a target user knowledge base by sampling a user, determining a revenue evaluation index corresponding to a policy to be evaluated. And determining whether the strategy to be evaluated needs an online small flow test according to the gain evaluation index.

S204, sending the income evaluation index to the terminal equipment.

In this step, after determining the revenue evaluation index corresponding to the policy to be evaluated, the server may send the revenue evaluation index to the terminal device, and the user may check the corresponding revenue evaluation index through the terminal device to determine whether the policy to be evaluated is directly online, online after adjustment, or the like.

FIG. 4 is a schematic diagram of displaying a benefit evaluation index. As shown in FIG. 4, after receiving the benefit evaluation index from the server, the terminal device may show it on a display interface, for example: the policy to be evaluated is policy XX, users clicked resources under policy XX 100 times, and their total dwell time on those resources is 150 minutes.

In the policy evaluation method provided by this embodiment of the application, a target offline resource recommendation list corresponding to the policy to be evaluated is determined from the first online resource recommendation list of the sampled users, where the first online resource recommendation list is the resource list obtained after the policy to be evaluated goes online, and the user distribution of the sampled users is consistent with that of the online users. The user characteristics of each sampled user are determined from the target user knowledge base, which is built from the first user feedback behaviors that all online users of the policy to be evaluated exhibited before they started using it. The benefit evaluation index corresponding to the policy to be evaluated is then determined from the user characteristics of each user and each resource in the target offline resource recommendation list and sent to the terminal device; this index evaluates user feedback behavior on the resources in the target offline resource recommendation list. On the one hand, the policy to be evaluated can be evaluated quickly offline by combining the user characteristics with its target offline resource recommendation list, without running a real small-traffic experiment online. This shortens the online traffic experiment cycle, steadily improves offline investigation efficiency and reduces research and development cost; and because a new policy goes online only after it has been fully evaluated, user experience and user loyalty improve and user churn is avoided.
On the other hand, because the target offline resource recommendation list is derived from the real first online resource recommendation list served to the sampled users after the policy to be evaluated went online, the difference between the real recommendation lists of online users and the list obtained offline is kept as small as possible. Likewise, because the target user knowledge base is built from the real first user feedback behaviors that all online users of the policy to be evaluated exhibited before using it, the cause of an inaccurate evaluation result can be located precisely when one occurs. Furthermore, once the target offline resource recommendation list and the target user knowledge base have been obtained in this way, new policies can be evaluated offline with improved accuracy.

FIG. 5 is a flowchart of a policy evaluation method according to a second embodiment of the present application. Building on the embodiment shown in FIG. 2, this embodiment describes in detail how S201 determines the target offline resource recommendation list corresponding to the policy to be evaluated from the first online resource recommendation list of the sampled users. As shown in FIG. 5, the policy evaluation method may include:

S501, determining a first guardrail index according to the first online resource recommendation list, where the first guardrail index is used to evaluate the ranking and recall of the resources in the first online resource recommendation list.

In this step, the first online resource recommendation list is the real list recommended to the sampled users after the policy to be evaluated went online, so the first guardrail index determined from it essentially matches the expected guardrail index.

In one possible implementation, the first guardrail index may be determined by identifying the resource type of each resource in the first online resource recommendation list, counting the resources of each type in that list to obtain a statistical result, and then determining the first guardrail index from the statistical result.

Specifically, the resource types of the resources in the first online resource recommendation list may include an image resource type, a text resource type, a video resource type, and the like. The server may determine the resource types of the respective resources in advance, and store the determined resource types in the resource feature library, so that the resource types of the respective resources in the first online resource recommendation list may be obtained from the resource feature library based on the identification information of the resources included in the first online resource recommendation list.

After the resource types of the resources in the first online resource recommendation list are determined, the resources of each type in the list can be counted. For example, the video, picture and text resources in the first online resource recommendation list are counted separately, and the statistical result may be the number of resources of each type, such as 3 video resources, 4 picture resources and 5 text resources.

Based on the statistical result, the first guardrail index can be determined. The first guardrail index is used to evaluate the ranking and recall of resources in the first online resource recommendation list corresponding to the policy to be evaluated, and includes indexes such as the display share of each resource type, the share of image-text resources, the share of video resources, the display style distribution and the source queue.
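As a rough sketch of the display-share part of a guardrail index, the following Python assumes that the resource feature library can be modeled as a plain dictionary from resource ID to resource type; the name `guardrail_index` is hypothetical, and the other components listed above (display style distribution, source queue) are not covered. The usage lines reuse the 3 video / 4 picture / 5 text counts from the counting example.

```python
from collections import Counter


def guardrail_index(resource_ids, type_library):
    """Display share of each resource type in a recommendation list.

    `type_library` is a hypothetical stand-in for the resource feature library,
    mapping a resource ID to its type ("video", "image", "text", ...).
    """
    counts = Counter(type_library[rid] for rid in resource_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rtype: n / total for rtype, n in counts.items()}


# 3 video, 4 picture and 5 text resources, as in the counting example.
library = {f"v{i}": "video" for i in range(3)}
library.update({f"p{i}": "image" for i in range(4)})
library.update({f"t{i}": "text" for i in range(5)})
shares = guardrail_index(list(library), library)
print({rtype: round(share, 3) for rtype, share in shares.items()})
# {'video': 0.25, 'image': 0.333, 'text': 0.417}
```

The same function applies unchanged to the initial offline resource recommendation list in S502, which keeps the two guardrail indexes directly comparable.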

In this embodiment, the resource type of each resource in the first online resource recommendation list is determined, the resources of each type are counted, and the first guardrail index is determined from the statistical result, which makes the first guardrail index simple to determine.

S502, determining a second guardrail index according to the initial offline resource recommendation list, where the second guardrail index is used to evaluate the ranking and recall of the resources in the initial offline resource recommendation list.

In this step, the initial offline resource recommendation list is the resource list corresponding to the policy to be evaluated that is generated in the offline investigation environment, for example by inputting the user characteristics of the sampled users into a preset trained model. In other words, it is the resource list collected in the offline investigation environment.

In one possible implementation manner, the second guardrail index may be determined by determining the resource type of each resource in the initial offline resource recommendation list, counting each type of resource in the initial offline resource recommendation list, obtaining a statistical result, and then determining the second guardrail index according to the statistical result.

Specifically, the resource types of the respective resources in the initial offline resource recommendation list may include an image resource type, a text resource type, a video resource type, and the like. The server may determine the resource type of each resource in advance, and store the determined resource type in the resource feature library, so that when determining the resource type of the resource, the server may obtain the resource type of each resource in the initial offline resource recommendation list from the resource feature library based on the identification information of the resource included in the initial offline resource recommendation list.

After the resource type of each resource in the initial offline resource recommendation list is determined, the resources of each type in the list can be counted. For example, the video, picture and text resources in the initial offline resource recommendation list are counted separately, and the statistical result may be the number of resources of each type, such as 2 video resources, 3 picture resources and 6 text resources.

Based on the statistical result, the second guardrail index can be determined. The second guardrail index is used to evaluate the ranking and recall of resources in the initial offline resource recommendation list corresponding to the policy to be evaluated, and includes indexes such as the display share of each resource type, the share of image-text resources, the share of video resources, the display style distribution and the source queue.

In this embodiment, the resource type of each resource in the initial offline resource recommendation list is determined, the resources of each type are counted, and the second guardrail index is determined from the statistical result, which makes the second guardrail index simple to determine.

S503, determining a target offline resource recommendation list according to the first guardrail index and the second guardrail index.

In this step, after the first guardrail index and the second guardrail index are determined, the first guardrail index and the second guardrail index may be compared to determine the target offline resource recommendation list.

In this embodiment, the first guardrail index is determined according to the first online resource recommendation list, the second guardrail index is determined according to the initial offline resource recommendation list, and then the first guardrail index and the second guardrail index are compared, so that the target offline resource recommendation list is determined, and therefore the difference between the target offline resource recommendation list simulated offline and the real first online resource recommendation list obtained online can be reduced, and the accuracy of the target offline resource recommendation list is improved.

If the difference value between the first guardrail index and the second guardrail index is greater than the first preset threshold, adjusting the initial offline resource recommendation list, and determining the adjusted initial offline resource recommendation list as a target offline resource recommendation list, wherein the difference value between a third guardrail index corresponding to the adjusted initial offline resource recommendation list and the first guardrail index is not greater than the first preset threshold; and if the difference value between the first guardrail index and the second guardrail index is not greater than a first preset threshold value, determining the initial offline resource recommendation list as a target offline resource recommendation list.

Specifically, FIG. 6 is a schematic diagram of determining a target offline resource recommendation list. As shown in FIG. 6, the first online resource recommendation list can be determined from the sampled users, and the initial offline resource recommendation list can be generated by simulating the sampled users in the offline investigation environment. The first guardrail index is then obtained by counting the resource types in the first online resource recommendation list, and the second guardrail index by counting the resource types in the initial offline resource recommendation list. The two indexes are compared: if the difference between them is not greater than the first preset threshold, the initial offline resource recommendation list simulated offline differs only slightly from the first online resource recommendation list actually obtained online, and the initial offline resource recommendation list can be directly determined as the target offline resource recommendation list.

Conversely, if the difference between the first guardrail index and the second guardrail index is greater than the first preset threshold, the initial offline resource recommendation list simulated offline differs considerably from the first online resource recommendation list actually obtained online and needs to be adjusted. Specifically, possible causes of the discrepancy, such as environment differences, dictionary differences or user model differences, can be identified by analysis and eliminated by adjusting the offline code and the deployment environment; this adjusts the initial offline resource recommendation list and produces an adjusted initial offline resource recommendation list. The resource types in the adjusted list are then counted to obtain a third guardrail index. If the difference between the third guardrail index and the first guardrail index is not greater than the first preset threshold, the adjusted initial offline resource recommendation list can be directly determined as the target offline resource recommendation list; otherwise, the above steps are repeated until the difference between the third guardrail index and the first guardrail index is not greater than the first preset threshold.
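The compare-and-adjust loop of S503 might look like the following sketch. The text does not specify how the difference between two guardrail indexes is computed, so the largest absolute gap in per-type share is used here as an assumption; `adjust` is a hypothetical callable standing in for the manual step of fixing offline code or the deployment environment and regenerating the list, and `max_rounds` is added only so that the sketch always terminates. The type counts in the usage lines mirror the examples given for S501 and S502.

```python
from collections import Counter


def type_shares(types):
    """Guardrail index sketch: share of each resource type in a list of type labels."""
    counts = Counter(types)
    return {t: n / len(types) for t, n in counts.items()}


def guardrail_gap(index_a, index_b):
    """Assumed difference measure: largest absolute gap over all resource types."""
    keys = set(index_a) | set(index_b)
    return max((abs(index_a.get(k, 0.0) - index_b.get(k, 0.0)) for k in keys), default=0.0)


def select_target_offline_list(online_index, initial_list, adjust, threshold, max_rounds=10):
    """Accept the offline list once its guardrail index is within `threshold` of the online one.

    `adjust` is a hypothetical callable that regenerates the offline list after the
    offline code or deployment environment has been fixed.
    """
    candidate = initial_list
    for _ in range(max_rounds):
        if guardrail_gap(online_index, type_shares(candidate)) <= threshold:
            return candidate  # close enough to the online list: use it as the target
        candidate = adjust(candidate)
    raise RuntimeError("offline list still diverges after max_rounds adjustments")


online_index = type_shares(["video"] * 3 + ["image"] * 4 + ["text"] * 5)
initial = ["video"] * 2 + ["image"] * 3 + ["text"] * 6  # too text-heavy offline
regenerated = ["video"] * 3 + ["image"] * 4 + ["text"] * 5
target = select_target_offline_list(online_index, initial, lambda _: regenerated, threshold=0.05)
print(target == regenerated)  # True
```

Here the initial list fails the check (its text share exceeds the online share by about 0.13), one adjustment round produces a list that matches, and that list becomes the target offline resource recommendation list.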

The first preset threshold may be selected according to actual situations or experience, and the embodiment of the present application is not limited herein for a specific value of the first preset threshold.

In this example, the first guardrail index is compared with the second guardrail index: when the resulting difference is greater than the first preset threshold, the initial offline resource recommendation list is adjusted, and when it is not greater than the first preset threshold, the initial offline resource recommendation list is used directly as the target offline resource recommendation list. This keeps the difference between the target offline resource recommendation list obtained offline and the first online resource recommendation list actually obtained online as small as possible, and ensures the accuracy of the benefit evaluation index predicted for the policy to be evaluated from the target offline resource recommendation list.

S504, determining user characteristics of each of the sampled users according to a target user knowledge base, where the target user knowledge base is determined according to the first user feedback behaviors that all online users of the policy to be evaluated exhibited before they started using it.

In this step, the server may determine the user characteristics of each sampled user from the pre-constructed target user knowledge base.

In one possible implementation, the target user knowledge base may be determined as follows: determine a second online resource recommendation list according to all online users of the policy to be evaluated; determine, from the second online resource recommendation list, the first user feedback behaviors of these users before they used the policy to be evaluated; determine the second user feedback behaviors corresponding to the sampled users from the first online resource recommendation list and an initial user knowledge base; and determine the target user knowledge base from the first user feedback behaviors and the second user feedback behaviors.

Specifically, FIG. 7 is a schematic diagram of determining a target user knowledge base. As shown in FIG. 7, a second online resource recommendation list, which includes a plurality of resources, may be determined according to all online users of the policy to be evaluated. Because this list is obtained from the recommendations actually served to these online users, it is a real resource list.

From the second online resource recommendation list, the first user feedback behaviors of all online users of the policy to be evaluated, before they used the policy, can be determined. These behaviors include the number of clicks, the dwell time, and the favorite (collect), like and comment operations that the users performed on resources, and they are the actual behaviors of those users before using the policy to be evaluated.

In addition, the first online resource recommendation list can be determined from the sampled users of the policy to be evaluated, while the initial user knowledge base is generated by extracting user characteristics from the historical data of a plurality of users. Based on the user characteristics of each user in the initial user knowledge base and each resource in the first online resource recommendation list, the second user feedback behavior of each user on those resources, such as the number of clicks and/or the dwell time, can be predicted. For example, female users may click the video resources in the first online resource recommendation list more often, while male users may click the text resources more often.
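Predicting the second user feedback behavior from the initial user knowledge base could be sketched as below. The click probabilities in `CLICK_PROB` are illustrative placeholders that only echo the qualitative example above (female users favoring video, male users favoring text); they are not data from this application, and the names `predict_clicks`, `CLICK_PROB` and the dictionary layouts are hypothetical.

```python
# Hypothetical click probabilities per (gender, resource type); placeholders only.
CLICK_PROB = {
    ("female", "video"): 0.30,
    ("female", "text"): 0.10,
    ("male", "video"): 0.10,
    ("male", "text"): 0.25,
}


def predict_clicks(knowledge_base, recommendations, resource_types):
    """Expected clicks of each sampled user on that user's first online list.

    knowledge_base:  user_id -> feature dict (only "gender" is used in this sketch)
    recommendations: user_id -> list of resource IDs in the first online list
    resource_types:  resource_id -> resource type
    """
    predicted = {}
    for user_id, resources in recommendations.items():
        gender = knowledge_base[user_id]["gender"]
        # Unknown (gender, type) pairs contribute no expected clicks.
        predicted[user_id] = sum(
            CLICK_PROB.get((gender, resource_types[rid]), 0.0) for rid in resources
        )
    return predicted


kb = {"u1": {"gender": "female"}, "u2": {"gender": "male"}}
recs = {"u1": ["r1", "r2"], "u2": ["r1", "r2"]}
types = {"r1": "video", "r2": "text"}
result = predict_clicks(kb, recs, types)
print(result)
```

In practice the predictor would use richer characteristics (age, interests, and so on) and a trained model rather than a fixed lookup table.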

Further, the determined first user feedback behavior and the determined second user feedback behavior can be compared to determine a target user knowledge base.

In this embodiment, the second online resource recommendation list is used to determine the real first user feedback behaviors of online users before they used the policy to be evaluated, the second user feedback behaviors are determined from the real first online resource recommendation list served to the sampled users together with the initial user knowledge base, and the two are compared to determine the target user knowledge base, which improves the accuracy of the constructed target user knowledge base.

If the difference between the first user feedback behavior and the second user feedback behavior is greater than the second preset threshold, the initial user knowledge base is adjusted and the adjusted initial user knowledge base is determined as the target user knowledge base, where the difference between the first user feedback behavior and the third user feedback behavior corresponding to the adjusted initial user knowledge base is not greater than the second preset threshold; if the difference between the first user feedback behavior and the second user feedback behavior is not greater than the second preset threshold, the initial user knowledge base is determined as the target user knowledge base.

Specifically, with continued reference to fig. 7, the first user feedback behavior and the second user feedback behavior are compared, and if the difference value between the first user feedback behavior and the second user feedback behavior is not greater than the second preset threshold, it is indicated that the difference between the initial user knowledge base generated offline and the actual user knowledge base is smaller, and at this time, the initial user knowledge base may be directly determined as the target user knowledge base.

Conversely, if the difference between the first user feedback behavior and the second user feedback behavior is greater than the second preset threshold, the initial user knowledge base generated offline differs considerably from the actual user knowledge base and needs to be adjusted. Specifically, the initial user knowledge base may be adjusted by optimizing the algorithm that learns from users' historical behavior data or the algorithm that extracts user features, producing an adjusted initial user knowledge base. A third user feedback behavior is then determined from the first online resource recommendation list and the adjusted initial user knowledge base and compared with the first user feedback behavior. If the difference between them is not greater than the second preset threshold, the adjusted initial user knowledge base can be directly determined as the target user knowledge base; otherwise, the above steps are repeated until the difference between the third user feedback behavior and the first user feedback behavior is not greater than the second preset threshold.
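The knowledge-base calibration loop of S504 can be sketched the same way. Relative error per aggregated metric is assumed as the difference measure, since the text does not define one; `predict` and `refine` are hypothetical callables (the latter standing in for optimizing the learning or feature-extraction algorithm), and the toy knowledge base is reduced to a single click-rate parameter purely for illustration.

```python
def feedback_gap(actual, predicted):
    """Assumed difference measure: largest relative error across feedback metrics.

    `actual` holds the first user feedback behavior and `predicted` the second
    (or third) one, both as metric name -> aggregated value.
    """
    return max(
        (abs(predicted[m] - actual[m]) / actual[m] for m in actual if actual[m]),
        default=0.0,
    )


def calibrate_knowledge_base(actual, initial_kb, predict, refine, threshold, max_rounds=10):
    """Return a knowledge base whose predicted feedback is within `threshold` of `actual`.

    `max_rounds` is added only so that the sketch always terminates.
    """
    kb = initial_kb
    for _ in range(max_rounds):
        if feedback_gap(actual, predict(kb)) <= threshold:
            return kb
        kb = refine(kb)
    raise RuntimeError("knowledge base still diverges after max_rounds refinements")


# Toy example: the "knowledge base" is reduced to a single click-rate parameter.
def predict(kb):
    return {"clicks": 400 * kb["click_rate"], "dwell_minutes": 600 * kb["click_rate"]}


def refine(kb):
    return {"click_rate": kb["click_rate"] - 0.025}


actual = {"clicks": 100.0, "dwell_minutes": 150.0}
kb = calibrate_knowledge_base(actual, {"click_rate": 0.3}, predict, refine, threshold=0.05)
print(round(kb["click_rate"], 3))  # 0.25
```

Starting from a click rate of 0.3, the predicted feedback is 20% too high, so the loop refines twice before the gap falls within the 5% threshold.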

The second preset threshold may be selected according to actual situations or experience, for example, may be 80%, and the specific value of the second preset threshold is not limited herein.

In this example, the first user feedback behavior is compared with the second user feedback behavior: when the resulting difference is greater than the second preset threshold, the initial user knowledge base is adjusted, and when it is not greater than the second preset threshold, the initial user knowledge base is used directly as the target user knowledge base. This keeps the difference between the target user knowledge base obtained offline and the knowledge base reflecting actual online users as small as possible, and ensures the accuracy of the benefit evaluation index predicted for the policy to be evaluated from the target user knowledge base.

S505, determining a benefit evaluation index corresponding to the strategy to be evaluated according to the user characteristics of each user and each resource in the target offline resource recommendation list, wherein the benefit evaluation index is used for evaluating the user feedback behaviors of the resources in the target offline resource recommendation list.

Specifically, the user feedback behavior of each user on each resource may be predicted from that user's characteristics and the resource characteristics of each resource in the target offline resource recommendation list; the predicted behaviors are then counted to obtain a statistical result, and the benefit evaluation index corresponding to the policy to be evaluated is determined from the statistical result.

In particular, the resource characteristics may include one or more of a resource type, a resource tag or a resource topic word. User feedback behavior includes the number of times a user clicks a resource, the dwell time on it, and the like.

Further, based on the user characteristics of each user and the resource characteristics of each resource in the target offline resource recommendation list, each user's feedback behavior on each resource in the list is predicted by a statistical model or a machine learning model. For example, a click-through rate prediction model can predict the number of times a user clicks each resource in the target offline resource recommendation list, and a statistical or machine learning model can predict the dwell time on each resource; for instance, the average dwell time of users on each type of resource can be computed and used as the dwell time for resources of that type.
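The dwell-time estimate mentioned above, averaging historical dwell time per resource type, could be computed as follows. The record format `(resource_type, dwell_seconds)` and the function name `average_dwell_by_type` are assumptions made for this sketch.

```python
from collections import defaultdict


def average_dwell_by_type(history):
    """Average dwell time per resource type from (resource_type, dwell_seconds) records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rtype, dwell in history:
        totals[rtype] += dwell
        counts[rtype] += 1
    return {rtype: totals[rtype] / counts[rtype] for rtype in totals}


history = [("video", 90.0), ("video", 30.0), ("text", 40.0)]
avg = average_dwell_by_type(history)
print(avg)  # {'video': 60.0, 'text': 40.0}
```

The predicted dwell time of a resource in the target offline list is then simply the average for its type; a type with no history would need a fallback value, which this sketch leaves to the caller.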

Further, the feedback behaviors of each user on each resource can be counted to obtain a statistical result. For example, if the target offline resource recommendation list includes resources 1, 2, 3 and 4, the number of clicks and the dwell time of each user on each of these resources are predicted from that user's characteristics, and these predictions are summed over all users to obtain the total number of clicks on, and the total dwell time spent on, the resources in the target offline resource recommendation list.

Once this statistical result, namely the sampled users' total number of clicks and total dwell time on the resources in the target offline resource recommendation list, has been obtained, the total number of clicks and the total dwell time can be used as the benefit evaluation index corresponding to the policy to be evaluated.
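Summing the predicted feedback into the benefit evaluation index might look like this minimal sketch. `predict_clicks` and `predict_dwell` are hypothetical per-pair predictors standing in for the click-through rate model and the dwell-time model; the toy predictors and the four resources in the usage lines are illustrative only.

```python
def benefit_index(users, resources, predict_clicks, predict_dwell):
    """Sum predicted clicks and dwell time over every (user, resource) pair."""
    total_clicks = 0.0
    total_dwell = 0.0
    for user in users:
        for resource in resources:
            total_clicks += predict_clicks(user, resource)
            total_dwell += predict_dwell(user, resource)
    return {"total_clicks": total_clicks, "total_dwell_seconds": total_dwell}


# Hypothetical per-pair predictors standing in for a CTR model and a dwell-time model.
def toy_clicks(user, resource):
    return 1.0 if user["gender"] == "female" and resource["type"] == "video" else 0.5


def toy_dwell(user, resource):
    return 60.0 if resource["type"] == "video" else 30.0


users = [{"id": "u1", "gender": "female"}, {"id": "u2", "gender": "male"}]
resources = [
    {"id": "resource 1", "type": "video"},
    {"id": "resource 2", "type": "text"},
    {"id": "resource 3", "type": "image"},
    {"id": "resource 4", "type": "video"},
]
index = benefit_index(users, resources, toy_clicks, toy_dwell)
print(index)  # {'total_clicks': 5.0, 'total_dwell_seconds': 360.0}
```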

In this embodiment, the user feedback behavior of each user on each resource is predicted and then counted, and the benefit evaluation index corresponding to the policy to be evaluated is determined from the statistical result. The policy to be evaluated can therefore be evaluated quickly offline by combining the user characteristics with its target offline resource recommendation list, without running a real small-traffic experiment online, which shortens the online traffic experiment cycle, steadily improves offline investigation efficiency and reduces research and development cost.

Optionally, the benefit evaluation index corresponding to the policy to be evaluated may be determined by a target evaluation algorithm from the user characteristics of each user and each resource in the target offline resource recommendation list, where the target evaluation algorithm is determined according to the user characteristics.

Specifically, the target evaluation algorithm may be determined from the user characteristics of users in the target user knowledge base, for example from the weight assigned to each user characteristic, or from the probability, given a user characteristic, that the user clicks a certain class of resources (for example, female users may be more likely to click video resources and male users more likely to click text resources).
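One way to derive a target evaluation algorithm from user characteristics, following the click-probability example above, is to estimate the probability of a click for each (gender, resource type) pair from historical records and sum those probabilities over the sampled users and the target offline list. This is a sketch under stated assumptions: the record format, the function names and the toy records are hypothetical, and other characteristics or weighting schemes could equally be used.

```python
from collections import defaultdict


def fit_click_probabilities(records):
    """Estimate P(click | gender, resource type) from (gender, type, clicked) records."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for gender, rtype, was_clicked in records:
        shown[(gender, rtype)] += 1
        clicked[(gender, rtype)] += int(was_clicked)
    return {key: clicked[key] / shown[key] for key in shown}


def expected_clicks(users, resource_types, probs):
    """Expected total clicks of the sampled users on the target offline list.

    Pairs never seen in the records are treated as having zero click probability.
    """
    return sum(probs.get((u["gender"], t), 0.0) for u in users for t in resource_types)


# Toy historical records, for illustration only.
records = [
    ("female", "video", True), ("female", "video", False),
    ("male", "text", True), ("male", "text", True),
    ("male", "text", False), ("male", "text", False),
]
probs = fit_click_probabilities(records)
users = [{"gender": "female"}, {"gender": "male"}]
print(expected_clicks(users, ["video", "text"], probs))  # 1.0
```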

In this embodiment, the target evaluation algorithm is determined from the user characteristics, and the benefit evaluation index corresponding to the policy to be evaluated is determined by this algorithm, which can improve the accuracy of the benefit evaluation index.

S506, sending the benefit evaluation index to the terminal device.

In the policy evaluation method provided by this embodiment of the application, the first guardrail index determined from the first online resource recommendation list of the sampled users is compared with the second guardrail index determined from the initial offline resource recommendation list to determine the target offline resource recommendation list. This reduces the difference between the target offline resource recommendation list simulated offline and the first online resource recommendation list, improves the accuracy of the target offline resource recommendation list, and thereby ensures the accuracy of the benefit evaluation index predicted for the policy to be evaluated from that list.

FIG. 8 is a schematic structural diagram of a policy evaluation device 80 according to an embodiment of the present application. As shown in FIG. 8, the policy evaluation device 80 may include:

a processing module 801, configured to determine a target offline resource recommendation list corresponding to a policy to be evaluated according to a first online resource recommendation list corresponding to sampled users, where the first online resource recommendation list is a resource list obtained after the policy to be evaluated goes online, and the user distribution of the sampled users is consistent with the user distribution of online users;

the processing module 801 is further configured to determine user characteristics of each user in the sampled users according to a target user knowledge base, where the target user knowledge base is determined according to first user feedback behaviors of all online users using the policy to be evaluated before using the policy to be evaluated;

the processing module 801 is further configured to determine a benefit evaluation index corresponding to the policy to be evaluated according to user characteristics of each user and each resource in the target offline resource recommendation list, where the benefit evaluation index is used to evaluate user feedback behavior of the resource in the target offline resource recommendation list;

and a sending module 802, configured to send the benefit evaluation index to a terminal device.

Optionally, the processing module 801 is specifically configured to:

determining a first guardrail index according to the first online resource recommendation list, wherein the first guardrail index is used for evaluating the ranking and recall of resources in the first online resource recommendation list corresponding to the policy to be evaluated;

determining a second guardrail index according to the initial offline resource recommendation list, wherein the second guardrail index is used for evaluating the ranking and recall of resources in the initial offline resource recommendation list corresponding to the policy to be evaluated;

and determining the target offline resource recommendation list according to the first guardrail index and the second guardrail index.

Optionally, the processing module 801 is specifically configured to:

if the difference between the first guardrail index and the second guardrail index is greater than a first preset threshold, adjusting the initial offline resource recommendation list and determining the adjusted list as the target offline resource recommendation list, where the difference between a third guardrail index corresponding to the adjusted list and the first guardrail index is not greater than the first preset threshold;

and if the difference between the first guardrail index and the second guardrail index is not greater than the first preset threshold, determining the initial offline resource recommendation list as the target offline resource recommendation list.
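The threshold rule above can be sketched in Python. Treating a guardrail index as a single number, measuring the difference as an absolute gap, capping the number of adjustment rounds, and leaving the adjustment strategy to a caller-supplied `adjust` function are all assumptions; the application does not specify how the offline list is adjusted.

```python
from typing import Callable, List

# Hypothetical: a guardrail is any function mapping a recommendation list to one number.
Guardrail = Callable[[List[str]], float]
# Hypothetical: an adjuster proposes a revised offline list given the online list.
Adjuster = Callable[[List[str], List[str]], List[str]]


def select_target_offline_list(
    online_list: List[str],
    initial_offline_list: List[str],
    guardrail: Guardrail,
    adjust: Adjuster,
    threshold: float,
    max_rounds: int = 10,
) -> List[str]:
    """Return an offline list whose guardrail is within `threshold` of the online one."""
    first_index = guardrail(online_list)  # first guardrail index
    candidate = initial_offline_list
    rounds = 0
    # Second guardrail index on the first check, third guardrail index after adjustment.
    while abs(first_index - guardrail(candidate)) > threshold:
        if rounds == max_rounds:
            raise RuntimeError("offline list did not align with the online list")
        candidate = adjust(candidate, online_list)
        rounds += 1
    return candidate  # gap not greater than the threshold
```

With a toy guardrail equal to the share of video items, the adjuster is called until the offline share matches the online share to within the threshold; if the initial gap already satisfies the threshold, the initial list is returned unchanged, mirroring the second branch.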

Optionally, the processing module 801 is further configured to determine a second online resource recommendation list according to all online users of the policy to be evaluated;

the processing module 801 is further configured to determine, according to the second online resource recommendation list, the first user feedback behavior exhibited by all online users of the policy to be evaluated before they began using that policy;

the processing module 801 is further configured to determine a second user feedback behavior corresponding to the sampled users according to the first online resource recommendation list and an initial user knowledge base;

the processing module 801 is further configured to determine the target user knowledge base according to the first user feedback behavior and the second user feedback behavior.

Optionally, the processing module 801 is specifically configured to:

if the difference between the first user feedback behavior and the second user feedback behavior is greater than a second preset threshold, adjusting the initial user knowledge base and determining the adjusted initial user knowledge base as the target user knowledge base, where the difference between a third user feedback behavior corresponding to the adjusted initial user knowledge base and the first user feedback behavior is not greater than the second preset threshold;

and if the difference between the first user feedback behavior and the second user feedback behavior is not greater than the second preset threshold, determining the initial user knowledge base as the target user knowledge base.
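A minimal Python sketch of this calibration follows. It assumes that a feedback behavior is summarized as a click-through rate (CTR), that the knowledge base stores one hypothetical click propensity per user, and that adjustment is a multiplicative rescaling; none of these choices is specified by the application.

```python
from typing import Dict, List

# Hypothetical layout: user id -> probability that the user clicks an impression.
KnowledgeBase = Dict[str, float]


def observed_ctr(clicks: int, impressions: int) -> float:
    """First user feedback behavior, summarized as a CTR from pre-policy online logs."""
    return clicks / impressions if impressions else 0.0


def simulated_ctr(kb: KnowledgeBase, online_lists: Dict[str, List[str]]) -> float:
    """Second user feedback behavior: expected CTR of sampled users on their online lists."""
    impressions = sum(len(items) for items in online_lists.values())
    expected_clicks = sum(kb[user] * len(items) for user, items in online_lists.items())
    return expected_clicks / impressions if impressions else 0.0


def calibrate_knowledge_base(
    kb: KnowledgeBase,
    online_lists: Dict[str, List[str]],
    target_ctr: float,
    threshold: float,
    max_rounds: int = 20,
) -> KnowledgeBase:
    """Rescale propensities until the simulated CTR is within `threshold` of the target."""
    current = dict(kb)  # never mutate the initial knowledge base
    simulated = simulated_ctr(current, online_lists)
    rounds = 0
    while abs(simulated - target_ctr) > threshold:
        if rounds == max_rounds:
            raise RuntimeError("knowledge base did not converge within max_rounds")
        if simulated == 0.0:
            raise ValueError("cannot rescale a knowledge base whose simulated CTR is zero")
        factor = target_ctr / simulated
        # Multiplicative correction, clipped so propensities stay valid probabilities.
        current = {user: min(1.0, p * factor) for user, p in current.items()}
        simulated = simulated_ctr(current, online_lists)
        rounds += 1
    return current
```

If the simulated CTR is already close enough to the observed one, an unchanged copy of the initial knowledge base is returned, which corresponds to the second branch above. The clipping at 1.0 can prevent exact convergence, hence the round limit.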

Optionally, the processing module 801 is specifically configured to:

determining the benefit evaluation index corresponding to the policy to be evaluated through a target evaluation algorithm according to the user characteristics of each user and each resource in the target offline resource recommendation list, where the target evaluation algorithm is determined according to the user characteristics of the users.
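One way to read "the target evaluation algorithm is determined according to the user characteristics" is per-user selection among several estimators. The sketch below assumes two hypothetical estimators keyed on a `history_len` feature, a `ctr` feature for users with history, and a fixed prior click rate of 0.05 for cold-start users; all of these names and values are illustrative.

```python
from typing import Callable, Dict, List

UserFeatures = Dict[str, float]
# Hypothetical estimator signature: expected clicks of one user on a list of resources.
Estimator = Callable[[UserFeatures, List[str]], float]

PRIOR_CTR = 0.05  # assumed click rate for users without enough history


def prior_estimator(features: UserFeatures, resources: List[str]) -> float:
    # Cold-start users: every resource gets the same prior click probability.
    return PRIOR_CTR * len(resources)


def history_estimator(features: UserFeatures, resources: List[str]) -> float:
    # Users with history: use their recorded click propensity.
    return features["ctr"] * len(resources)


def choose_estimator(features: UserFeatures, min_history: int = 10) -> Estimator:
    # The "target evaluation algorithm" is picked from the user's own characteristics.
    if features.get("history_len", 0) >= min_history:
        return history_estimator
    return prior_estimator


def benefit_index(users: Dict[str, UserFeatures], offline_list: List[str]) -> float:
    # Expected clicks per impression over all sampled users.
    impressions = len(users) * len(offline_list)
    if impressions == 0:
        return 0.0
    expected = sum(choose_estimator(f)(f, offline_list) for f in users.values())
    return expected / impressions
```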

Optionally, the processing module 801 is specifically configured to:

predicting the feedback behavior of each user toward each resource according to the user characteristics of the user and the resource characteristics of the resources in the target offline resource recommendation list;

compiling statistics on the predicted user feedback behaviors toward the resources to obtain a statistical result;

and determining the benefit evaluation index corresponding to the policy to be evaluated according to the statistical result.
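The predict, count, and summarize sequence can be sketched as follows. The behavior names, the 0.5 cut-off that turns a predicted probability into a predicted behavior, the toy predictor in the usage, and the weights that fold behavior counts into one benefit index are hypothetical; the application leaves the prediction model and the statistic open.

```python
from collections import Counter
from typing import Callable, Dict

Features = Dict[str, float]
# Hypothetical predictor: (user features, resource features) -> {behavior: probability}.
Predictor = Callable[[Features, Features], Dict[str, float]]


def count_predicted_feedback(
    users: Dict[str, Features],
    resources: Dict[str, Features],
    predict: Predictor,
    cutoff: float = 0.5,
) -> Counter:
    """Predict each user's behaviors on each resource and count those predicted to occur."""
    counts: Counter = Counter()
    for user_features in users.values():
        for resource_features in resources.values():
            for behavior, prob in predict(user_features, resource_features).items():
                if prob >= cutoff:
                    counts[behavior] += 1
    return counts


def benefit_from_counts(counts: Counter, impressions: int, weights: Dict[str, float]) -> float:
    """Weighted behaviors per impression; behaviors without a weight contribute nothing."""
    if impressions == 0:
        return 0.0
    return sum(weights.get(b, 0.0) * n for b, n in counts.items()) / impressions
```

For instance, with a predictor whose click probability is user interest times resource quality, two users, and two resources, the counter might record two predicted clicks out of four impressions, giving a click-weighted benefit index of 0.5.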

Optionally, the processing module 801 is specifically configured to:

determining the resource type of each resource in the first online resource recommendation list;

compiling statistics on the resources of each type in the first online resource recommendation list to obtain a statistical result;

and determining the first guardrail index according to the statistical result.

Optionally, the processing module 801 is specifically configured to:

determining the resource type of each resource in the initial offline resource recommendation list;

compiling statistics on the resources of each type in the initial offline resource recommendation list to obtain a statistical result;

and determining the second guardrail index according to the statistical result.
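Both guardrail indices rely on the same resource-type statistics, which can be sketched in Python as a type-share distribution. The `catalog` lookup from resource id to type and the L1 distance used to compare two distributions are assumptions for illustration; the application fixes neither.

```python
from collections import Counter
from typing import Dict, List


def guardrail_index(rec_list: List[str], catalog: Dict[str, str]) -> Dict[str, float]:
    """Share of each resource type in a recommendation list (the statistical result)."""
    types = [catalog[resource] for resource in rec_list]  # resource type of each resource
    counts = Counter(types)
    total = len(types)
    return {t: n / total for t, n in counts.items()} if total else {}


def guardrail_gap(first: Dict[str, float], second: Dict[str, float]) -> float:
    """Hypothetical difference measure: L1 distance between two type distributions."""
    all_types = set(first) | set(second)
    return sum(abs(first.get(t, 0.0) - second.get(t, 0.0)) for t in all_types)
```

An online list that is half video and an offline list that is three-quarters articles, for example, give a gap of 1.0 on a scale whose maximum is 2.0; that gap would then be compared against the first preset threshold.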

The policy evaluation device 80 provided in this embodiment of the present application may execute the technical solution of the policy evaluation method in any of the foregoing embodiments. Its implementation principle and beneficial effects are similar to those of the method, to which reference may be made, and are not repeated here.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

Fig. 9 is a block diagram of an electronic device for the policy evaluation method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 9, the electronic device includes one or more processors 901, a memory 902, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). In fig. 9, one processor 901 is taken as an example.

The memory 902 is the non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the policy evaluation method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the policy evaluation method provided by the present application.

As a non-transitory computer readable storage medium, the memory 902 may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the policy evaluation method in the embodiments of the application (e.g., the processing module 801 and the sending module 802 shown in fig. 8). The processor 901 executes the various functional applications and data processing of the server, i.e., implements the policy evaluation method in the above method embodiment, by running the non-transitory software programs, instructions, and modules stored in the memory 902.

The memory 902 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the policy evaluation method, and the like. In addition, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 optionally includes memory located remotely from the processor 901, which may be connected to the electronic device for the policy evaluation method via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the policy evaluation method may further include an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or by other means; in fig. 9, connection by a bus is taken as an example.

The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the policy evaluation method; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiments of the application, the target offline resource recommendation list corresponding to the policy to be evaluated is determined according to the first online resource recommendation list corresponding to the sampled users, where the first online resource recommendation list is the resource list obtained after the policy to be evaluated goes online, and the user distribution of the sampled users is consistent with that of the online users. The user characteristics of each sampled user are determined according to the target user knowledge base, which is determined according to the first user feedback behaviors exhibited by all online users of the policy to be evaluated before they began using that policy. The benefit evaluation index corresponding to the policy to be evaluated, which is used to evaluate user feedback behaviors toward the resources in the target offline resource recommendation list, is then determined according to the user characteristics of each user and each resource in that list, and is sent to the terminal device. On the one hand, the policy to be evaluated can be evaluated quickly in an offline state by combining the user characteristics with its target offline resource recommendation list, without running a small-traffic experiment directly online. This can shorten the online traffic cycle, keep improving the efficiency of offline investigation, and reduce research and development costs; and because a new policy goes online only after it has been fully evaluated, user experience and user loyalty can be improved and user churn avoided.
On the other hand, because the target offline resource recommendation list is obtained from the real first online resource recommendation list recommended to the sampled users after the policy to be evaluated goes online, the difference between the real resource recommendation lists of online users and the lists obtained in the offline environment can be minimized; and because the target user knowledge base is determined according to the real first user feedback behaviors of all online users of the policy to be evaluated before they began using it, the cause of an inaccurate evaluation result can be located precisely when one occurs. In yet another aspect, once the target offline resource recommendation list and the target user knowledge base have been obtained in this manner, evaluation accuracy can be improved when a new policy is evaluated offline before launch.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A method for evaluating a policy, comprising:

determining user characteristics of each of the sampled users according to a target user knowledge base, wherein the target user knowledge base is determined according to first user feedback behaviors exhibited by all online users of the policy to be evaluated before they began using that policy;

transmitting the benefit evaluation index to a terminal device;

wherein the determining of a target offline resource recommendation list corresponding to the policy to be evaluated according to the first online resource recommendation list corresponding to the sampled users comprises:

determining a first guardrail index according to the first online resource recommendation list, wherein the first guardrail index is used for evaluating the ranking and recall of resources in the first online resource recommendation list;

determining a second guardrail index according to the initial offline resource recommendation list, wherein the second guardrail index is used for evaluating the ranking and recall of resources in the initial offline resource recommendation list;

2. The method according to claim 1, wherein the method further comprises:

determining a second online resource recommendation list according to all online users of the policy to be evaluated;

determining, according to the second online resource recommendation list, the first user feedback behavior exhibited by all online users of the policy to be evaluated before they began using that policy;

determining a second user feedback behavior corresponding to the sampled users according to the first online resource recommendation list and an initial user knowledge base;

and determining the target user knowledge base according to the first user feedback behavior and the second user feedback behavior.

3. The method of claim 2, wherein the determining the target user knowledge base from the first user feedback behavior and the second user feedback behavior comprises:

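if the difference between the first user feedback behavior and the second user feedback behavior is greater than a second preset threshold, adjusting the initial user knowledge base and determining the adjusted initial user knowledge base as the target user knowledge base, wherein the difference between a third user feedback behavior corresponding to the adjusted initial user knowledge base and the first user feedback behavior is not greater than the second preset threshold;

and if the difference between the first user feedback behavior and the second user feedback behavior is not greater than the second preset threshold, determining the initial user knowledge base as the target user knowledge base.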

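4. The method of claim 1, wherein the determining, according to the user characteristics of each user and each resource in the target offline resource recommendation list, of a benefit evaluation index corresponding to the policy to be evaluated comprises:

determining the benefit evaluation index corresponding to the policy to be evaluated through a target evaluation algorithm according to the user characteristics of each user and each resource in the target offline resource recommendation list, wherein the target evaluation algorithm is determined according to the user characteristics of the users.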

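5. The method of claim 1, wherein the determining, according to the user characteristics of each user and each resource in the target offline resource recommendation list, of a benefit evaluation index corresponding to the policy to be evaluated comprises:

predicting the feedback behavior of each user toward each resource according to the user characteristics of the user and the resource characteristics of the resources in the target offline resource recommendation list;

compiling statistics on the predicted user feedback behaviors toward the resources to obtain a statistical result;

and determining the benefit evaluation index corresponding to the policy to be evaluated according to the statistical result.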

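6. The method of claim 1, wherein determining the first guardrail index according to the first online resource recommendation list comprises:

determining the resource type of each resource in the first online resource recommendation list;

compiling statistics on the resources of each type in the first online resource recommendation list to obtain a statistical result;

and determining the first guardrail index according to the statistical result.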

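7. The method of claim 1, wherein determining the second guardrail index according to the initial offline resource recommendation list comprises:

determining the resource type of each resource in the initial offline resource recommendation list;

compiling statistics on the resources of each type in the initial offline resource recommendation list to obtain a statistical result;

and determining the second guardrail index according to the statistical result.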

8. A policy evaluation device, comprising:

the sending module is configured to send the benefit evaluation index to the terminal device;

the processing module is specifically configured to:

9. The apparatus of claim 8, wherein

the processing module is further configured to determine a second online resource recommendation list according to all online users of the policy to be evaluated;

the processing module is further configured to determine, according to the second online resource recommendation list, the first user feedback behavior exhibited by all online users of the policy to be evaluated before they began using that policy;

the processing module is further configured to determine a second user feedback behavior corresponding to the sampled users according to the first online resource recommendation list and an initial user knowledge base;

the processing module is further configured to determine the target user knowledge base according to the first user feedback behavior and the second user feedback behavior.

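10. The apparatus according to claim 9, wherein the processing module is specifically configured to:

if the difference between the first user feedback behavior and the second user feedback behavior is greater than a second preset threshold, adjust the initial user knowledge base and determine the adjusted initial user knowledge base as the target user knowledge base, wherein the difference between a third user feedback behavior corresponding to the adjusted initial user knowledge base and the first user feedback behavior is not greater than the second preset threshold;

and if the difference between the first user feedback behavior and the second user feedback behavior is not greater than the second preset threshold, determine the initial user knowledge base as the target user knowledge base.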

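11. The apparatus according to claim 8, wherein the processing module is specifically configured to:

determine the benefit evaluation index corresponding to the policy to be evaluated through a target evaluation algorithm according to the user characteristics of each user and each resource in the target offline resource recommendation list, wherein the target evaluation algorithm is determined according to the user characteristics of the users.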

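12. The apparatus according to claim 8, wherein the processing module is specifically configured to:

predict the feedback behavior of each user toward each resource according to the user characteristics of the user and the resource characteristics of the resources in the target offline resource recommendation list;

compile statistics on the predicted user feedback behaviors toward the resources to obtain a statistical result;

and determine the benefit evaluation index corresponding to the policy to be evaluated according to the statistical result.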

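13. The apparatus according to claim 8, wherein the processing module is specifically configured to:

determine the resource type of each resource in the first online resource recommendation list;

compile statistics on the resources of each type in the first online resource recommendation list to obtain a statistical result;

and determine the first guardrail index according to the statistical result.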

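14. The apparatus according to claim 8, wherein the processing module is specifically configured to:

determine the resource type of each resource in the initial offline resource recommendation list;

compile statistics on the resources of each type in the initial offline resource recommendation list to obtain a statistical result;

and determine the second guardrail index according to the statistical result.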

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.