CN113190746B

CN113190746B - Recommendation model evaluation method and device and electronic equipment

Info

Publication number: CN113190746B
Application number: CN202110463193.4A
Authority: CN
Inventors: 李心明; 魏龙; 王召玺; 王峰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2024-01-12
Anticipated expiration: 2041-04-27
Also published as: CN113190746A

Abstract

The application discloses a recommendation model evaluation method and device and electronic equipment, and relates to the technical field of artificial intelligence, such as intelligent search and deep learning. The implementation scheme is as follows: acquiring, from a historical use log, a plurality of user information, a plurality of recommendation data labels, and association information of each user and each recommendation data; inputting each user information and the plurality of recommendation data labels into a recommendation model to be evaluated so as to determine the predicted association degree between each user and each recommendation data; and determining the effect of the recommendation model to be evaluated according to the predicted association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data. Because the effect of the recommendation model is determined from the user information, the recommendation data labels and the association information of each user and each recommendation data, the recommendation model can be evaluated offline, which saves the evaluation waiting period and allows the recommendation effect of the model to be determined in advance.

Description

Recommendation model evaluation method and device and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as intelligent search and deep learning, and more specifically to a recommendation model evaluation method, apparatus, electronic device, storage medium, and computer program product.

Background

At present, with the continuous development of computer technology, recommendation models are applied more and more frequently. The full iteration cycle of a model comprises two stages, offline modeling and online experiments: offline modeling comprises model design and model investigation, and online experiments comprise online small-traffic verification and full-scale application of the model. Whether a model is deployed online, or whether a strategy is adopted, depends on the result of model evaluation. The evaluation of the recommendation model therefore plays an extremely important role in its application, and techniques for evaluating recommendation models are particularly important.

Disclosure of Invention

The application provides a recommendation model evaluation method and device and electronic equipment.

According to a first aspect of the present application, there is provided a recommendation model evaluation method, including:

acquiring a plurality of user information, a plurality of recommended data labels and associated information of each user and each recommended data from a historical use log;

inputting each piece of user information and the plurality of recommendation data labels into a recommendation model to be evaluated so as to determine the predicted association degree between each user and each piece of recommendation data;

and determining the effect of the recommendation model to be evaluated according to the predicted association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data.

According to a second aspect of the present application, there is provided an evaluation device of a recommendation model, including:

the first acquisition module is used for acquiring a plurality of user information, a plurality of recommended data labels and the associated information of each user and each recommended data from the historical use log;

the first determining module is used for inputting each piece of user information and the plurality of recommendation data labels into a recommendation model to be evaluated so as to determine the predicted association degree between each user and each piece of recommendation data;

and the second determining module is used for determining the effect of the recommendation model to be evaluated according to the prediction association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data.

According to a third aspect of the present application, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the recommendation model evaluation method described in the embodiment of the above aspect.

According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to execute the recommendation model evaluation method according to the embodiment of the above aspect.

According to a fifth aspect of the present application, there is provided a computer program product, which when executed by a processor, implements the method for evaluating a recommendation model according to the embodiment of the above aspect.

It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.

Drawings

The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:

FIG. 1 is a flow chart of a method for evaluating a recommendation model according to an embodiment of the present application;

FIG. 2 is a flow chart of determining an effect of a recommendation model to be evaluated according to an embodiment of the present application;

FIG. 3 is a flow chart of locating a network layer to be corrected in a recommendation model to be evaluated according to an embodiment of the present application;

FIG. 4 is a flow chart of obtaining user information, a recommendation data label and association information according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a specific example of evaluating a recommendation model according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an evaluation device of a recommendation model according to an embodiment of the present application;

FIG. 7 is a block diagram of an electronic device for implementing a method for evaluating a recommendation model according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Intelligent search refers to a new generation of search engines that incorporate artificial intelligence technology. Besides traditional functions such as quick search and relevance ranking, such an engine can also provide functions such as user role registration, automatic identification of user interests, semantic understanding of content, intelligent information filtering, and pushing.

Deep learning is a new research direction in the field of machine learning; it was introduced to bring machine learning closer to its original goal, artificial intelligence. Deep learning learns the inherent laws and representation hierarchies of sample data, and the information obtained in this learning process greatly helps the interpretation of data such as text, images and sounds. Its ultimate goal is to enable a machine to analyze and learn like a person, and to recognize data such as text, images and sounds.

It should be noted that whether a recommended model is applied on-line or whether a strategy is adopted depends on the result of model evaluation.

In the related art, the recommendation model is generally evaluated online. Online evaluation has a long evaluation period, and the recommendation effect of the model cannot be predicted in advance in this way.

Therefore, the embodiment of the application provides a recommendation model evaluation method and device and electronic equipment. According to the embodiment of the application, the effect of the recommendation model is determined according to the user information, the various recommendation data labels and the association information of each user and each recommendation data, so that the offline evaluation of the recommendation model can be realized, the evaluation waiting period is saved, and the recommendation effect of the recommendation model can be determined in advance.

The following describes a recommendation model evaluation method, a recommendation model evaluation device and electronic equipment according to the embodiments of the present application with reference to the accompanying drawings.

Fig. 1 is a flow chart of a method for evaluating a recommendation model according to an embodiment of the present application.

It should be noted that the execution subject of the evaluation method of the recommendation model in the embodiment of the present application may be an electronic device. Specifically, the electronic device may be, but is not limited to, a server or a terminal, and the terminal may be, but is not limited to, a personal computer, a smart phone, an iPad, and the like.

The embodiment of the application is described by taking as an example the case in which the evaluation method of the recommendation model is configured in an evaluation device of the recommendation model; the device can be applied to an electronic device so that the electronic device can execute the evaluation method of the recommendation model.

As shown in fig. 1, the evaluation method of the recommendation model includes the following steps:

S101, acquiring a plurality of user information, a plurality of recommended data labels and associated information of each user and each recommended data from a historical use log.

It should be noted that, in practical application, each time a user uses the electronic device, for example to search for or click on content, and the electronic device recommends relevant content according to the user's needs, a log stream is generated, and each log stream corresponds to a timestamp. In this embodiment of the present application, a log that records the user's historical recommendation-related behavior on the electronic device may be referred to as a historical use log, data recommended by the electronic device to the user may be referred to as recommendation data, and a label of that data may be referred to as a recommendation data label.

The association information between each user and each recommendation data can be used to represent the degree of interest of each user in each recommendation data, and can be understood as the association degree between each user and each recommendation data. In other words, the more interested the user is in the recommendation data, the higher the association degree between the user and the recommendation data; the less interested the user is in the recommendation data, the lower the association degree between the user and the recommendation data.

The historical use log may include user information of users of the electronic device, such as a user's account number and the time at which the user used the electronic device. It may also include recommendation data labels, such as a commodity label or a manufacturer label. It may further include association information between each user and each recommendation data; for example, association information showing that user A is relatively strongly associated with the recommendation data "daily life of student" indicates that user A is interested in that recommendation data.

Specifically, when the recommendation model needs to be evaluated, the user information, the recommendation data labels and the association information of each user and each recommendation data can be obtained from the historical use log, yielding a plurality of user information, a plurality of recommendation data labels and a plurality of association information. One piece of user information may correspond to one user, and one recommendation data label may correspond to one piece of recommendation data.

It should be noted that, in the embodiment of the present application, the manner of determining the plurality of user information, the plurality of recommended data tags, and the plurality of associated information may be any feasible manner in the related art, so long as the plurality of user information, the plurality of recommended data tags, and the plurality of associated information may be determined, which is not limited in the embodiment of the present application.

S102, inputting each user information and various recommendation data labels into a recommendation model to be evaluated so as to determine the prediction association degree between each user and each recommendation data.

In the embodiment of the application, the recommendation model that needs to be evaluated is called the recommendation model to be evaluated. It may be appreciated that, when the user performs a search or click operation, the electronic device may recommend relevant content to the user based on the recommendation model; that is, the recommendation model may output the data required by the user, such as video, text or images, according to its input (the user's search content or click content).

In the embodiment of the application, the association degree between each user and each recommendation data output by the recommendation model to be evaluated can be called the predicted association degree, and it can be used to represent the degree of interest of the user in the recommendation data. It can be appreciated that the higher the predicted association degree, the more interested the user is in the recommendation data; the lower the predicted association degree, the less interested the user is in the recommendation data.

Specifically, after the plurality of user information and the plurality of recommendation data labels are obtained, each user information and the plurality of recommendation data labels can be input into the recommendation model to be evaluated, and the model can then output the predicted association degree between the user corresponding to that user information and each recommendation data corresponding to the plurality of recommendation data labels.
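Step S102 can be illustrated with a minimal sketch. The scoring function `toy_model`, the feature vectors and the names `user_info` and `label_info` are hypothetical stand-ins introduced only for illustration; the application does not specify the structure of the recommendation model to be evaluated.

```python
import math


def toy_model(user_vec, label_vec):
    """Hypothetical stand-in for the recommendation model to be evaluated."""
    score = sum(u * l for u, l in zip(user_vec, label_vec))
    return 1.0 / (1.0 + math.exp(-score))  # squash the score into (0, 1)


def predict_association(model, user_info, label_info):
    """Return {(user, label): predicted association degree} for every pair."""
    return {
        (user, label): model(user_vec, label_vec)
        for user, user_vec in user_info.items()
        for label, label_vec in label_info.items()
    }


# Assumed toy features for two users and two recommendation data labels.
user_info = {"user1": [1.0, 0.0], "user2": [0.0, 1.0]}
label_info = {"B1": [2.0, -2.0], "B2": [-2.0, 2.0]}

G = predict_association(toy_model, user_info, label_info)
```

Each user is paired with every label, so two users and two labels yield four predicted association degrees.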

S103, determining the effect of the recommendation model to be evaluated according to the prediction association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data.

It should be noted that after the association information between each user and each recommended data is obtained, a plurality of association information between each user and a plurality of recommended data may be obtained, where the plurality of association information may be different, that is, a difference may exist between the plurality of association information, and the difference may represent the degree of interest of each user in the plurality of recommended data respectively.

Specifically, after the predicted association degree between each user and each recommendation data and the association information of each user and each recommendation data are determined, the difference degree between the association information of each user and each recommendation data may be further determined. The predicted association degree is obtained from the recommendation model to be evaluated, whereas the association information is obtained from the actual historical use log, so the difference degree between the association information also reflects actual behavior. The effect of the recommendation model to be evaluated may therefore be determined by comparing the predicted association degree with the difference degree.

Specifically, if the predicted association degree between each user and each recommendation data matches, or relatively matches, the actual difference degree, it can be determined that the effect of the recommendation model to be evaluated is good; if the predicted association degree between each user and each recommendation data does not match the actual difference degree, or the matching degree is low, it can be determined that the effect of the recommendation model to be evaluated is poor.

That is, suppose two pieces of user information, Y1 and Y2, and two recommendation data labels, B1 and B2, are obtained from the historical use log, where Y1 corresponds to user 1, Y2 corresponds to user 2, B1 corresponds to recommendation data S1 and B2 corresponds to recommendation data S2. Then Y1 and labels B1 and B2 may be input into the recommendation model to be evaluated, and the model may output the predicted association degree G11 between user 1 and recommendation data S1 and the predicted association degree G12 between user 1 and recommendation data S2. Likewise, Y2 and labels B1 and B2 may be input into the recommendation model to be evaluated, and the model may output the predicted association degree G21 between user 2 and recommendation data S1 and the predicted association degree G22 between user 2 and recommendation data S2. The association information W11 of user 1 and recommendation data S1, the association information W12 of user 1 and recommendation data S2, the association information W21 of user 2 and recommendation data S1, and the association information W22 of user 2 and recommendation data S2 also need to be obtained. Then, the difference degree C1 between the association information W11 and W12 and the difference degree C2 between the association information W21 and W22 can be determined. Finally, one effect result can be determined from the predicted association degrees G11 and G12 and the difference degree C1, another effect result can be determined from the predicted association degrees G21 and G22 and the difference degree C2, and the effect of the model to be evaluated can be determined comprehensively from the two effect results.

For example, suppose the predicted association degrees between user A and two kinds of recommendation data are 20% and 80%, respectively, and the difference between user A's association information for the two kinds of recommendation data is large, say a difference degree of 60% (that is, the two pieces of association information differ by 60%). This indicates that the predicted association degree between user A and each kind of recommendation data matches, or relatively matches, the actual difference degree, and it can then be determined that the effect of the recommendation model to be evaluated is good.
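The comparison in step S103 can be sketched as follows. The application does not define how the difference degree is computed or how it is compared with the predicted association degrees; the signed-gap definition, the function names and the observed values 0.15 and 0.75 below are assumptions chosen so that the 20%/80% example yields a 60% gap.

```python
def difference_degree(first, second):
    """Signed gap between two association values (an assumed definition)."""
    return first - second


def effect_score(pred_1, pred_2, actual_1, actual_2):
    """Score in [0, 1]; 1.0 means predicted and observed gaps agree exactly."""
    predicted_gap = difference_degree(pred_1, pred_2)
    actual_gap = difference_degree(actual_1, actual_2)
    # Signed gaps lie in [-1, 1], so their distance is at most 2.
    return 1.0 - abs(predicted_gap - actual_gap) / 2.0


# User A: predicted association degrees of 20% and 80%;
# hypothetical observed association values of 0.15 and 0.75.
score = effect_score(0.20, 0.80, 0.15, 0.75)

# A model that ranks the two kinds of data the wrong way round scores lower.
reversed_score = effect_score(0.80, 0.20, 0.15, 0.75)
```

With several users, one simple way to combine the per-user results into a comprehensive result would be to average the scores, which is again an assumption rather than something the application prescribes.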

According to the recommendation model evaluation method, the effect of the recommendation model is determined according to the user information, the multiple recommendation data labels and the association information of each user and each recommendation data, so that the recommendation model can be evaluated offline, the evaluation waiting period is saved, and the recommendation effect of the recommendation model can be determined in advance.

It should be noted that, in the embodiment of the present application, the associated information of each user and each recommended data may be any quantitative information that may be used to characterize the interest degree of the user in the data, for example, a duration of browsing the recommended data by the user, a number of times the user searches for keywords in the recommended data, a probability of clicking the recommended data by the user (i.e. a clicking rate of the recommended data by the user), and so on.

That is, in one embodiment of the present application, the association information of each user with each recommendation data may include a probability that each user clicks on each recommendation data.

It should be noted that the greater the probability that the user clicks on the recommendation data, the more interested the user is in the recommendation data; the smaller the probability that the user clicks on the recommendation data, the less interested the user is in the recommendation data.

Specifically, the probability of each user clicking on each recommended data may be obtained in a historical usage log.
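As an illustration, the click probability could be estimated from the log as clicks divided by impressions. The log layout `(user, recommendation data, clicked)` and the name `HISTORY_LOG` are hypothetical; the application does not specify the log format.

```python
from collections import defaultdict

# Hypothetical log rows: (user, recommendation data, whether it was clicked).
HISTORY_LOG = [
    ("A", "S1", True),
    ("A", "S1", True),
    ("A", "S1", False),
    ("A", "S1", True),
    ("A", "S2", False),
    ("A", "S2", False),
]


def click_probabilities(rows):
    """Return {(user, data): clicks / impressions} for every logged pair."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for user, data, was_clicked in rows:
        shown[(user, data)] += 1
        clicked[(user, data)] += int(was_clicked)
    return {pair: clicked[pair] / shown[pair] for pair in shown}


click_prob = click_probabilities(HISTORY_LOG)
```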

In this embodiment, as shown in fig. 2, the determination of the effect of the recommended model to be evaluated in step S103 described above may include the following steps S201 to S203.

S201, obtaining data labels to be recommended from a plurality of recommendation data labels according to the prediction association degree between each user and each recommendation data.

In this embodiment of the present application, the recommended data obtained from the multiple recommended data may be referred to as data to be recommended, and the label corresponding to the data is the label of the data to be recommended.

Specifically, after determining the predicted association degree between each user and each recommended data, the data tag to be recommended may be obtained from a plurality of recommended data tags according to the predicted association degree between each user and each recommended data. It is understood that one type of recommended data corresponds to one type of recommended data tag, and one type of recommended data tag may include a plurality of recommended data tags.

The tags to be recommended may be obtained according to the ranking of the predicted association degrees; for example, the tags corresponding to the recommendation data with higher predicted association degrees may be obtained. One or more tags to be recommended may be obtained; for example, the tags may be sorted by predicted association degree from largest to smallest and the first 100 data tags with the highest association degree selected.
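The selection in step S201 can be sketched as a top-k query over the predicted association degrees. The tag names and scores are hypothetical, and k is set to 3 here rather than 100 only to keep the example short.

```python
import heapq


def select_to_recommend(predicted, k):
    """Return the k tags with the highest predicted association degree."""
    return heapq.nlargest(k, predicted, key=predicted.get)


# Hypothetical predicted association degrees for one user.
predicted = {"b1": 0.9, "b2": 0.7, "b3": 0.6, "b4": 0.2, "b5": 0.1}

to_recommend = select_to_recommend(predicted, 3)
```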

S202, determining the proportion of each recommended data label in the data labels to be recommended according to the labels of each data to be recommended.

Specifically, after the data tags to be recommended are obtained, the proportion of each data tag to be recommended in the data tags to be recommended can be determined according to the tags of each data to be recommended.

For example, suppose three data tags to be recommended, b1, b2 and b3, are obtained, and there are two recommendation data labels, B1 and B2, where B1 includes b1 and b2, and B2 includes b3. It can then be determined that the proportion of recommendation data label B1 in the data tags to be recommended b1, b2 and b3 is two thirds, and the proportion of recommendation data label B2 in the data tags to be recommended b1, b2 and b3 is one third.
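The proportion calculation in step S202 can be reproduced with a short sketch. It uses the hypothetical lowercase names b1, b2 and b3 for the individual data tags to be recommended and B1 and B2 for the recommendation data labels they belong to; the mapping `TAG_TO_LABEL` is an assumption for illustration.

```python
from collections import Counter
from fractions import Fraction

# Assumed mapping from each tag to be recommended to its recommendation data label.
TAG_TO_LABEL = {"b1": "B1", "b2": "B1", "b3": "B2"}


def label_proportions(tags, tag_to_label):
    """Return the share of each recommendation data label among the given tags."""
    counts = Counter(tag_to_label[tag] for tag in tags)
    total = sum(counts.values())
    return {label: Fraction(count, total) for label, count in counts.items()}


proportions = label_proportions(["b1", "b2", "b3"], TAG_TO_LABEL)
```

Exact fractions are used so that the two-thirds and one-third shares appear without rounding.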

S203, determining the effect of the recommendation model to be evaluated according to the proportion of each recommendation data label in the data label to be recommended and the probability of each user clicking each recommendation data.

Specifically, after determining the proportion of each recommended data tag in the data tags to be recommended and the probability of each user clicking each recommended data, determining the effect of the recommended model to be evaluated according to the proportion and the probability.

Specifically, it can be judged whether the proportion of each recommendation data label in the data tags to be recommended matches, or closely matches, the probability that the user clicks on the corresponding recommendation data. If it does, the recommendation effect of the recommendation model to be evaluated is good; that is, the model can recommend content that the user is more interested in, or that meets the user's needs, according to its input. Otherwise, if the proportion and the probability match poorly or do not match at all, the recommendation effect of the recommendation model to be evaluated is poor.

That is, if the type of data with a high click probability accounts for a large proportion of the data to be recommended, the model effect is good; conversely, if data with a high click probability accounts for a small proportion of the data to be recommended, or data with a low click probability accounts for a large proportion, the model effect is poor.

For example, if the proportion of recommendation data label B1 (corresponding to recommendation data S1) among the data tags to be recommended is two thirds, the proportion of B2 (corresponding to recommendation data S2) is one third, and the probability that the user clicks recommendation data S1 is high while the probability that the user clicks recommendation data S2 is low, then the data S1 with the high click probability also occupies a large share of the data to be recommended, so it can be determined that the recommendation effect of the model to be evaluated is good.

Therefore, the recommendation effect of the model is determined according to the probability of clicking the recommendation data by the user and the prediction association degree between the user and the recommendation data, so that the reliability of determination is improved, and the obtained model effect is more reliable.
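The comparison in step S203 can be sketched as a rank-agreement check: the model is judged good when labels with a higher click probability also take a larger share of the data to be recommended. Rank agreement is only one possible matching criterion and is an assumption here, as are the click probability values.

```python
def rank(values):
    """Return labels ordered from the highest to the lowest value."""
    return sorted(values, key=values.get, reverse=True)


def proportions_match_clicks(proportions, click_prob):
    """True when labels rank the same by proportion and by click probability."""
    return rank(proportions) == rank(click_prob)


proportions = {"B1": 2 / 3, "B2": 1 / 3}

# Hypothetical observed click probabilities for the same two labels.
good = proportions_match_clicks(proportions, {"B1": 0.8, "B2": 0.1})
bad = proportions_match_clicks(proportions, {"B1": 0.1, "B2": 0.8})
```

A graded criterion, such as a distance between the proportion distribution and the normalized click probabilities, would be a natural alternative when a score rather than a yes/no result is needed.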

In the related art, when online small-traffic verification is performed in online experiments and the small-traffic experimental effect of a model does not meet expectations, the policy design problem of the current recommendation model to be evaluated cannot be directly deduced from overall core metrics and the like alone, and problem localization is not intuitive enough. In order to determine whether a problem exists in the recommendation model to be evaluated, and to locate the problem within the model when one exists, the following implementation is provided in the embodiment of the present application.

That is, in one embodiment of the present application, as shown in fig. 3, the evaluation method of the recommendation model may further include the following steps S301 to S304.

S301, inputting each user information and various recommendation data labels into a recommendation model to be evaluated to determine a first output value corresponding to a specified network layer in the recommendation model to be evaluated and a prediction association degree between each user and each recommendation data output by the recommendation model to be evaluated.

The specified network layer in the recommendation model to be evaluated can be understood as any forward network layer in the model, or as a specified network node (an output node of the network, the network being composed of a plurality of nodes). There may be one or more specified network layers, as determined by the actual situation or actual requirements.

The first output value may be understood as an output value output by a specified network layer in the model after the input value is input to the recommended model to be evaluated, and may also be referred to as a forward output value (i.e., an activation value of a network neuron).

Specifically, after the plurality of user information and the plurality of recommendation data labels are determined, each user information and the plurality of recommendation data labels may be input into the recommendation model to be evaluated in order to determine whether the model has a problem. Then, the first output value that the specified network layer in the recommendation model to be evaluated outputs based on each user information and the plurality of recommendation data labels is obtained, together with the predicted association degree between each user and each recommendation data that the model finally outputs based on the same inputs (namely the final output value of the recommendation model to be evaluated).

It should be noted that, the manner of determining the first output value may be any feasible manner in the related art, as long as reliable acquisition of the first output value corresponding to the specified network layer in the recommendation model to be evaluated can be achieved, and the embodiment of the present application is not limited in any way.
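One feasible way to obtain the first output value is to record the activations of the specified layer during the forward pass. The sketch below uses a tiny hand-written feed-forward network; the weights in `LAYERS`, the input vector and the `watch` parameter are hypothetical, and a real model would typically use its framework's own mechanism for reading intermediate outputs.

```python
import math


def relu(xs):
    return [max(0.0, x) for x in xs]


def dense(weights, xs):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]


def forward(layers, xs, watch):
    """Run xs through the layers; return (final score, {layer index: activations})."""
    captured = {}
    for index, weights in enumerate(layers):
        xs = dense(weights, xs)
        if index < len(layers) - 1:
            xs = relu(xs)  # hidden layers use ReLU; the last layer stays linear
        if index in watch:
            captured[index] = list(xs)
    score = 1.0 / (1.0 + math.exp(-xs[0]))  # predicted association degree
    return score, captured


# Assumed toy weights: layer 0 maps 2 inputs to 2 hidden units, layer 1 to 1 output.
LAYERS = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[1.0, 1.0]],
]

score, first_output = forward(LAYERS, [2.0, 1.0], watch={0})
```

Running the same inputs through a reference model with the same `watch` set would give the corresponding output values for the reference side.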

S302, inputting each user information and various recommendation data labels into a reference recommendation model to determine a second output value corresponding to a designated network layer in the reference recommendation model and a reference association degree between each user and each recommendation data output by the reference recommendation model.

According to the embodiment of the application, the reference recommendation model can be determined in advance and used as a reference for the recommendation model to be evaluated. The reference recommendation model can be a recommendation model whose function is the same as or similar to that of the recommendation model to be evaluated and whose recommendation effect is good.

In the embodiment of the application, the association degree between each user and each recommendation data output by the reference recommendation model can be called the reference association degree, and it can be used to represent the degree of interest of the user in the recommendation data. It can be appreciated that the higher the reference association degree, the more interested the user is in the recommendation data; the lower the reference association degree, the less interested the user is in the recommendation data.

The specified network layer in the reference recommendation model can be understood as any forward network layer in the reference recommendation model, or as a specified network node (an output node of the network, the network being composed of a plurality of nodes). There may be one or more specified network layers in the reference recommendation model; their number can be the same as the number of specified network layers in the recommendation model to be evaluated, and they can correspond one-to-one with those layers.

The second output value may be understood as an output value output by a specified network layer in the model after the input value is input to the reference recommendation model, and may also be referred to as a forward output value (i.e., an activation value of a network neuron).

Specifically, after the plurality of user information, the plurality of recommendation data labels, and the reference recommendation model are determined, each user information and the plurality of recommendation data labels may be input into the reference recommendation model. Then, the second output value that the specified network layer in the reference recommendation model outputs based on each user information and the plurality of recommendation data labels is acquired, together with the reference association degree between each user and each recommendation data (namely, the value finally output by the reference recommendation model) that the reference recommendation model outputs based on the same inputs.

It should be noted that, the manner of determining the second output value may be any feasible manner in the related art, as long as reliable obtaining of the second output value corresponding to the specified network layer in the reference recommendation model can be achieved, and the embodiment of the present application is not limited in any way.
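One feasible way to obtain the output value of a specified network layer together with the final association degree is to record intermediate activations during the forward pass. The following is a minimal sketch in plain Python, assuming a toy fully connected model; the function names (`dense`, `relu`, `forward_with_taps`), the ReLU and sigmoid activations, and the model structure are illustrative assumptions and are not taken from the present application.

```python
import math

def relu(v):
    # Element-wise rectifier used between hidden layers (assumed activation).
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    # weights holds one row per output unit; returns weights @ v + bias.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def forward_with_taps(layers, features, tapped):
    """Run a forward pass and record the outputs of the tapped layers.

    layers   : list of (weights, bias) pairs of a hypothetical toy model
    features : input vector built from user information and a data label
    tapped   : set of layer positions treated as specified network layers
    Returns (layer_outputs, association_degree).
    """
    layer_outputs = {}
    v = features
    for position, (weights, bias) in enumerate(layers):
        v = dense(weights, bias, v)
        if position < len(layers) - 1:
            v = relu(v)  # hidden layers only
        if position in tapped:
            layer_outputs[position] = list(v)
    # The final scalar is squashed into (0, 1) and read as an association degree.
    association_degree = 1.0 / (1.0 + math.exp(-v[0]))
    return layer_outputs, association_degree
```

Running the same routine on the recommendation model to be evaluated and on the reference recommendation model, with the same inputs and the same tapped positions, would yield the first output values and the second output values, respectively.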

S303, determining the difference between the first output value and the second output value corresponding to the designated network layer in response to the reference association degree and the prediction association degree between any user and any recommendation data label not matching.

Specifically, after the plurality of first output values, second output values, prediction association degrees and reference association degrees are determined, it may be determined, for each user and each recommendation data label, whether the reference association degree matches the prediction association degree. If the reference association degree between any user and any recommendation data label does not match the corresponding prediction association degree, the recommendation model to be evaluated has a problem. In this case, the difference between the first output value corresponding to the specified network layer in the recommendation model to be evaluated and the second output value corresponding to the specified network layer in the reference recommendation model may be determined.

Specifically, the first output value and the second output value may be identical (a difference of zero) or may differ, and when they differ, the degree of difference may be higher or lower.

S304, determining the network layer to be corrected in the model to be evaluated according to the difference and the position of the designated network layer in the model.

Specifically, after the difference between the first output value and the second output value corresponding to the specified network layer is determined, the specified network layer at which a difference exists can be identified, and its position in the recommendation model to be evaluated can then be determined. The network layer to be corrected in the recommendation model to be evaluated is determined according to this position, so that the problematic network layer in the model is located.

For example, suppose the first output value corresponding to a specified network layer M in the recommendation model M1 to be evaluated is x1, and the second output value corresponding to the specified network layer M in the reference recommendation model M2 is x2. Suppose further that the prediction association degrees output by the model M1 between a user A and two recommendation data S1 (corresponding to the label B1) and S2 (corresponding to the label B2) are 10% and 90%, respectively, and that the reference association degrees output by the model M2 between the user A and the recommendation data S1 and S2 are 50% and 90%, respectively. It can then be determined that, between the user A and the recommendation data label B1, the prediction association degree (10%) does not match the reference association degree (50%), so it may be determined whether a difference exists between the first output value x1 and the second output value x2 corresponding to the specified network layer M. If a difference exists, the position of the layer M in the model M2 is determined, and the network layer needing correction in the model M1 can then be determined according to that position.

It will be appreciated that after a problematic network layer is located in the recommendation model to be evaluated, the network layer may be corrected in any feasible manner until the problem is resolved.

That is, in the embodiment of the present disclosure, whenever the result finally output by the model to be evaluated is inconsistent with the result finally output by the reference model, the model to be evaluated is considered to have a problem; the results of the intermediate layers of the model (i.e., the specified network layers) can then be obtained, and the problem can be located according to these intermediate results.

Therefore, when the results output by the model to be evaluated and the reference model are inconsistent, the problem is located according to the first output value and the second output value, so that the problem in the model can be reliably determined.
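Steps S303 and S304 can be illustrated with a short sketch. It assumes that two association degrees "match" when they differ by no more than a tolerance, and that a layer "differs" when the largest element-wise gap between its two output vectors exceeds a second tolerance; both tolerances, the dictionary layout, and the function name `locate_layers_to_correct` are hypothetical choices, since the present application does not fix a particular matching rule.

```python
def locate_layers_to_correct(pred, ref, first_outputs, second_outputs,
                             match_tol=0.1, layer_tol=1e-3):
    """Return (position, difference) pairs for layers whose outputs differ.

    pred, ref      : dict mapping (user, label) -> association degree
    first_outputs  : dict mapping layer position -> outputs of the model
                     to be evaluated
    second_outputs : dict mapping layer position -> outputs of the
                     reference model (same positions, same lengths)
    An empty list is returned when every association degree matches.
    """
    # S303 trigger: at least one (user, label) pair does not match.
    mismatched = any(abs(pred[key] - ref[key]) > match_tol for key in pred)
    if not mismatched:
        return []
    suspects = []
    # S304: walk the specified layers in order of their position.
    for position in sorted(first_outputs):
        x1 = first_outputs[position]
        x2 = second_outputs[position]
        difference = max(abs(a - b) for a, b in zip(x1, x2))
        if difference > layer_tol:
            suspects.append((position, difference))
    return suspects

# Numbers from the example above: user A, labels B1 and B2.
pred = {("A", "B1"): 0.10, ("A", "B2"): 0.90}
ref = {("A", "B1"): 0.50, ("A", "B2"): 0.90}
print(locate_layers_to_correct(pred, ref, {3: [0.2, 0.7]}, {3: [0.6, 0.7]}))
```

Because the suspects are returned in order of position, the first entry is the earliest specified layer at which the two models diverge, which is a natural starting point for correction under these assumptions.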

Further, the number of specified network layers may be one or more. The specified network layers may be determined according to the number of all network layers in the model, according to the position of each network layer in the model, or according to the parameters corresponding to each network layer. In an embodiment of the present application, the specified network layer may be determined according to the number of network layers included in the recommendation model to be evaluated and the number of network parameters corresponding to each network layer.

For example, when the recommendation model to be evaluated contains many network layers, only the network layers with larger numbers of network parameters may be selected as designated network layers; when the model contains few network layers, network layers may be sampled at a certain interval as designated network layers.

Therefore, the designated network layer is determined according to the number of network layers contained in the model and the number of network parameters corresponding to each network layer, which ensures the reliability of the designated network layer and further improves the reliability of model evaluation.
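The selection rule in the example above can be sketched as follows. The threshold that separates "many" from "few" layers, the number of layers kept, and the sampling interval are all assumed values chosen only for illustration, and `select_designated_layers` is a hypothetical name.

```python
def select_designated_layers(param_counts, many_layers=8, top_k=3, stride=2):
    """Pick the positions of the designated network layers.

    param_counts : list where param_counts[i] is the number of network
                   parameters of the layer at position i
    many_layers  : assumed threshold above which the model is "large"
    top_k        : number of parameter-heavy layers kept for large models
    stride       : sampling interval used for small models
    """
    n = len(param_counts)
    if n >= many_layers:
        # Large model: keep only the layers with the most parameters.
        ranked = sorted(range(n), key=lambda i: param_counts[i], reverse=True)
        return sorted(ranked[:top_k])
    # Small model: sample layers at a fixed interval.
    return list(range(0, n, stride))
```

For a ten-layer model the sketch keeps the three parameter-heaviest layers, while for a five-layer model it samples every second layer.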

In step S101 above, when the plurality of user information, the plurality of recommendation data labels, and the association information of each user and each recommendation data are acquired, the acquisition may be performed according to the usage scenario of the model and the index to be evaluated, in order to ensure the reliability of the acquired data.

That is, in one embodiment of the present application, as shown in fig. 4, the step S101 may include the following steps S401 and S402.

S401, receiving a recommendation model evaluation request, wherein the evaluation request comprises a usage scene or an index to be evaluated of a recommendation model to be evaluated.

It should be understood that when the recommendation model needs to be evaluated, the user may send a recommendation model evaluation request to the electronic device, and the electronic device receives the recommendation model evaluation request, where the evaluation request includes a usage scenario or an index to be evaluated of the recommendation model to be evaluated.

The usage scenario of the recommendation model to be evaluated may be, for example, a novice scenario (i.e., applied to new users), an application recommendation scenario, or a content recommendation scenario.

The index to be evaluated may be, for example, the proportion of generalized resources or the proportion of specific resources among the resources recommended by the model to be evaluated. That is, different evaluation purposes call for different test data.

S402, acquiring a plurality of user information, a plurality of recommended data labels and associated information of each user and each recommended data from a historical use log according to a use scene or an index to be evaluated of the model to be evaluated.

Specifically, after the usage scenario or the index to be evaluated of the model to be evaluated is acquired, a plurality of user information, a plurality of recommended data labels and associated information of each user and each recommended data can be acquired from the historical usage log in a targeted manner according to the usage scenario or the index to be evaluated.

For example, when the usage scenario is novice recommendation, more representative user information and the plurality of recommendation data labels can be acquired from the historical usage log, together with the association information of each user and each recommendation data recorded in the log.

Therefore, according to the usage scenario or the index to be evaluated of the model to be evaluated, historical user information, the plurality of recommendation data labels, and the association information between users and recommendation data are acquired, which ensures the reliability of the user information, the recommendation data labels, and the association information, and further improves the reliability of model evaluation.

In one embodiment of the present application, the step S402 may include: determining the type of the user information to be acquired according to the use scene of the model to be evaluated or the index to be evaluated; acquiring a plurality of user information matched with the type of the user information to be acquired from a historical use log; and determining various recommended data labels and associated information of each user and each recommended data according to the historical use logs of the plurality of users.

The user information type may include, for example, age, occupation, sex, etc. of the user.

Specifically, after the usage scenario or the index to be evaluated of the model to be evaluated is acquired, the type of user information to be acquired can be determined. A plurality of user information matching that type is then acquired from the historical usage log, the historical usage logs of the corresponding users are acquired according to the plurality of user information, and the plurality of recommendation data labels and the association information of each user and each recommendation data are obtained from those logs.

It should be noted that there may be multiple types of user information; for example, a plurality of differentiated user information may be acquired, such as users of different ages, sexes, or occupations, or users who use the system during different time periods.

It should be noted that, in the embodiments of the present application, the type of user information to be acquired may also be determined in other manners in the related art, and the above embodiments are merely exemplary illustrations.

Therefore, the user information type is determined according to the usage scenario or the index to be evaluated of the model to be evaluated, and the plurality of recommendation data labels and the association information between users and recommendation data are acquired according to that type, which improves the reliability of the user information, the recommendation data labels, and the association information, and further improves the reliability of model evaluation.
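The acquisition in step S402 can be sketched as a filter over log records followed by an aggregation. The record layout, the `days_since_signup` field, the scenario names, and the use of a per-user click rate as the association information are all assumptions made for this illustration and are not specified by the present application.

```python
from collections import defaultdict

# Hypothetical mapping from usage scenario to a predicate on user information.
SCENARIO_FILTERS = {
    "novice": lambda user: user["days_since_signup"] <= 7,
    "content": lambda user: True,
}

def build_eval_set(log, scenario):
    """Collect users, labels, and click rates for one usage scenario.

    log : iterable of records shaped like
          {"user": {"uid": ..., "days_since_signup": ...},
           "label": ..., "clicked": bool}
    Returns (users, labels, association) where association maps
    (uid, label) -> observed click rate in the log.
    """
    keep = SCENARIO_FILTERS[scenario]
    shown = defaultdict(int)
    clicked = defaultdict(int)
    users = {}
    labels = set()
    for record in log:
        user = record["user"]
        if not keep(user):
            continue  # user information type does not fit the scenario
        uid = user["uid"]
        users[uid] = user
        labels.add(record["label"])
        shown[(uid, record["label"])] += 1
        clicked[(uid, record["label"])] += int(record["clicked"])
    association = {key: clicked[key] / shown[key] for key in shown}
    return users, sorted(labels), association
```

Keeping the scenario predicates in a table means that supporting another scenario or another user information type only requires adding an entry, without touching the aggregation loop.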

That is, the recommendation model evaluation method of the embodiment of the present application uses the real sample feature set already available online, and extracts feature packets through an offline experiment and an offline estimation model. Combined with the information of the high-quality recommended resources and an evaluated policy label (such as generalization), the method gives a conclusion on whether the resources recommended by the current recommendation model to be evaluated are improved on the designated policy index (such as generalization or specificity) compared with the reference recommendation model. Besides this conclusion, the information of the high-quality resources recommended by the two models (such as article titles, article details, and whether the articles were clicked) can be compared to evaluate whether the diversity of the recommended resources, the recommendation effect, and the like of the current recommendation model to be evaluated are improved. From these conclusions it can be indirectly judged whether the model has design problems, for example, whether the recommended high-quality resources match the interest points of the current user.

It should be noted that, in the technical solution of the embodiment of the present application, the acquisition, storage, application, etc. of the related user information all comply with relevant laws and regulations and do not violate public order and good customs.

In order to more clearly describe the evaluation method of the recommendation model in the embodiment of the present application, the following description is given by way of an example:

As shown in fig. 5, first, the user information (uid_info) of a specified user may be acquired, including the account number, the landing time, and the usage log. (When a user actually initiates a refresh of the recommended content, a log stream is generated, and each log carries a corresponding time stamp; this time stamp is the landing time.) A platform interface is queried to find the specific machine on which the current user's data landed, the portion of the log that landed for this user is acquired from that machine, and the complete feature samples related to the user, covering both the user side and the resource side (i.e., the recommended pictures, videos, and the like), are filtered out.

In the deployment stage of the evaluation environment (the Feature-extrator environment), an offline feature extraction model extrator (extractor) can be deployed to extract the sample set required as model training input, and an offline model estimation environment is deployed to produce the output value q of the model during offline model estimation.

After the environment is prepared, a sample set is extracted first. Then, according to the extracted sample set, the large online model is cut so that only the embedding information corresponding to the samples to be used is retained, which makes model estimation more efficient. Model estimation is performed next, from which the forward output value of the specified network node (namely, the first output value), the q value, and other details of the estimation of the specified model can be determined. In addition, according to the recommendation data label nid (such as an article or a link), the forward gcms service in the feed architecture is requested to fetch the forward information corresponding to the current nid, such as the article title, link, and picture content, which is retained for detailed analysis. The q values estimated offline are then sorted in descending order and truncated, and the 100 nids with the highest q values are selected. Each of these resources is marked according to whether it hits the policy under the existing policy index, and the proportion of the top 100 nids that hit the policy index is counted. For example, each nid carries a mark indicating whether it is a generalized resource, and the proportion of generalized resources among the 100 nids is the index value of the corresponding recommendation model. The recommendation model can then be evaluated according to this index value.
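The last part of this example, sorting the offline q values in descending order, truncating to the top 100 nids, and counting the share that hits the policy, can be sketched as below. The dictionary inputs and the name `policy_hit_ratio` are assumptions made for illustration.

```python
def policy_hit_ratio(q_values, policy_flags, top_n=100):
    """Share of the top_n nids (ranked by q value) that hit the policy.

    q_values     : dict mapping nid -> q value estimated offline
    policy_flags : dict mapping nid -> True if the resource hits the
                   policy (for example, is a generalized resource)
    """
    # Sort nids by q value in descending order, then truncate.
    ranked = sorted(q_values, key=q_values.get, reverse=True)[:top_n]
    if not ranked:
        return 0.0
    hits = sum(1 for nid in ranked if policy_flags.get(nid, False))
    return hits / len(ranked)
```

Computing this ratio once with the q values of the recommendation model to be evaluated and once with those of the reference recommendation model gives two index values that can be compared directly on the same sample set.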

In summary, in the embodiment of the present application, the effect of the recommendation model can be evaluated offline, which saves the 3-day evaluation waiting period of an online experiment. Evaluating the model offline also reduces repeated iteration of ineffective models, improves model iteration efficiency, and can support hour-level model iteration.

The embodiment of the application also provides a recommendation model evaluation device, and fig. 6 is a schematic structural diagram of the recommendation model evaluation device provided in the embodiment of the application.

As shown in fig. 6, the evaluation device 600 of the recommendation model includes: the first acquisition module 610, the first determination module 620, and the second determination module 630.

The first obtaining module 610 is configured to obtain, from the historical usage log, a plurality of user information, a plurality of recommendation data labels, and association information of each user and each recommendation data;

a first determining module 620, configured to input each of the user information and the plurality of recommendation data labels into a recommendation model to be evaluated, so as to determine a predicted association degree between each of the users and each of the recommendation data;

and a second determining module 630, configured to determine an effect of the recommendation model to be evaluated according to a predicted association degree between each user and each recommendation data and a degree of difference between association information of each user and each recommendation data.

In one embodiment of the present application, the association information of each user with each recommended data includes a probability that each user clicks each recommended data, and the second determining module 630 may include:

the first acquisition unit is used for acquiring data tags to be recommended from the plurality of recommendation data tags according to the prediction association degree between each user and each recommendation data;

the first determining unit is used for determining the proportion of each recommended data label in the data labels to be recommended according to the labels of each data to be recommended;

and the second determining unit is used for determining the effect of the recommendation model to be evaluated according to the proportion of each recommendation data label in the data label to be recommended and the probability of each user clicking each recommendation data.
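The cooperation of the first and second determining units can be illustrated with a small sketch. It assumes that the effect is summarized as the mean absolute gap between the share of each label among the data to be recommended and the normalized click probability of that label; this particular measure and the names `label_proportions` and `effect_gap` are hypothetical, since the units above do not prescribe a formula.

```python
from collections import Counter

def label_proportions(recommended_labels):
    """Share of each recommendation data label among the data to be recommended."""
    counts = Counter(recommended_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def effect_gap(proportions, click_prob):
    """Mean absolute gap between recommended share and normalized click share.

    proportions : dict label -> share among the data to be recommended
    click_prob  : dict label -> click probability observed in the log
    A smaller value means the recommended mix is closer to what users
    actually clicked (assumed measure).
    """
    total = sum(click_prob.values())
    labels = set(proportions) | set(click_prob)
    if not labels or total == 0:
        return 0.0
    gaps = [abs(proportions.get(label, 0.0) - click_prob.get(label, 0.0) / total)
            for label in labels]
    return sum(gaps) / len(gaps)
```

A gap close to zero would suggest that the labels the model favors are distributed in line with historical clicks, whereas a large gap points to labels that are over-recommended or under-recommended.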

In one embodiment of the present application, the evaluation device 600 of the recommendation model may further include:

the third determining module is used for inputting each piece of user information and the plurality of recommendation data labels into the recommendation model to be evaluated so as to determine a first output value corresponding to a specified network layer in the recommendation model to be evaluated and a prediction association degree between each user and each recommendation data output by the recommendation model to be evaluated;

the fourth determining module is used for inputting each piece of user information and the plurality of recommendation data labels into a reference recommendation model so as to determine a second output value corresponding to a specified network layer in the reference recommendation model and a reference association degree between each user and each recommendation data output by the reference recommendation model;

a fifth determining module, configured to determine, in response to the reference association degree and the predicted association degree between any user and any recommended data tag not matching, a difference between the first output value and the second output value corresponding to the specified network layer;

and a sixth determining module, configured to determine a network layer to be corrected in the model to be evaluated according to the difference and the position of the specified network layer in the model.

In one embodiment of the present application, the evaluation device 600 of the recommendation model may further include: a seventh determining module, configured to determine the specified network layer according to the number of network layers included in the recommendation model to be evaluated and the number of network parameters corresponding to each network layer.

In one embodiment of the present application, the first obtaining module 610 may include:

the first receiving unit is used for receiving a recommended model evaluation request, wherein the evaluation request comprises a use scene or an index to be evaluated of the model to be evaluated;

and the second acquisition unit is used for acquiring a plurality of user information, a plurality of recommended data labels and associated information of each user and each recommended data from the historical use log according to the use scene or the index to be evaluated of the model to be evaluated.

In one embodiment of the present application, the second obtaining unit may specifically be configured to: determine the type of the user information to be acquired according to the use scene or the index to be evaluated of the model to be evaluated; acquire a plurality of user information matched with the type of the user information to be acquired from the historical use log; and determine multiple recommendation data labels and associated information of each user and each recommendation data according to the historical use logs of the multiple users.

It should be noted that, for avoiding redundancy, other specific embodiments of the device for evaluating a recommendation model in the embodiments of the present application may refer to the specific embodiments of the method for evaluating a recommendation model described above, which are not described herein again.

According to the recommendation model evaluation device, the effect of the recommendation model is determined according to the user information, the multiple recommendation data labels and the association information of each user and each recommendation data, so that the recommendation model can be evaluated offline, the evaluation waiting period is saved, and the recommendation effect of the recommendation model can be determined in advance.

According to embodiments of the present application, there is also provided an electronic device, a readable storage medium, and a computer program product of a method of evaluating a recommendation model. The following is a description with reference to fig. 7.

Fig. 7 is a block diagram of an electronic device for implementing the evaluation method of a recommendation model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, the evaluation method of a recommendation model. For example, in some embodiments, the method of evaluating the recommendation model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the above-described evaluation method of the recommendation model may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the evaluation method of the recommendation model by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility found in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A method of evaluating a recommendation model, comprising:

determining the effect of the recommendation model to be evaluated according to the prediction association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data;

inputting each piece of user information and the plurality of recommendation data labels into the recommendation model to be evaluated to determine a first output value corresponding to a specified network layer in the recommendation model to be evaluated and a prediction association degree between each user and each piece of recommendation data output by the recommendation model to be evaluated;

inputting each piece of user information and the plurality of recommendation data labels into a reference recommendation model to determine a second output value corresponding to a specified network layer in the reference recommendation model and a reference association degree between each user and each piece of recommendation data output by the reference recommendation model;

determining a difference between the first output value and the second output value corresponding to the specified network layer in response to the reference association degree and the prediction association degree between any user and any recommendation data label not matching;

and determining the network layer to be corrected in the recommendation model to be evaluated according to the difference and the position of the specified network layer in the model.
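
By way of a non-limiting illustration, the layer comparison recited above might be sketched as follows. The function names, the tolerance `match_tol`, the threshold `diff_threshold`, the use of a mean absolute difference, and the rule of flagging the first diverging layer in input-to-output order are all assumptions made for this sketch; the claim itself does not fix a particular difference measure or selection rule.

```python
from typing import Dict, List, Optional


def mean_abs_diff(first: List[float], second: List[float]) -> float:
    """Mean absolute difference between two layer outputs of equal length."""
    if len(first) != len(second) or not first:
        raise ValueError("layer outputs must be non-empty and the same length")
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


def locate_layer_to_correct(
    first_outputs: Dict[int, List[float]],   # model to be evaluated
    second_outputs: Dict[int, List[float]],  # reference model
    predicted: float,                        # prediction association degree
    reference: float,                        # reference association degree
    match_tol: float = 0.05,       # assumed tolerance for a "match"
    diff_threshold: float = 0.1,   # assumed threshold on the layer difference
) -> Optional[int]:
    """Return the position of the layer to correct, or None.

    The keys of both dicts are the positions of the specified layers.
    """
    # Compare layers only when the two association degrees do not match.
    if abs(predicted - reference) <= match_tol:
        return None
    # Assumption: scan from input to output and flag the first divergence.
    for position in sorted(first_outputs):
        diff = mean_abs_diff(first_outputs[position], second_outputs[position])
        if diff > diff_threshold:
            return position
    return None
```

In this sketch, choosing the earliest diverging layer reflects one plausible use of the layer position: a divergence near the input tends to propagate to every later layer, so the first affected layer is a natural candidate for correction.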

2. The method of claim 1, wherein the association information of each user with each recommended data includes a probability that each user clicks on each recommended data;

the determining the effect of the recommendation model to be evaluated according to the prediction association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data comprises the following steps:

acquiring data labels to be recommended from the plurality of recommendation data labels according to the prediction association degree between each user and each recommendation data;

determining the proportion of each recommendation data label among the data labels to be recommended according to the label of each piece of data to be recommended;

and determining the effect of the recommendation model to be evaluated according to the proportion of each recommendation data label among the data labels to be recommended and the probability of each user clicking each recommendation data.
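
By way of a non-limiting illustration, the three steps above might be sketched as follows. The top-k selection rule, the normalization of click probabilities, and the score defined as one minus half the L1 distance between the two distributions are assumptions chosen for this sketch; the claim does not prescribe a specific scoring formula, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]  # {user: {label: value}}


def labels_to_recommend(predicted: Scores, k: int) -> List[str]:
    """Pick the k labels with the highest predicted association per user."""
    chosen: List[str] = []
    for scores in predicted.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        chosen.extend(ranked[:k])
    return chosen


def label_proportions(chosen: List[str]) -> Dict[str, float]:
    """Share of each label among all labels selected for recommendation."""
    counts = Counter(chosen)
    return {label: n / len(chosen) for label, n in counts.items()}


def effect_score(proportions: Dict[str, float], click_prob: Scores) -> float:
    """Assumed metric: 1 - half the L1 distance between the recommended
    label shares and the normalized mean click probabilities (1.0 = match)."""
    totals: Counter = Counter()
    for probs in click_prob.values():
        totals.update(probs)  # sums click probabilities per label
    mass = sum(totals.values())
    if mass == 0:
        raise ValueError("click probabilities sum to zero")
    labels = set(proportions) | set(totals)
    distance = sum(
        abs(proportions.get(label, 0.0) - totals[label] / mass)
        for label in labels
    )
    return 1.0 - 0.5 * distance
```

Under this assumed metric, a model whose recommended labels are distributed in the same proportions as the users' observed click probabilities scores 1.0, and a model that recommends only labels the users never click scores 0.0.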

3. The method of claim 1, further comprising:

determining the specified network layer according to the number of network layers contained in the recommendation model to be evaluated and the number of network parameters corresponding to each network layer.
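
By way of a non-limiting illustration, one way to pick the specified network layers from the layer count and the per-layer parameter counts is sketched below. The selection rule (monitor a fixed fraction of the layers, preferring those with the most parameters) and the name `select_specified_layers` are assumptions for this sketch only; the claim does not state how the two quantities are combined.

```python
from typing import List


def select_specified_layers(param_counts: List[int], fraction: float = 0.25) -> List[int]:
    """Return the 0-based positions of the layers to monitor.

    param_counts[i] is the number of parameters in layer i, so the number
    of layers is len(param_counts).
    """
    n_layers = len(param_counts)
    if n_layers == 0:
        raise ValueError("the model must contain at least one layer")
    # Assumption: the number of monitored layers scales with model depth.
    n_select = max(1, int(n_layers * fraction))
    # Assumption: layers with more parameters are more likely to need correction.
    ranked = sorted(range(n_layers), key=lambda i: param_counts[i], reverse=True)
    # Report positions in model order so later steps can use them directly.
    return sorted(ranked[:n_select])
```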

4. The method as claimed in any one of claims 1 to 3, wherein the acquiring, from the historical usage log, a plurality of user information, a plurality of recommended data tags, and associated information of each user with each recommended data, includes:

receiving a recommendation model evaluation request, wherein the evaluation request comprises a use scene or an index to be evaluated of the recommendation model to be evaluated;

and acquiring a plurality of user information, a plurality of recommendation data labels and the associated information of each user and each recommendation data from the historical use log according to the use scene or the index to be evaluated of the recommendation model to be evaluated.

5. The method of claim 4, wherein the acquiring, from the historical use log, a plurality of user information, a plurality of recommendation data labels, and association information of each user with each recommendation data according to the use scene or the index to be evaluated of the recommendation model to be evaluated, comprises:

determining the type of the user information to be acquired according to the use scene or the index to be evaluated of the recommendation model to be evaluated;

acquiring a plurality of user information matched with the type of the user information to be acquired from the historical use log;

and determining a plurality of recommended data labels and associated information of each user and each recommended data according to the historical use logs of the plurality of users.
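
By way of a non-limiting illustration, the log-driven data acquisition of claims 4 and 5 might be sketched as follows. The scenario names, the mapping `USER_INFO_TYPE_BY_SCENARIO`, the log record fields (`user_id`, `user_profile`, `label`, `clicked`), and the use of an empirical click rate as the association information are all hypothetical and serve only to make the sketch concrete.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from use scene to the type of user information needed.
USER_INFO_TYPE_BY_SCENARIO = {
    "news_feed": ("age", "interests"),
    "shopping": ("age", "purchase_level"),
}


def collect_evaluation_data(request: dict, log: List[dict]):
    """Return (user_info, labels, association) extracted from the usage log."""
    # Step 1: determine the type of user information from the use scene.
    fields = USER_INFO_TYPE_BY_SCENARIO[request["scenario"]]
    users: Dict[str, dict] = {}
    shown: Dict[Tuple[str, str], int] = defaultdict(int)
    clicked: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in log:
        profile = record["user_profile"]
        # Step 2: keep only users whose profile matches the required type.
        if not all(field in profile for field in fields):
            continue
        users[record["user_id"]] = {field: profile[field] for field in fields}
        key = (record["user_id"], record["label"])
        shown[key] += 1
        clicked[key] += 1 if record["clicked"] else 0
    # Step 3: derive the labels and the per-user, per-label click rate.
    labels = sorted({label for _, label in shown})
    association = {key: clicked[key] / shown[key] for key in shown}
    return users, labels, association
```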

6. An evaluation device of a recommendation model, comprising:

the second determining module is used for determining the effect of the recommendation model to be evaluated according to the prediction association degree between each user and each recommendation data and the difference degree between the association information of each user and each recommendation data;

7. The apparatus of claim 6, wherein the association information of each user with each recommended data includes a probability that each user clicks on each recommended data,

the second determining module includes:

8. The apparatus of claim 6, further comprising:

and a seventh determining module, configured to determine the specified network layer according to the number of network layers included in the recommendation model to be evaluated and the number of network parameters corresponding to each network layer.

9. The apparatus of any of claims 6-8, wherein the first acquisition module comprises:

the first receiving unit is used for receiving a recommendation model evaluation request, wherein the evaluation request comprises a use scene or an index to be evaluated of the recommendation model to be evaluated;

the second acquisition unit is used for acquiring a plurality of user information, a plurality of recommendation data labels and the association information of each user and each recommendation data from the historical use log according to the use scene or the index to be evaluated of the recommendation model to be evaluated.

10. The apparatus of claim 9, wherein the second acquisition unit is specifically configured to:

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the recommendation model evaluation method of any one of claims 1-5.

12. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of evaluating the recommendation model of any one of claims 1-5.