CN113190746A

CN113190746A - Recommendation model evaluation method and device and electronic equipment

Info

Publication number: CN113190746A
Application number: CN202110463193.4A
Authority: CN
Inventors: 李心明; 魏龙; 王召玺; 王峰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2021-07-30
Anticipated expiration: 2041-04-27
Also published as: CN113190746B

Abstract

The application discloses a recommendation model evaluation method and device and electronic equipment, relating to artificial intelligence fields such as intelligent search and deep learning. The implementation scheme is as follows: acquiring, from a historical usage log, a plurality of pieces of user information, a plurality of recommended data tags, and association information between each user and each piece of recommended data; inputting each piece of user information and the plurality of recommended data tags into a recommendation model to be evaluated, so as to determine a predicted association degree between each user and each piece of recommended data; and determining the effect of the recommendation model to be evaluated according to the predicted association degree between each user and each piece of recommended data and the degree of difference in the association information between each user and the pieces of recommended data. The effect of the recommendation model is thus determined from the user information, the recommended data tags, and the association information between each user and each piece of recommended data. This enables offline evaluation of the recommendation model, shortens the evaluation waiting period, and allows the recommendation effect of the model to be determined in advance.

Description

Recommendation model evaluation method and device and electronic equipment

Technical Field

The present application relates to the field of computer technologies, in particular to artificial intelligence technologies such as intelligent search and deep learning, and more specifically to a method and an apparatus for evaluating a recommendation model, an electronic device, a storage medium, and a computer program product.

Background

At present, with the continuous development of computer technology, recommendation models are used more and more frequently. The full iteration cycle of a model comprises two stages: offline modeling and online experimentation. Offline modeling includes model design and model research, and online experimentation includes online small-traffic verification and full-scale deployment of the model. The number of experiments depends on the result of model evaluation; that is, evaluation plays an important role in the application of a recommendation model, so techniques for evaluating recommendation models are particularly important.

Disclosure of Invention

The application provides a recommendation model evaluation method and device and electronic equipment.

According to a first aspect of the present application, there is provided a recommendation model evaluation method, including:

acquiring, from a historical usage log, a plurality of pieces of user information, a plurality of recommended data tags, and association information between each user and each piece of recommended data;

inputting each piece of user information and the plurality of recommended data tags into a recommendation model to be evaluated, so as to determine a predicted association degree between each user and each piece of recommended data;

and determining the effect of the recommendation model to be evaluated according to the predicted association degree between each user and each piece of recommended data and the degree of difference in the association information between each user and the pieces of recommended data.

According to a second aspect of the present application, there is provided an evaluation apparatus of a recommendation model, including:

a first acquisition module, configured to acquire, from a historical usage log, a plurality of pieces of user information, a plurality of recommended data tags, and association information between each user and each piece of recommended data;

a first determination module, configured to input each piece of user information and the plurality of recommended data tags into a recommendation model to be evaluated, so as to determine a predicted association degree between each user and each piece of recommended data;

and a second determination module, configured to determine the effect of the recommendation model to be evaluated according to the predicted association degree between each user and each piece of recommended data and the degree of difference in the association information between each user and the pieces of recommended data.

According to a third aspect of the present application, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the recommendation model evaluation method according to the first aspect.

According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing a computer program for causing a computer to execute the recommendation model evaluation method according to the first aspect.

According to a fifth aspect of the present application, there is provided a computer program product which, when executed by a processor, implements the recommendation model evaluation method according to the first aspect.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and do not limit the present application. In the drawings:

Fig. 1 is a schematic flowchart of a recommendation model evaluation method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of determining the effect of a recommendation model to be evaluated according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of locating a network layer that needs to be modified in a recommendation model to be evaluated according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of acquiring user information, recommended data tags, and association information according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a specific example of evaluating a recommendation model according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a recommendation model evaluation apparatus according to an embodiment of the present application;

Fig. 7 is a block diagram of an electronic device for implementing the recommendation model evaluation method according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Intelligent search is a new generation of search engine that incorporates artificial intelligence techniques. Besides traditional functions such as fast retrieval and relevance ranking, it can provide functions such as user role registration, automatic identification of user interests, semantic understanding of content, and intelligent information filtering and push.

Deep learning is a new research direction in the field of machine learning; it was introduced to bring machine learning closer to its original goal, artificial intelligence. Deep learning learns the intrinsic patterns and representation levels of sample data, and the information obtained during learning greatly helps the interpretation of data such as text, images, and sounds. Its ultimate goal is to give machines human-like analytical learning capability so that they can recognize text, images, sounds, and similar data.

It should be noted that the number of experiments, whether for deciding to deploy a recommendation model online or for deciding to adopt a strategy, depends on the result of model evaluation.

In the related art, recommendation models are generally evaluated online. Online evaluation can involve a long evaluation period, and it cannot predict the recommendation effect of a model in advance.

To this end, the embodiments of the present application provide a recommendation model evaluation method and device and electronic equipment. In the embodiments, the effect of a recommendation model is determined from the user information, the recommended data tags, and the association information between each user and each piece of recommended data. This enables offline evaluation of the recommendation model, shortens the evaluation waiting period, and allows the recommendation effect of the model to be determined in advance.

The following describes a recommendation model evaluation method, a recommendation model evaluation device, and an electronic device according to an embodiment of the present application with reference to the drawings.

Fig. 1 is a schematic flowchart of an evaluation method of a recommendation model according to an embodiment of the present application.

It should be noted that the execution subject of the recommendation model evaluation method in the embodiments of the present application may be an electronic device. The electronic device may be, but is not limited to, a server or a terminal, and the terminal may be, but is not limited to, a personal computer, a smartphone, an iPad, or the like.

In the embodiments of the present application, the recommendation model evaluation method is described as being configured in a recommendation model evaluation apparatus, which may be applied to an electronic device so that the electronic device can execute the method.

As shown in fig. 1, the evaluation method of the recommendation model includes the following steps:

S101, acquiring, from the historical usage log, a plurality of pieces of user information, a plurality of recommended data tags, and association information between each user and each piece of recommended data.

It should be noted that, in practical applications, each time a user uses an electronic device, for example to search for and click on content, the electronic device recommends related content according to the user's needs and generates a log stream, each entry of which carries a timestamp. In the embodiments of the present application, the log that records the user's historical use of the electronic device and the recommendations made may be referred to as the historical usage log; the data that the electronic device recommends to the user may be referred to as recommended data; and the tag of such data may be referred to as a recommended data tag.

The association information between a user and a piece of recommended data may represent the user's degree of interest in that data, that is, the degree of association between them: the more interested the user is in the recommended data, the higher the association; the less interested, the lower the association.

The historical usage log may include user information of the users of the electronic device, such as a user's account number and the time at which the user used the device; recommended data tags, such as merchandise tags and manufacturer tags; and association information between each user and each kind of recommended data. For example, if the association information indicates that user A is relatively strongly associated with the recommended data "daily life of student", user A is relatively interested in that data.

Specifically, when the recommendation model needs to be evaluated, the plurality of pieces of user information, the plurality of recommended data tags, and the association information between each user and each piece of recommended data may be acquired from the historical usage log. One piece of user information may correspond to one user, and one recommended data tag may correspond to one piece of recommended data.

It should be noted that, in the embodiments of the present application, the user information, recommended data tags, and association information may be determined in any feasible manner in the related art; the embodiments of the present application do not limit this.
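As one possible illustration of S101, the Python sketch below assumes a hypothetical log schema (`LogRecord` with `user_id`, `data_tag`, `clicked`, `timestamp`) and uses the click-through rate, one of the association measures this description names later, as the association information. The schema, field names, and `extract_from_log` are assumptions for illustration, not part of the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log entry: one piece of recommended data shown to one user."""
    user_id: str    # user information, e.g. an account number
    data_tag: str   # recommended data tag, e.g. a merchandise tag
    clicked: bool   # whether the user clicked the recommended data
    timestamp: int  # each log stream entry carries a timestamp


def extract_from_log(
    records: Iterable[LogRecord],
) -> Tuple[List[str], List[str], Dict[Tuple[str, str], float]]:
    """Collect users, recommended data tags, and per-(user, tag) click-through rates."""
    shown: Dict[Tuple[str, str], int] = defaultdict(int)
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    users, tags = set(), set()
    for r in records:
        users.add(r.user_id)
        tags.add(r.data_tag)
        shown[(r.user_id, r.data_tag)] += 1
        clicks[(r.user_id, r.data_tag)] += int(r.clicked)
    # Association information (assumed): fraction of impressions that were clicked.
    association = {key: clicks[key] / n for key, n in shown.items()}
    return sorted(users), sorted(tags), association


logs = [
    LogRecord("A", "B1", True, 1),
    LogRecord("A", "B1", False, 2),
    LogRecord("A", "B2", False, 3),
]
users, tags, association = extract_from_log(logs)
```

Pairs that never appear in the log get no association entry here; a real pipeline would need a policy for such unobserved (user, tag) pairs.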

S102, inputting each piece of user information and the plurality of recommended data tags into the recommendation model to be evaluated, so as to determine the predicted association degree between each user and each piece of recommended data.

In the embodiments of the present application, the recommendation model that needs to be evaluated is referred to as the recommendation model to be evaluated. It can be understood that when a user performs a search or click operation, the electronic device may recommend related content to the user based on a recommendation model; that is, the recommendation model may output data required by the user, such as videos, text, or images, according to the input (the user's search or click content).

In the embodiments of the present application, the association degree between each user and each piece of recommended data output by the recommendation model to be evaluated is referred to as the predicted association degree, and it represents the user's degree of interest in the recommended data: the higher the predicted association degree, the more interested the user is in the recommended data; the lower it is, the less interested the user is.

Specifically, after the plurality of pieces of user information and the plurality of recommended data tags are obtained, each piece of user information may be input into the recommendation model to be evaluated together with the plurality of recommended data tags, and the model outputs the predicted association degree between the user corresponding to that user information and each piece of recommended data corresponding to the recommended data tags.
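The scoring loop of S102 can be sketched as follows, assuming a hypothetical model interface that takes one user's information plus all recommended data tags and returns one predicted association degree per tag. `Model`, `predict_associations`, and `toy_model` are illustrative names, and the toy scores are invented stand-ins for a trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: (user information, all recommended data tags) -> one score per tag.
Model = Callable[[str, Sequence[str]], Sequence[float]]


def predict_associations(
    model: Model, users: Sequence[str], tags: Sequence[str]
) -> Dict[Tuple[str, str], float]:
    """Feed each user together with every recommended data tag into the model."""
    predictions: Dict[Tuple[str, str], float] = {}
    for user in users:
        scores = model(user, tags)  # one inference call per user
        if len(scores) != len(tags):
            raise ValueError("model must return exactly one score per tag")
        for tag, score in zip(tags, scores):
            predictions[(user, tag)] = float(score)
    return predictions


def toy_model(user: str, tags: Sequence[str]) -> List[float]:
    """Stand-in for a trained model: fixed scores per (user, tag), for illustration only."""
    table = {("Y1", "B1"): 0.2, ("Y1", "B2"): 0.8, ("Y2", "B1"): 0.9, ("Y2", "B2"): 0.1}
    return [table[(user, t)] for t in tags]


predicted = predict_associations(toy_model, ["Y1", "Y2"], ["B1", "B2"])
```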

S103, determining the effect of the recommendation model to be evaluated according to the predicted association degree between each user and each piece of recommended data and the degree of difference in the association information between each user and the pieces of recommended data.

It should be noted that, once the association information between each user and each kind of recommended data is obtained, each user has several pieces of association information, one for each kind of recommended data. These may differ from one another, and the degree of difference between them reflects how the user's interest varies across the kinds of recommended data.

Specifically, after the predicted association degree and the association information between each user and each piece of recommended data are determined, the degree of difference in each user's association information across the pieces of recommended data may be determined. The predicted association degree comes from the recommendation model to be evaluated, whereas the association information comes from the actual historical usage log, so the degree of difference also reflects real user behavior. The effect of the recommendation model to be evaluated can therefore be determined by comparing the predicted association degrees with the degree of difference.

Specifically, if the predicted association degrees between each user and the pieces of recommended data match, or closely match, the actual degree of difference, the recommendation model to be evaluated can be determined to perform well; if they do not match, or match poorly, the model is determined to perform poorly.

For example, suppose there are two pieces of user information Y1 and Y2 and two recommended data tags B1 and B2, where Y1 corresponds to user 1, Y2 to user 2, B1 to recommended data S1, and B2 to recommended data S2. Y1 and the tags B1 and B2 may be input into the recommendation model to be evaluated, which outputs a predicted association degree G11 between user 1 and recommended data S1 and a predicted association degree G12 between user 1 and recommended data S2. Likewise, Y2 and the tags B1 and B2 may be input into the model, which outputs a predicted association degree G21 between user 2 and S1 and a predicted association degree G22 between user 2 and S2. The association information W11 (user 1 and S1), W12 (user 1 and S2), W21 (user 2 and S1), and W22 (user 2 and S2) is also obtained. The degree of difference C1 between W11 and W12 and the degree of difference C2 between W21 and W22 can then be determined. Finally, one effect result is determined from G11, G12, and C1, another from G21, G22, and C2, and the effect of the recommendation model to be evaluated is determined by combining the two results.

For example, if the predicted association degrees between user A and two pieces of recommended data are 20% and 80% (a gap of 60%), and the association information of user A for the two pieces of recommended data also differs considerably, say by 60%, then the predicted association degrees match or closely match the actual degree of difference, and the recommendation model to be evaluated can be determined to perform well.
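The description does not state how "matching" between predicted association degrees and the degree of difference is measured. The Python sketch below is one hedged interpretation: for each user and each pair of recommended data it compares the signed predicted gap (for example G11 - G12) with the signed observed difference (W11 - W12), maps the mismatch to a score in [0, 1], and averages per user and then over users. The signed-gap measure, the averaging, and all names are assumptions, and the numbers are invented for illustration; user 1 mirrors the 20%/80% example.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Sequence, Tuple

Scores = Dict[Tuple[str, str], float]  # (user, recommended data) -> value in [0, 1]


def pair_agreement(pred_a: float, pred_b: float, actual_a: float, actual_b: float) -> float:
    """1.0 when the predicted gap equals the observed difference degree, 0.0 at worst."""
    predicted_gap = pred_a - pred_b          # signed, so the preferred item must also agree
    difference_degree = actual_a - actual_b  # e.g. C1 derived from W11 and W12
    return 1.0 - abs(predicted_gap - difference_degree) / 2.0


def evaluate_offline(predicted: Scores, actual: Scores,
                     users: Sequence[str], data: Sequence[str]) -> float:
    """Combine per-user effect results into one overall score (requires >= 2 data)."""
    per_user = []
    for user in users:
        scores = [
            pair_agreement(predicted[(user, a)], predicted[(user, b)],
                           actual[(user, a)], actual[(user, b)])
            for a, b in combinations(data, 2)
        ]
        per_user.append(mean(scores))
    return mean(per_user)


predicted = {("user1", "S1"): 0.2, ("user1", "S2"): 0.8,   # G11, G12
             ("user2", "S1"): 0.9, ("user2", "S2"): 0.1}   # G21, G22
actual = {("user1", "S1"): 0.1, ("user1", "S2"): 0.7,      # W11, W12
          ("user2", "S1"): 0.3, ("user2", "S2"): 0.5}      # W21, W22
effect = evaluate_offline(predicted, actual, ["user1", "user2"], ["S1", "S2"])
```

Here user 1's predictions reproduce the observed 60% gap (score close to 1.0), while user 2's predictions prefer the wrong item (score 0.5), giving an overall score of about 0.75.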

According to the recommendation model evaluation method of the embodiments, the effect of the recommendation model is determined from the user information, the recommended data tags, and the association information between each user and each piece of recommended data. This enables offline evaluation of the recommendation model, shortens the evaluation waiting period, and allows the recommendation effect of the model to be determined in advance.

It should be noted that, in the embodiments of the present application, the association information between each user and each piece of recommended data may be any quantitative information that characterizes the user's degree of interest in the data, for example, the length of time the user browses the recommended data, the number of times the user searches for keywords contained in the recommended data, or the probability that the user clicks the recommended data (that is, the user's click-through rate on it).

That is, in one embodiment of the present application, the association information of each user with each recommendation data may include a probability that each user clicks each recommendation data.

It should be noted that the greater the probability that the user clicks the recommended data, the more interested the user is in it; the smaller the probability, the less interested the user is.

Specifically, the probability of each user clicking each piece of recommended data may be obtained from the historical usage log.

In this embodiment, as shown in fig. 2, the determining the effect of the recommendation model to be evaluated in step S103 may include the following steps S201 to S203.

S201, acquiring to-be-recommended data tags from the plurality of recommended data tags according to the predicted association degree between each user and each kind of recommended data.

In the embodiments of the present application, the recommended data selected from the plurality of kinds of recommended data is referred to as to-be-recommended data, and its tags are referred to as to-be-recommended data tags.

Specifically, after the predicted association degree between each user and each piece of recommended data is determined, the to-be-recommended data tags may be acquired from the plurality of recommended data tags according to these predicted association degrees. It can be understood that one piece of recommended data corresponds to one recommended data tag, and one recommended data tag may cover a plurality of to-be-recommended data tags.

The to-be-recommended data tags may be selected by ranking the predicted association degrees from high to low: for example, one or more tags of the recommended data with the highest predicted association degrees may be selected, such as the 100 data tags with the highest predicted association degrees.
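The top-k selection in S201 can be sketched with the standard-library `heapq.nlargest`. The function name and the per-user score dictionary are illustrative assumptions; `k=100` mirrors the "first 100" example above.

```python
import heapq
from typing import Dict, List


def select_to_be_recommended(predicted_for_user: Dict[str, float], k: int = 100) -> List[str]:
    """Return up to k candidate tags with the highest predicted association, best first."""
    best = heapq.nlargest(k, predicted_for_user.items(), key=lambda item: item[1])
    return [tag for tag, _ in best]


scores = {"b1": 0.9, "b2": 0.7, "b3": 0.4, "b4": 0.1}  # hypothetical predicted associations
top = select_to_be_recommended(scores, k=3)
```

`heapq.nlargest` avoids sorting every candidate when k is much smaller than the candidate pool, which matters when a user is scored against many tags.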

S202, determining, according to each to-be-recommended data tag, the proportion that each recommended data tag accounts for among the to-be-recommended data tags.

Specifically, after the to-be-recommended data tags are obtained, the proportion of each recommended data tag among them can be determined from the individual to-be-recommended data tags.

For example, suppose there are three to-be-recommended data tags b1, b2, and b3 and two recommended data tags B1 and B2, where B1 covers b1 and b2 and B2 covers b3. Then recommended data tag B1 accounts for two thirds of the to-be-recommended data tags b1, b2, and b3, and B2 accounts for one third.
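The proportion computation in S202 reduces to counting. In this sketch the hypothetical `parent_tag` mapping records which recommended data tag covers each to-be-recommended tag (B1 covers b1 and b2, B2 covers b3), reproducing the two-thirds/one-third example.

```python
from collections import Counter
from typing import Dict, Mapping, Sequence


def tag_proportions(to_be_recommended: Sequence[str], parent_tag: Mapping[str, str]) -> Dict[str, float]:
    """Share of the to-be-recommended tags covered by each recommended data tag."""
    if not to_be_recommended:
        return {}
    counts = Counter(parent_tag[t] for t in to_be_recommended)
    total = len(to_be_recommended)
    return {tag: n / total for tag, n in counts.items()}


proportions = tag_proportions(["b1", "b2", "b3"], {"b1": "B1", "b2": "B1", "b3": "B2"})
```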

S203, determining the effect of the recommendation model to be evaluated according to the proportion of each recommended data tag among the to-be-recommended data tags and the probability of each user clicking each piece of recommended data.

Specifically, after the proportion of each recommended data tag among the to-be-recommended data tags and the probability of each user clicking each piece of recommended data are determined, the effect of the recommendation model to be evaluated is determined from them.

Specifically, it may be judged whether the proportion of each recommended data tag among the to-be-recommended data tags matches, or closely matches, the probability of the user clicking the corresponding recommended data. If it does, the recommendation effect of the recommendation model to be evaluated is good; that is, given the model input, the model can recommend content that interests the user more or better meets the user's needs. Conversely, if the proportion and the probability match poorly or not at all, the recommendation effect of the model is poor.

That is, if data with a high click probability accounts for a large proportion of the to-be-recommended data, the model performs well; conversely, if data with a high click probability accounts for a small proportion, or data with a low click probability accounts for a large proportion, the model performs poorly.

For example, suppose recommended data tag B1 (corresponding to recommended data S1) accounts for two thirds of the to-be-recommended data tags b1, b2, and b3, recommended data tag B2 (corresponding to recommended data S2) accounts for one third, and the user's probability of clicking S1 is higher than that of clicking S2. The data S1 with the higher click probability then takes the larger share of the to-be-recommended data, so the recommendation effect of the model to be evaluated can be determined to be good.
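The description leaves the matching criterion of S203 open. One hedged way to quantify it is to normalize the user's click probabilities over recommended data tags into a distribution and score its agreement with the tag proportions as 1 minus the total variation distance. That metric is a swapped-in choice, not the patent's, and the click probabilities below are invented for illustration.

```python
from typing import Mapping


def proportion_click_agreement(proportions: Mapping[str, float], click_prob: Mapping[str, float]) -> float:
    """1 minus the total variation distance between tag shares and normalized click
    probabilities; 1.0 means the two distributions coincide."""
    total_click = sum(click_prob.values())
    if total_click <= 0:
        raise ValueError("at least one click probability must be positive")
    tags = set(proportions) | set(click_prob)
    tvd = 0.5 * sum(
        abs(proportions.get(t, 0.0) - click_prob.get(t, 0.0) / total_click) for t in tags
    )
    return 1.0 - tvd


click_prob = {"B1": 0.4, "B2": 0.2}  # the user clicks S1 (tag B1) more often than S2 (tag B2)
good = proportion_click_agreement({"B1": 2 / 3, "B2": 1 / 3}, click_prob)
poor = proportion_click_agreement({"B1": 1 / 3, "B2": 2 / 3}, click_prob)
```

`good` (about 1.0) corresponds to the example above, where the frequently clicked S1 dominates the to-be-recommended data; `poor` (about 0.67) swaps the shares.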

In this way, the recommendation effect of the model is determined from both the user's click probability on the recommended data and the predicted association degree between the user and the recommended data, which improves the reliability of the evaluation result.

In the related art, during online small-traffic verification in the online experiment stage, if the small-traffic experiment results of a model fall short of expectations, top-level core metrics alone cannot directly reveal which strategy design problem exists in the recommendation model to be evaluated, so problem localization is not intuitive. To determine whether the recommendation model to be evaluated has a problem and, if so, where the problem lies in the model, the embodiments of the present application provide the following implementation.

That is, in an embodiment of the present application, as shown in fig. 3, the method for evaluating a recommendation model may further include the following steps S301 to S304.

S301, inputting each piece of user information and the plurality of recommended data tags into the recommendation model to be evaluated, so as to determine a first output value of a specified network layer in the recommendation model to be evaluated and the predicted association degree, output by that model, between each user and each piece of recommended data.

The specified network layer in the recommendation model to be evaluated may be any forward network layer of the model, or a specified network node (an output node of the network, the network being composed of a plurality of nodes). There may be one or more specified network layers, determined according to actual conditions or needs.

The first output value is the value output by the specified network layer after the input is fed into the recommendation model to be evaluated; it may also be referred to as a forward output value (that is, the activation value of network neurons).

Specifically, after the plurality of pieces of user information and the plurality of recommended data tags are determined, each piece of user information and the plurality of recommended data tags may be input into the recommendation model to be evaluated in order to check whether the model has a problem. The first output value produced by the specified network layer for this input can then be obtained, together with the predicted association degree between each user and each piece of recommended data that the model finally outputs (that is, the final output value of the recommendation model to be evaluated).

It should be noted that the manner of determining the first output value may be any feasible manner in the related art, as long as reliable acquisition of the first output value corresponding to the specified network layer in the recommendation model to be evaluated can be achieved, and the embodiment of the present application is not limited in any way.
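To illustrate what a "first output value" is, the pure-Python toy network below records the forward output of each layer during inference, so the output of a specified layer (here named `"hidden"`) can be read alongside the final predicted association degree. The class, weights, layer names, and feature encoding are all hypothetical. In a deep learning framework the same capture is commonly done with a forward hook, for example PyTorch's `register_forward_hook`.

```python
import math
from typing import Dict, List, Sequence


class TinyRecommender:
    """Toy network: one ReLU hidden layer and a sigmoid output.
    Records each layer's forward output so a specified layer can be inspected."""

    def __init__(self, w_hidden: List[List[float]], w_out: List[float]) -> None:
        self.w_hidden = w_hidden  # one row of input weights per hidden unit
        self.w_out = w_out        # one output weight per hidden unit
        self.activations: Dict[str, List[float]] = {}

    def forward(self, features: Sequence[float]) -> float:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in self.w_hidden]
        self.activations["hidden"] = hidden  # forward output of the specified layer
        logit = sum(w * h for w, h in zip(self.w_out, hidden))
        # Numerically stable sigmoid gives the predicted association degree in (0, 1).
        if logit >= 0:
            score = 1.0 / (1.0 + math.exp(-logit))
        else:
            e = math.exp(logit)
            score = e / (1.0 + e)
        self.activations["output"] = [score]
        return score


model = TinyRecommender(w_hidden=[[1.0, 0.0], [0.0, 1.0]], w_out=[1.0, -1.0])
score = model.forward([2.0, 2.0])  # features: hypothetical numeric encoding of (user, tag)
first_output_value = model.activations["hidden"]
```

Running the reference recommendation model the same way yields the second output value for the corresponding layer.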

S302, inputting each piece of user information and the plurality of recommended data tags into a reference recommendation model, so as to determine a second output value of a specified network layer in the reference recommendation model and a reference association degree, output by the reference model, between each user and each piece of recommended data.

In the embodiments of the present application, the reference recommendation model may be determined in advance to serve as the reference for the recommendation model to be evaluated. It may be a recommendation model that has the same or a similar function as the recommendation model to be evaluated and a known good recommendation effect.

In the embodiments of the present application, the association degree between each user and each piece of recommended data output by the reference recommendation model is referred to as the reference association degree, and it represents the user's degree of interest in the recommended data: the higher the reference association degree, the more interested the user is in the recommended data; the lower it is, the less interested the user is.

The network layer specified in the reference recommendation model may be understood as any forward network layer in the reference recommendation model, or may be understood as a specified network node (an output node of a network, where the network is composed of a plurality of nodes). The number of the specified network layers in the reference recommendation model may be one or more, may be the same as the number of the specified network layers in the recommendation model to be evaluated, and may be in one-to-one correspondence with the specified network layers in the recommendation model to be evaluated.

Here, the second output value may be understood as the value output by the specified network layer after the input is fed into the reference recommendation model; it may also be called the forward output value (i.e., the activation value of the network neurons).

Specifically, after the plurality of pieces of user information, the plurality of recommended data labels, and the reference recommendation model are determined, each piece of user information and the plurality of recommended data labels may be input into the reference recommendation model. Then, the second output value output by the specified network layer of the reference recommendation model based on each piece of user information and the recommended data labels is obtained, together with the reference association degree between each user and each recommended data finally output by the reference recommendation model (that is, the value finally output by the reference recommendation model).
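The forward pass in S301 and S302 can be illustrated with a toy model. The sketch below assumes a plain fully connected network with ReLU hidden layers and a single sigmoid output, and assumes that one user's information and one recommended data label have already been encoded into a single numeric feature vector; the names `forward_with_taps` and `dense` and the layer format are hypothetical, not taken from the application. Running it with the recommendation model to be evaluated would yield first output values, and running it with the reference recommendation model would yield second output values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical layer format: (weight rows, bias vector) of one fully connected layer.
Layer = Tuple[List[Vector], Vector]


def dense(x: Sequence[float], layer: Layer, relu: bool) -> Vector:
    """Compute W.x + b for one fully connected layer, optionally applying ReLU."""
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def forward_with_taps(
    features: Sequence[float], layers: List[Layer], specified: Sequence[int]
) -> Tuple[Dict[int, Vector], float]:
    """Run the model once and record the forward output of every specified layer.

    Returns (taps, association): taps maps layer position -> activation vector,
    association is the sigmoid of the final single-unit layer.
    """
    taps: Dict[int, Vector] = {}
    x: Vector = list(features)
    last = len(layers) - 1
    for position, layer in enumerate(layers):
        x = dense(x, layer, relu=(position != last))  # no ReLU on the output logit
        if position in specified:
            taps[position] = list(x)
    association = 1.0 / (1.0 + math.exp(-x[0]))
    return taps, association


# Toy two-layer model: an identity hidden layer followed by a 1-unit output layer.
toy_model: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
taps, association = forward_with_taps([2.0, 1.0], toy_model, specified=[0])
print(taps, round(association, 4))  # {0: [2.0, 1.0]} 0.7311
```

Recording activations inside the same loop that computes the output keeps the first/second output values and the final association degree consistent, since both come from a single forward pass.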

It should be noted that the second output value may be determined in any feasible manner in the related art, as long as the second output value corresponding to the specified network layer in the reference recommendation model can be reliably obtained; the embodiments of the present application impose no limitation on this.

S303, in response to a mismatch between the reference association degree and the predicted association degree between any user and any recommended data label, determining the difference between the first output value and the second output value corresponding to the specified network layer.

Specifically, after the first output values, the second output values, the predicted association degrees, and the reference association degrees have been determined, it can be checked whether the reference association degree between each user and each recommended data label matches the predicted association degree between the same user and label. If any pair does not match, the recommendation model to be evaluated has a problem; in that case, the difference between the first output value of the specified network layer in the model to be evaluated and the second output value of the corresponding specified network layer in the reference recommendation model can be determined.

Specifically, the comparison may show that the first output value and the second output value are identical or that they differ; when they differ, the degree of difference may be high or low.

S304, determining the network layer to be corrected in the model to be evaluated according to the difference and the position of the specified network layer in the model.

Specifically, once the difference between the first output value and the second output value of a specified network layer has been determined, the specified network layer exhibiting the difference is identified, its position in the recommendation model to be evaluated is determined, and the network layer that needs to be corrected is located according to that position.

For example, suppose the first output value of a specified network layer M in the recommendation model M1 to be evaluated is x1, and the second output value of the corresponding specified network layer M in the reference recommendation model M2 is x2. Suppose also that the predicted association degrees output by model M1 between user A and two types of recommendation data, S1 (label B1) and S2 (label B2), are 10% and 90%, respectively, while the reference association degrees output by model M2 between user A and S1 and S2 are 50% and 90%, respectively. Because the predicted association degree of 10% between user A and label B1 does not match the reference association degree of 50%, it is then determined whether the first output value x1 of layer M differs from the second output value x2. If they differ, the position of layer M in model M1 is determined, and the network layer in model M1 that needs to be corrected can be identified according to that position.
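Steps S303 and S304, applied to the example above, can be sketched as follows. This is a minimal illustration under stated assumptions: "not matched" is taken to mean that two association degrees differ by more than a tolerance `tol`, the difference between a first and a second output value is measured as the largest element-wise absolute gap, and the layer to be corrected is taken to be the earliest specified layer whose outputs diverge (on the reasoning that a deviation introduced there propagates to later layers). The tolerances and function names are hypothetical; the application does not fix them.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Key = Tuple[str, str]  # (user id, recommended data label)


def mismatched_pairs(
    predicted: Dict[Key, float], reference: Dict[Key, float], tol: float = 0.05
) -> List[Key]:
    """S303 trigger: pairs whose predicted and reference association degrees do not match."""
    return [k for k in predicted if k in reference and abs(predicted[k] - reference[k]) > tol]


def output_difference(first: Sequence[float], second: Sequence[float]) -> float:
    """Largest absolute gap between a first and a second output value."""
    if len(first) != len(second):
        return float("inf")  # different shapes: treat as maximally different
    return max((abs(a - b) for a, b in zip(first, second)), default=0.0)


def layer_to_correct(
    first_outputs: Dict[int, Sequence[float]],
    second_outputs: Dict[int, Sequence[float]],
    tol: float = 1e-6,
) -> Optional[int]:
    """S304: position of the earliest specified layer whose outputs diverge, or None."""
    for position in sorted(first_outputs):
        if position not in second_outputs:
            continue
        if output_difference(first_outputs[position], second_outputs[position]) > tol:
            return position
    return None


# The example above: user A vs labels B1 and B2.
predicted = {("A", "B1"): 0.10, ("A", "B2"): 0.90}
reference = {("A", "B1"): 0.50, ("A", "B2"): 0.90}
if mismatched_pairs(predicted, reference):
    # Hypothetical layer outputs of M1 and M2 at specified positions 1 and 3.
    print(layer_to_correct({1: [0.2, 0.4], 3: [0.7]}, {1: [0.2, 0.4], 3: [0.1]}))  # 3
```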

It will be appreciated that after locating a network layer that has a problem in the recommendation model to be evaluated, the network layer may be modified in any feasible manner until the network layer has no problems.

That is to say, in the embodiments of the present disclosure, whenever the final output of the model to be evaluated is inconsistent with the final output of the reference model, the model to be evaluated has a problem; the outputs of the intermediate layers (i.e., the specified network layers) can then be obtained and used to locate the problem.

Therefore, when the output of the model to be evaluated is inconsistent with that of the reference model, the problem is located according to the first output value and the second output value, so that the problem in the model can be reliably identified.

Further, there may be one or more specified network layers. They may be determined according to the total number of network layers in the model, according to the positions of the network layers in the model, or according to the parameters of the network layers. In one embodiment of the present application, the specified network layers are determined according to the number of network layers contained in the recommendation model to be evaluated and the number of network parameters of each network layer.

For example, when the recommendation model to be evaluated contains many network layers, only the layers with large numbers of parameters may be selected as specified network layers; when it contains few network layers, layers may be sampled at a fixed interval as specified network layers.
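The selection rule in this example could be sketched as follows. The threshold separating "many" from "few" layers, the number of large layers kept, and the sampling interval are all hypothetical parameters chosen for illustration; the application only states the rule qualitatively.

```python
from typing import List, Sequence


def choose_specified_layers(
    param_counts: Sequence[int],
    many_layers_threshold: int = 20,
    top_k: int = 5,
    interval: int = 2,
) -> List[int]:
    """Return positions of the specified network layers.

    param_counts[i] is the number of network parameters in layer i.
    """
    n = len(param_counts)
    if n >= many_layers_threshold:
        # Many layers: keep only the top_k layers with the most parameters.
        largest = sorted(range(n), key=lambda i: param_counts[i], reverse=True)[:top_k]
        return sorted(largest)
    # Few layers: take one layer every `interval` positions.
    return list(range(0, n, interval))


print(choose_specified_layers([64, 4096, 128, 4096, 1]))  # [0, 2, 4]
print(choose_specified_layers(list(range(25))))            # [20, 21, 22, 23, 24]
```

Returning positions in ascending order keeps the output directly usable for scanning layers from input to output when locating the layer to be corrected.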

Therefore, determining the specified network layers according to the number of network layers in the model and the number of network parameters of each layer ensures that the specified network layers are chosen reliably, which further improves the reliability of model evaluation.

In step S101, to ensure that the obtained data are reliable, the plurality of pieces of user information, the plurality of recommended data labels, and the association information between each user and each recommended data may be obtained according to the usage scenario of the model and the index to be evaluated.

That is, in an embodiment of the present application, as shown in fig. 4, the step S101 may include the following steps S401 and S402.

S401, receiving a recommendation model evaluation request, wherein the evaluation request includes a usage scenario or an index to be evaluated of the recommendation model to be evaluated.

It should be understood that when the recommendation model needs to be evaluated, the user may send a recommendation model evaluation request to the electronic device, and the electronic device receives the recommendation model evaluation request, where the evaluation request includes a usage scenario or an index to be evaluated of the recommendation model to be evaluated.

The usage scenario of the recommendation model to be evaluated may be, for example, a new-user scenario (i.e., recommendation for new users), an application recommendation scenario, or a content recommendation scenario.

The index to be evaluated may be, for example, the proportion of generalization resources or the proportion of specific resources among the resources recommended by the model to be evaluated. That is, different test data may be used for different evaluation purposes.

S402, obtaining, according to the usage scenario or the index to be evaluated of the model to be evaluated, a plurality of pieces of user information, a plurality of recommended data labels, and the association information between each user and each recommended data from the historical usage log.

Specifically, after the usage scenario or the index to be evaluated of the model to be evaluated is obtained, the multiple user information, the multiple recommended data tags, and the associated information between each user and each recommended data may be obtained in a targeted manner from the historical usage log according to the usage scenario or the index to be evaluated.

For example, when the usage scenario is new-user recommendation, a larger and more representative set of user information and recommended data labels may be obtained from the historical usage log, together with the association information between each user and each recommended data recorded in the log.

Therefore, obtaining historical user information, recommended data labels, and the association information between users and recommended data according to the usage scenario or the index to be evaluated of the model to be evaluated ensures the reliability of these data and further improves the reliability of model recommendation.

In an embodiment of the application, step S402 may include: determining the type of user information to be acquired according to the usage scenario or the index to be evaluated of the model to be evaluated; acquiring, from the historical usage log, a plurality of pieces of user information matching that type; and determining the plurality of recommended data labels and the association information between each user and each recommended data according to the historical usage logs of those users.

The user information type may include, for example, the age, occupation, gender, and the like of the user.

Specifically, after the usage scenario or the index to be evaluated of the model to be evaluated is obtained, the type of user information to be acquired may be determined, and a plurality of pieces of user information matching that type may be obtained from the historical usage log. The historical usage logs of the corresponding users are then obtained according to this user information, and the plurality of recommended data labels and the association information between each user and each type of recommended data are derived from those logs.
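Step S402 and its refinement can be sketched as below. Everything about the data shapes is assumed for illustration: the log is modeled as a list of hypothetical `LogEntry` records, the scenario-to-user-type table is invented, "matching the type" is interpreted as the entry carrying every required user-information field, and the association information is taken to be each user's click probability on each label (the form used by the second determining module described in the device embodiment).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical mapping from usage scenario to the user-information types to acquire.
SCENARIO_USER_TYPES: Dict[str, List[str]] = {
    "new_user": ["age", "gender"],
    "content": ["age", "occupation"],
}


@dataclass
class LogEntry:
    user_id: str
    user_info: Dict[str, str]
    label: str  # recommended data label that was shown
    clicked: bool


def build_eval_set(scenario: str, log: List[LogEntry]):
    """Return (users, labels, click_prob) extracted from the log for one scenario."""
    wanted = SCENARIO_USER_TYPES.get(scenario, [])
    # Keep only entries whose user information covers every wanted type.
    matched = [e for e in log if all(t in e.user_info for t in wanted)]
    shows: Dict[Tuple[str, str], int] = defaultdict(int)
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    for e in matched:
        shows[(e.user_id, e.label)] += 1
        clicks[(e.user_id, e.label)] += int(e.clicked)
    users = sorted({e.user_id for e in matched})
    labels = sorted({e.label for e in matched})
    click_prob = {k: clicks[k] / n for k, n in shows.items()}
    return users, labels, click_prob


log = [
    LogEntry("u1", {"age": "20", "gender": "f"}, "B1", True),
    LogEntry("u1", {"age": "20", "gender": "f"}, "B1", False),
    LogEntry("u2", {"age": "35"}, "B2", True),  # lacks "gender": filtered out
]
print(build_eval_set("new_user", log))  # (['u1'], ['B1'], {('u1', 'B1'): 0.5})
```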

It should be noted that there may be multiple user information types; for example, user information covering different ages, genders, occupations, or periods of system use may be obtained.

It should be noted that the embodiments of the present application may also determine the type of user information to be acquired in other ways known in the related art; the above embodiment is merely an exemplary illustration.

Therefore, determining the user information type according to the usage scenario or the index to be evaluated of the model to be evaluated, and obtaining the recommended data labels and the association information between users and recommended data according to that type, improves the reliability of the user information, the recommended data labels, and the association information, and further improves the reliability of model recommendation.

That is to say, the recommendation model evaluation method of the embodiments of the application uses the existing set of real online sample features to extract a feature package through offline experiments and to run an offline experimental prediction model. Combining this with information about high-quality recommended resources, an evaluation policy flag (such as generalization) is set, and the method reports whether the resources recommended by the current recommendation model to be evaluated improve on a specified policy index (such as generalization or specificity) compared with a reference recommendation model. Beyond whether the policy index improves, comparing the information of the high-quality resources recommended by the two models (such as article titles, article details, and whether the articles were clicked) makes it possible to evaluate whether the diversity and recommendation effectiveness of the resources recommended by the current model have improved. These conclusions can also indicate indirectly whether the model has a design problem, for example whether the good resources it recommends match the current user's points of interest.

It should be noted that, in the technical solutions of the embodiments of the present application, the acquisition, storage, and application of the user information involved all comply with relevant laws and regulations and do not violate public order and good customs.

In order to more clearly illustrate the evaluation method of the recommendation model according to the embodiment of the present application, the following description is given by way of an example:

As shown in fig. 5, first, the user information (uid_info) of a specified user may be acquired, including the account, the disk-write time (a log stream is generated whenever the user actually initiates a refresh of recommended content, and each log carries a timestamp, which is the time it was written to disk), and the usage log. Through a platform interface, it is possible to query on which specific online machine the current user's data was written to disk, fetch that user's persisted logs from the specified machine, and filter out all of the user-related feature samples on the user side and on the resource side (i.e., recommended image-text content, videos, and the like) contained in those logs.

In the deployment stage of the evaluation environment (Feature-extractor environment), an offline feature extraction tool (extractor) can be deployed to extract the sample set used as model input, and an offline model estimation environment can be deployed to produce the model output value q during offline estimation.

After the environment is prepared, the sample set can be extracted. The large online model is then pruned according to the extracted sample set: only the embedding entries of the large model that correspond to the samples to be used are retained, which makes model estimation more efficient. Model estimation is then performed to obtain the forward output value (i.e., the first output value) of the specified network node, the q value, and other specified estimation details. In addition, the forward-index gcms service in the feed framework is requested with each recommended data label nid (such as an article or a link) to retrieve the forward-index information of that nid, such as the article title, link, and picture content, which is kept for detailed analysis. Next, the estimated q values are sorted in descending order and the top 100 nid labels are taken; each is marked, using the existing policy indexes, as to whether the resource hits the policy, and the proportion of policy-hitting nids among the top 100 is counted. For example, each of the top 100 nids carries a mark indicating whether it is a generalization resource, and the proportion of generalization resources among these 100 nids is computed and used as quantitative evaluation data. Finally, the evaluation index value of the recommendation model to be evaluated is compared with that of the reference recommendation model to obtain the improvement in the model index.
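The quantitative part of this pipeline, ranking nids by q and measuring how many of the top 100 hit a policy such as generalization, can be sketched as follows. The input formats (a dict from nid to estimated q, and a set of nids marked as policy hits) and the function names are assumptions; the actual pipeline obtains these values from the offline estimation environment and the existing policy marks.

```python
from typing import Dict, Set


def top_k_hit_ratio(q_values: Dict[str, float], policy_hits: Set[str], k: int = 100) -> float:
    """Sort nids by estimated q in descending order; return the share of the top k that hit the policy."""
    top = sorted(q_values, key=lambda nid: q_values[nid], reverse=True)[:k]
    if not top:
        return 0.0
    return sum(nid in policy_hits for nid in top) / len(top)


def index_lift(
    evaluated_q: Dict[str, float],
    reference_q: Dict[str, float],
    policy_hits: Set[str],
    k: int = 100,
) -> float:
    """Positive when the evaluated model places more policy-hitting nids in its top k."""
    return top_k_hit_ratio(evaluated_q, policy_hits, k) - top_k_hit_ratio(reference_q, policy_hits, k)


generalization = {"n1", "n4"}  # nids marked as generalization resources
evaluated = {"n1": 0.9, "n2": 0.8, "n3": 0.2, "n4": 0.7}
reference = {"n1": 0.1, "n2": 0.9, "n3": 0.8, "n4": 0.3}
print(index_lift(evaluated, reference, generalization, k=2))  # 0.5
```

With the default `k=100` this computes the top-100 proportion described above; the toy call uses `k=2` only so the example stays small.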

In summary, the embodiments of the application allow the effect of a recommendation model to be evaluated offline, saving the 3-day waiting period of an online experiment. The evaluation is moved ahead of the online model experiment, manual work is replaced by a general-purpose tool, and incorporating forward information (such as article titles and links) makes the analysis more intuitive. Offline evaluation reduces repeated iterations of ineffective models, improves model iteration efficiency, and supports small-scale model iterations.

The embodiment of the present application further provides an evaluation device for a recommendation model, and fig. 6 is a schematic structural diagram of the evaluation device for a recommendation model provided in the embodiment of the present application.

As shown in fig. 6, the recommendation model evaluation device 600 includes: a first obtaining module 610, a first determining module 620, and a second determining module 630.

The first obtaining module 610 is configured to obtain, from a historical usage log, a plurality of pieces of user information, a plurality of recommended data labels, and the association information between each user and each recommended data;

a first determining module 620, configured to input each piece of user information and the plurality of types of recommended data tags into a recommendation model to be evaluated, so as to determine a predicted association degree between each user and each type of recommended data;

the second determining module 630 is configured to determine the effect of the recommendation model to be evaluated according to the degree of difference between the predicted association degree between each user and each recommended data and the association information between each user and each recommended data.
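The application leaves open how the degree of difference is measured. As one hypothetical choice, the sketch below takes the mean absolute gap between the predicted association degree and the observed association information (here a click probability) over the (user, label) pairs present in both; a smaller value would indicate predictions closer to observed behavior. Other standard measures, such as log loss or AUC, could be substituted.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (user id, recommended data label)


def difference_degree(predicted: Dict[Key, float], observed: Dict[Key, float]) -> float:
    """Mean absolute gap between predicted association degrees and observed click probabilities."""
    common = predicted.keys() & observed.keys()
    if not common:
        raise ValueError("no (user, label) pairs in common")
    return sum(abs(predicted[k] - observed[k]) for k in common) / len(common)


predicted = {("A", "B1"): 0.2, ("A", "B2"): 0.9}
observed = {("A", "B1"): 0.4, ("A", "B2"): 0.9, ("C", "B1"): 1.0}
print(round(difference_degree(predicted, observed), 3))  # 0.1
```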

In an embodiment of the application, the association information of each user with each recommendation data includes a probability of each user clicking each recommendation data, and the second determining module 630 may include:

the first obtaining unit is used for obtaining a data label to be recommended from the various recommended data labels according to the prediction relevance between each user and each recommended data;

the first determining unit is used for determining the proportion of each recommended data label in the to-be-recommended data labels according to each to-be-recommended data label;

and the second determining unit is used for determining the effect of the recommendation model to be evaluated according to the proportion of each recommendation data tag in the data tag to be recommended and the probability of each user clicking each recommendation data.
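One way the three units could fit together is sketched below. The application does not specify how the label proportions and click probabilities are combined; this sketch assumes that the labels to be recommended are each user's `top_k` labels by predicted association degree, and scores the model by the proportion-weighted average of each label's mean click probability in the log. Both choices and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (user id, recommended data label)


def recommendation_effect(
    predicted: Dict[Key, float], click_prob: Dict[Key, float], top_k: int = 1
) -> Tuple[Dict[str, float], float]:
    """Return (label_share, effect) for the recommendation model to be evaluated."""
    # First obtaining unit: each user's top_k labels become labels to be recommended.
    per_user: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for (user, label), score in predicted.items():
        per_user[user].append((score, label))
    chosen = [label for scored in per_user.values() for _, label in sorted(scored, reverse=True)[:top_k]]
    if not chosen:
        return {}, 0.0
    # First determining unit: proportion of each label among the labels to be recommended.
    label_share = {label: n / len(chosen) for label, n in Counter(chosen).items()}
    # Second determining unit: weight each label's mean logged click probability by its share.
    clicks_by_label: Dict[str, List[float]] = defaultdict(list)
    for (_, label), p in click_prob.items():
        clicks_by_label[label].append(p)

    def mean_click(label: str) -> float:
        probs = clicks_by_label.get(label, [])
        return sum(probs) / len(probs) if probs else 0.0

    effect = sum(share * mean_click(label) for label, share in label_share.items())
    return label_share, effect


predicted = {("u1", "B1"): 0.9, ("u1", "B2"): 0.2, ("u2", "B1"): 0.3, ("u2", "B2"): 0.8}
click_prob = {("u1", "B1"): 0.6, ("u2", "B1"): 0.2, ("u1", "B2"): 0.1, ("u2", "B2"): 0.5}
share, effect = recommendation_effect(predicted, click_prob)
print(share, round(effect, 2))  # {'B1': 0.5, 'B2': 0.5} 0.35
```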

In an embodiment of the present application, the apparatus 600 for evaluating a recommendation model may further include:

a third determining module, configured to input each piece of user information and the plurality of recommended data labels into the recommendation model to be evaluated, so as to determine a first output value corresponding to a specified network layer in the recommendation model to be evaluated and the predicted association degree between each user and each recommended data output by the recommendation model to be evaluated;

a fourth determining module, configured to input each piece of user information and the plurality of recommended data labels into a reference recommendation model, so as to determine a second output value corresponding to a specified network layer in the reference recommendation model and the reference association degree between each user and each recommended data output by the reference recommendation model;

a fifth determining module, configured to determine, in response to a mismatch between the reference association degree and the predicted association degree between any user and any recommended data label, the difference between the first output value and the second output value corresponding to the specified network layer;

and the sixth determining module is used for determining the network layer to be corrected in the model to be evaluated according to the difference and the position of the specified network layer in the model.

In an embodiment of the present application, the apparatus 600 for evaluating a recommendation model may further include a seventh determining module, configured to determine the specified network layer according to the number of network layers contained in the model to be evaluated and the number of network parameters corresponding to each network layer.

In an embodiment of the present application, the first obtaining module 610 may include:

a first receiving unit, configured to receive a recommendation model evaluation request, wherein the evaluation request includes a usage scenario or an index to be evaluated of the model to be evaluated;

and a second obtaining unit, configured to obtain a plurality of pieces of user information, a plurality of recommended data labels, and the association information between each user and each recommended data from a historical usage log according to the usage scenario or the index to be evaluated of the model to be evaluated.

In an embodiment of the application, the second obtaining unit may be specifically configured to: determine the type of user information to be acquired according to the usage scenario or the index to be evaluated of the model to be evaluated; acquire, from the historical usage log, a plurality of pieces of user information matching that type; and determine the plurality of recommended data labels and the association information between each user and each recommended data according to the historical usage logs of those users.

It should be noted that, for other specific embodiments of the evaluation apparatus of the recommendation model in the embodiment of the present application, reference may be made to the specific embodiment of the evaluation method of the recommendation model, and details are not described here for avoiding redundancy.

According to the evaluation device of the recommendation model, the effect of the recommendation model is determined according to the user information, the various recommendation data labels and the associated information of each user and each recommendation data, namely, the off-line evaluation of the recommendation model can be realized, so that the evaluation waiting period is saved, and the recommendation effect of the recommendation model can be determined in advance.

According to an embodiment of the present application, there is also provided an electronic device, a readable storage medium, and a computer program product of an evaluation method of a recommendation model. This will be explained with reference to fig. 7.

Fig. 7 is a block diagram of an electronic device according to an evaluation method of a recommendation model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the evaluation method of the recommendation model. For example, in some embodiments, the method of evaluating a recommendation model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method of evaluating a recommendation model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the evaluation method of the recommendation model.

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in conventional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of evaluating a recommendation model, comprising:

2. The method of claim 1, wherein the association information of each user with each recommended data item comprises a probability of each user clicking each recommended data item; and

the determining of the effect of the recommendation model to be evaluated according to the degree of difference between the predicted association degree of each user with each recommended data item and the association information of each user with each recommended data item comprises:

selecting to-be-recommended data labels from the plurality of recommended data labels according to the predicted association degree between each user and each recommended data item;

determining, from the to-be-recommended data labels, the proportion of each recommended data label among the to-be-recommended data labels;

and determining the effect of the recommendation model to be evaluated according to the proportion of each recommended data label among the to-be-recommended data labels and the probability of each user clicking each recommended data item.
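Claim 2 does not fix a formula for combining the label proportions with the click probabilities. The Python below is a minimal sketch under assumptions that are not in the source: each user's to-be-recommended labels are the `top_k` labels with the highest predicted association degree, and the effect score is the proportion-weighted mean click probability of those labels. The names `select_labels`, `label_proportions`, `evaluate_effect` and `top_k` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input layout (not specified in the source):
#   predicted[user][label]  -> predicted association degree from the model under evaluation
#   click_prob[user][label] -> click probability of the user for that label, from the usage log
Scores = Dict[str, Dict[str, float]]


def select_labels(predicted: Scores, top_k: int) -> Dict[str, List[str]]:
    """For each user, keep the top_k labels with the highest predicted association degree."""
    return {
        user: sorted(scores, key=scores.get, reverse=True)[:top_k]
        for user, scores in predicted.items()
    }


def label_proportions(selected: Dict[str, List[str]]) -> Dict[str, float]:
    """Share of each label among all to-be-recommended labels."""
    counts = Counter(label for labels in selected.values() for label in labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def evaluate_effect(predicted: Scores, click_prob: Scores, top_k: int = 2) -> float:
    """Hypothetical effect score: proportion-weighted mean click probability."""
    selected = select_labels(predicted, top_k)
    score = 0.0
    for label, share in label_proportions(selected).items():
        # Users to whom this label would be recommended (non-empty by construction).
        users = [u for u, labels in selected.items() if label in labels]
        mean_click = sum(click_prob.get(u, {}).get(label, 0.0) for u in users) / len(users)
        score += share * mean_click
    return score


predicted = {"u1": {"news": 0.9, "sports": 0.4, "music": 0.1},
             "u2": {"news": 0.2, "sports": 0.8, "music": 0.7}}
click_prob = {"u1": {"news": 0.5, "sports": 0.2},
              "u2": {"news": 0.1, "sports": 0.6, "music": 0.4}}
print(evaluate_effect(predicted, click_prob, top_k=1))
```

A higher score means the labels the model would push most often are also the ones users tend to click; any other monotone combination of proportion and click probability would fit the claim wording equally well.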

3. The method of claim 1, further comprising:

inputting each piece of user information and the plurality of recommended data labels into the recommendation model to be evaluated, so as to determine a first output value corresponding to a specified network layer in the recommendation model to be evaluated and the predicted association degree between each user and each recommended data item output by the recommendation model to be evaluated;

inputting each piece of user information and the plurality of recommended data labels into a reference recommendation model, so as to determine a second output value corresponding to the specified network layer in the reference recommendation model and a reference association degree between each user and each recommended data item output by the reference recommendation model;

in response to the reference association degree and the predicted association degree between any user and any recommended data label not matching, determining the difference between the first output value and the second output value corresponding to the specified network layer;

and determining a network layer to be corrected in the recommendation model to be evaluated according to the difference and the position of the specified network layer in the model.
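Claim 3 locates the layer to correct from the layer-output difference and the layer's position, but does not fix a decision rule. One possible reading, sketched below in Python as an assumption rather than the patented procedure: if the two models' association degrees disagree by more than a tolerance `tol`, walk the specified layers from input to output and report the first one whose outputs differ by more than `tol`, since a divergence introduced there would propagate to every later layer. The function names, the flattened list-of-floats representation of layer outputs and `tol` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: specified layer name -> flattened output values.
Outputs = Dict[str, Sequence[float]]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equally shaped outputs."""
    return max(abs(x - y) for x, y in zip(a, b))


def find_layer_to_correct(
    predicted: Sequence[float],
    reference: Sequence[float],
    first_outputs: Outputs,
    second_outputs: Outputs,
    layer_order: List[str],
    tol: float = 1e-3,
) -> Optional[str]:
    """Return the earliest specified layer whose output diverges from the reference model."""
    # Only search for a faulty layer when the final association degrees do not match.
    if max_abs_diff(predicted, reference) <= tol:
        return None
    # layer_order lists the specified layers by position, from input side to output side.
    for name in layer_order:
        if max_abs_diff(first_outputs[name], second_outputs[name]) > tol:
            return name
    return None
```

The function returns `None` both when the predictions match and when the mismatch originates in a layer that was not among the specified layers, so the choice of specified layers bounds how precisely a fault can be localized.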

4. The method of claim 3, further comprising:

determining the specified network layer according to the number of network layers contained in the recommendation model to be evaluated and the number of network parameters corresponding to each network layer.
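Claim 4 chooses the specified network layers from the layer count and each layer's parameter count, but gives no formula. A minimal sketch, assuming a rule that is not from the source: with k = min(number of layers, `max_checkpoints`), split the total parameter count into k equal segments and select the layer at which each segment boundary is crossed, so that inspection points are spread through the model and weighted toward parameter-heavy layers. `choose_specified_layers` and `max_checkpoints` are hypothetical.

```python
from typing import List, Tuple


def choose_specified_layers(
    layers: List[Tuple[str, int]], max_checkpoints: int = 4
) -> List[str]:
    """Hypothetical rule: pick the layers where equal shares of total parameters are reached.

    layers lists (layer name, number of parameters) in order from input to output.
    """
    k = min(len(layers), max_checkpoints)  # number of inspection points
    total = sum(n for _, n in layers)
    chosen: List[str] = []
    cumulative = 0
    boundary_idx = 1
    for name, n in layers:
        cumulative += n
        crossed = False
        # A single large layer may cross several boundaries; record it only once.
        while boundary_idx <= k and cumulative >= total * boundary_idx / k:
            crossed = True
            boundary_idx += 1
        if crossed:
            chosen.append(name)
    return chosen


layers = [("embedding", 1000), ("fc1", 500), ("fc2", 300), ("fc3", 150), ("output", 50)]
print(choose_specified_layers(layers))  # ['embedding', 'fc1', 'output']
```

Because the last boundary equals the total parameter count, the final layer with parameters is always selected, which keeps one inspection point next to the model output.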

5. The method of any one of claims 1-4, wherein the acquiring of the plurality of pieces of user information, the plurality of recommended data labels, and the association information of each user with each recommended data item from the historical usage log comprises:

receiving a recommendation model evaluation request, wherein the evaluation request comprises a usage scenario or an index to be evaluated of the recommendation model to be evaluated;

and acquiring a plurality of pieces of user information, a plurality of recommended data labels, and the association information of each user with each recommended data item from the historical usage log according to the usage scenario or the index to be evaluated of the recommendation model to be evaluated.

6. The method of claim 5, wherein the acquiring of the plurality of pieces of user information, the plurality of recommended data labels, and the association information of each user with each recommended data item from the historical usage log according to the usage scenario or the index to be evaluated of the recommendation model to be evaluated comprises:

determining the type of user information to be acquired according to the usage scenario or the index to be evaluated of the recommendation model to be evaluated;

acquiring, from the historical usage log, a plurality of pieces of user information matching the type of user information to be acquired;

and determining the plurality of recommended data labels and the association information of each user with each recommended data item according to the historical usage logs of the plurality of users.
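Claims 5 and 6 derive the evaluation data from the historical usage log according to the usage scenario or index named in the request. The Python below is a minimal sketch that assumes a log schema not given in the source: each record is a dict holding `user_id`, some user-information fields, the `label` shown and a boolean `clicked`. The scenario-to-field table `FIELDS_BY_SCENARIO_OR_INDEX`, the `EvaluationRequest` type and the click probability estimate (clicks divided by impressions per user and label) are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical mapping from a usage scenario or evaluation index to the
# user-information fields that must be acquired; the source defines no values.
FIELDS_BY_SCENARIO_OR_INDEX: Dict[str, List[str]] = {
    "feed": ["age", "city"],
    "search": ["query_history"],
    "ctr": ["age"],
}


@dataclass
class EvaluationRequest:
    scenario: Optional[str] = None  # usage scenario of the model to be evaluated
    index: Optional[str] = None     # index to be evaluated


def required_fields(req: EvaluationRequest) -> List[str]:
    """Claim 6, first step: decide which types of user information to acquire."""
    key = req.scenario if req.scenario is not None else req.index
    return FIELDS_BY_SCENARIO_OR_INDEX.get(key or "", [])


def extract_from_log(
    req: EvaluationRequest, log: List[Dict[str, Any]]
) -> Tuple[Dict[str, Dict[str, Any]], List[str], Dict[Tuple[str, str], float]]:
    fields = required_fields(req)
    users: Dict[str, Dict[str, Any]] = {}
    impressions: Dict[Tuple[str, str], int] = defaultdict(int)
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in log:
        # Claim 6, second step: keep only records carrying every required field.
        if not all(f in record for f in fields):
            continue
        uid = record["user_id"]
        users[uid] = {f: record[f] for f in fields}
        # Claim 6, third step: count impressions and clicks per (user, label) pair.
        key = (uid, record["label"])
        impressions[key] += 1
        clicks[key] += int(record["clicked"])
    labels = sorted({label for _, label in impressions})
    click_prob = {key: clicks[key] / impressions[key] for key in impressions}
    return users, labels, click_prob


log = [
    {"user_id": "u1", "age": 30, "city": "Beijing", "label": "news", "clicked": True},
    {"user_id": "u1", "age": 30, "city": "Beijing", "label": "news", "clicked": False},
    {"user_id": "u2", "age": 25, "city": "Shanghai", "label": "sports", "clicked": True},
    {"user_id": "u3", "query_history": ["a"], "label": "music", "clicked": True},
]
users, labels, click_prob = extract_from_log(EvaluationRequest(scenario="feed"), log)
```

With the `"feed"` scenario, user `u3` is dropped because its records lack `age` and `city`, so only `u1` and `u2` contribute labels and click probabilities.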

7. An evaluation apparatus of a recommendation model, comprising:

8. The apparatus of claim 7, wherein the association information of each user with each recommended data item comprises a probability of each user clicking each recommended data item; and

the second determining module comprises:

9. The apparatus of claim 7, further comprising:

a third determining module, configured to input each piece of user information and the plurality of recommended data labels into the recommendation model to be evaluated, so as to determine a first output value corresponding to a specified network layer in the recommendation model to be evaluated and the predicted association degree between each user and each recommended data item output by the recommendation model to be evaluated;

10. The apparatus of claim 9, further comprising:

a seventh determining module, configured to determine the specified network layer according to the number of network layers contained in the recommendation model to be evaluated and the number of network parameters corresponding to each network layer.

11. The apparatus of any one of claims 7-10, wherein the first obtaining module comprises:

a first receiving unit, configured to receive a recommendation model evaluation request, wherein the evaluation request comprises a usage scenario or an index to be evaluated of the recommendation model to be evaluated;

and a second obtaining unit, configured to acquire a plurality of pieces of user information, a plurality of recommended data labels, and the association information of each user with each recommended data item from the historical usage log according to the usage scenario or the index to be evaluated of the recommendation model to be evaluated.

12. The apparatus according to claim 11, wherein the second obtaining unit is specifically configured to:

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of evaluating a recommendation model according to any of claims 1-6.

14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the evaluation method of the recommendation model of any one of claims 1-6.

15. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the evaluation method of the recommendation model according to any one of claims 1-6.