CN110110139B - Method and device for explaining recommendation result and electronic equipment

Info

Publication number: CN110110139B
Application number: CN201910317239.4A
Authority: CN
Inventors: 孙成龙; 郭正凯; 宋华
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2019-04-19
Filing date: 2019-04-19
Publication date: 2021-06-22
Anticipated expiration: 2039-04-19
Also published as: CN110110139A

Abstract

Embodiments of the invention provide a method, an apparatus, and an electronic device for explaining a recommendation result. The method comprises: obtaining a prediction model corresponding to a recommendation result to be explained and a target user receiving the recommendation result to be explained; constructing an initial sample, wherein the initial sample comprises the content features of the recommended content included in the recommendation result to be explained and the user features of the target user; inputting the prediction model and the initial sample into a preset local feature diagnosis model that is independent of the prediction model, to obtain the importance value of each feature for the predicted click probability; and explaining the recommendation result to be explained according to the importance values. Because the preset local feature diagnosis model is independent of the prediction model, it can be applied to any prediction model, which improves the generality of the explanation method.

Description

Method and device for explaining recommendation result and electronic equipment

Technical Field

The present invention relates to the field of machine learning technologies, and in particular, to a method and an apparatus for interpreting a recommendation result, and an electronic device.

Background

At present, many network application systems recommend content to users in order to better meet their needs. For example, a video playing system recommends videos for the user to choose from. The recommendation method mainly comprises the following steps:

First, training samples are acquired, which include the videos watched by users, the personal information of the users, and the like. Then, a recommendation model is trained on the collected samples. Finally, the information of the candidate videos and of the target user is input into the trained recommendation model. Within the recommendation model, a prediction model first extracts features from the information of the candidate videos and the target user, namely video features and user features. It then predicts, from the video features and the user features, the probability that the target user clicks each candidate video, giving the predicted click probability of the target user for each candidate video. The recommendation model then recommends the candidate video with the highest predicted click probability to the target user as the video to be recommended.

In order to improve the prediction model, the role of each extracted feature on the prediction result, namely the probability of being clicked, needs to be determined, and then the recommendation result needs to be explained.

At present, there are many commonly used prediction models for a recommendation system, and the click probability can be predicted according to the characteristics of an input sample. Such as: a GBDT (Gradient Boosting Decision Tree) prediction model, a prediction model formed by combining GBDT and FM (Factorization Machine), or a prediction model formed by combining GBDT and LR (Logistic Regression Classifier).

For a tree model such as GBDT, the prior art may analyze the prediction result with an information gain method: an information gain is calculated for each feature of a sample, the features are sorted by information gain, and the importance of each feature is determined from the sorted result so as to explain the recommendation result. For example, if the information gain method finds that feature A and feature B have the highest importance, the recommendation can be explained as follows: the video to be recommended has these two most important features, and therefore its predicted click probability is high.
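As a rough illustration of the prior-art approach described above, the following Python sketch computes an information gain for each feature of a toy data set and sorts the features by it. The data, the feature names, and the helper functions `entropy` and `information_gain` are assumptions made for illustration only; a real GBDT implementation would typically derive gain from its own split statistics rather than from this textbook formula.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after grouping by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = clicked, 0 = not clicked.
clicked = [1, 1, 0, 0]
features = {
    "feature_A": ["x", "x", "y", "y"],  # separates clicks from non-clicks
    "feature_B": ["u", "v", "u", "v"],  # unrelated to the click label
}

gains = {name: information_gain(values, clicked) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)  # most important feature first
```

Because the gain has to be computed feature by feature, the cost of this kind of ranking grows with the number of features, which is the scaling concern raised in the text that follows.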

However, the inventor finds that the prior scheme has at least the following problems in the process of implementing the invention:

The existing information gain method can only analyze the prediction result of a tree model such as GBDT and cannot analyze the prediction result of a combined model. For example, it cannot analyze the prediction results of a prediction model formed by combining GBDT and FM, or of a prediction model formed by combining GBDT and LR. Moreover, once features numbering in the millions or more are introduced, the information gain method cannot calculate the information gain feature by feature, so the importance of each feature cannot be determined.

Therefore, the method for explaining the recommendation result in the prior art cannot determine the importance of each feature for some prediction models, so that the method cannot be applied to all recommendation models, and the universality is not high.

Disclosure of Invention

The embodiment of the invention aims to provide a method, a device and electronic equipment for explaining a recommendation result so as to improve the universality of an explanation method.

In order to achieve the above object, an embodiment of the present invention discloses a method for interpreting a recommendation result, including:

obtaining a prediction model and a target user corresponding to a recommendation result to be explained; the recommendation result to be interpreted comprises recommendation content; the target user is the user receiving the recommendation result to be explained;

constructing an initial sample; the initial sample, comprising: the content characteristics of the recommended content and the user characteristics of the target user;

inputting the prediction model and the initial sample into a preset local feature diagnosis model irrelevant to the prediction model; performing multiple disturbances on the content characteristics and/or the user characteristics in the initial sample to obtain multiple disturbance samples; inputting the initial samples into the prediction model to obtain corresponding initial predicted click probabilities, and respectively inputting each disturbance sample into the prediction model to obtain corresponding predicted click probabilities after a plurality of disturbances; obtaining the importance value of each characteristic to the predicted click probability according to the difference between the initial predicted click probability and the predicted click probability after each disturbance; wherein the predicted click probability is: the prediction model predicts the click probability of the target user on the recommended content according to an input sample;

and according to the importance value of each feature, interpreting the recommendation result to be interpreted.

Optionally, the step of performing multiple perturbations on the content features and/or the user features in the initial sample to obtain multiple perturbed samples includes:

selecting an undisturbed feature from the content features and the user features in the initial sample as the pending feature;

hiding the pending feature in the initial sample, completing one disturbance, and obtaining the disturbance sample corresponding to the pending feature; and returning to the step of selecting an undisturbed feature from the content features and the user features in the initial sample as the pending feature.

Optionally, the step of inputting the initial sample to the prediction model to obtain a corresponding initial predicted click probability includes:

inputting content characteristics and user characteristics in the initial sample into the prediction model to obtain corresponding initial prediction click probability;

the step of inputting each disturbance sample into the prediction model respectively to obtain a plurality of corresponding post-disturbance prediction click probabilities includes:

and inputting the content features and the user features in the disturbance sample corresponding to each pending feature into the prediction model to obtain the post-disturbance predicted click probability corresponding to each pending feature.

Optionally, the step of obtaining the importance value of each feature to the predicted click probability according to the difference between the initial predicted click probability and each post-disturbance predicted click probability includes:

calculating, for each pending feature, the difference between the initial predicted click probability and the post-disturbance predicted click probability corresponding to that pending feature;

and taking each obtained difference as the importance value of the corresponding pending feature.
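The optional steps above (hide one feature at a time, re-predict, take the difference) can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: `toy_predict`, its weights, and the feature names are hypothetical stand-ins for a trained prediction model, and hiding is modeled by setting the feature value to 0.

```python
import math


def toy_predict(sample):
    """Hypothetical stand-in for the prediction model; returns a click probability."""
    weights = {"genre_comedy": 1.2, "video_minutes": -0.02, "user_age": 0.01}
    score = sum(weights[name] * value for name, value in sample.items())
    return 1.0 / (1.0 + math.exp(-score))


def importance_by_hiding(predict, initial_sample, hidden_value=0):
    """Hide each feature once and record the change in predicted click probability."""
    initial_probability = predict(initial_sample)
    importance = {}
    for name in initial_sample:
        perturbed = dict(initial_sample)   # copy so the initial sample is untouched
        perturbed[name] = hidden_value     # one disturbance: hide the pending feature
        importance[name] = initial_probability - predict(perturbed)
    return importance


# Hypothetical initial sample: content features plus user features.
initial_sample = {"genre_comedy": 1, "video_minutes": 30, "user_age": 25}
importance = importance_by_hiding(toy_predict, initial_sample)
```

A positive value means that hiding the feature lowered the predicted click probability, so the feature pushed the prediction up; a negative value means the feature pushed it down.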

Optionally, the prediction model includes: a first submodel and a second submodel;

before the step of inputting the prediction model and the initial sample into a preset local feature diagnosis model independent of the prediction model, the method further comprises the following steps:

dividing all the characteristics into dense characteristics and sparse characteristics according to a preset algorithm for the content characteristics and the user characteristics in the initial sample;

inputting the dense features into a first submodel to obtain combined features;

the step of inputting the initial sample to the prediction model to obtain a corresponding initial predicted click probability includes:

and inputting the combined features and the sparse features into the second submodel to obtain initial predicted click probability.

Optionally, the step of inputting the prediction model and the initial sample into a preset local feature diagnosis model independent of the prediction model includes:

splicing the combination features and the sparse features to obtain spliced features;

and inputting the prediction model and the spliced features into a preset local feature diagnosis model irrelevant to the prediction model.

performing multiple disturbances on the spliced features to obtain multiple disturbance samples;

for each disturbance sample, obtaining a sample after disturbance of dense features or a sample after disturbance of sparse features from the disturbance sample;

inputting the sample with disturbed dense features into the first sub-model to obtain first post-disturbance combined features;

inputting the first post-disturbance combined features and the sparse features in the sample with disturbed dense features into the second sub-model to obtain the post-disturbance predicted click probability corresponding to the sample with disturbed dense features;

or inputting the dense features in the sample with disturbed sparse features into the first sub-model to obtain second post-disturbance combined features;

and inputting the second post-disturbance combined features and the sparse features in the sample with disturbed sparse features into the second sub-model to obtain the post-disturbance predicted click probability corresponding to the sample with disturbed sparse features.
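One possible reading of the two-sub-model flow above is sketched below in Python. Everything concrete here is an assumption: `first_submodel` and `second_submodel` are hypothetical stand-ins with made-up thresholds and weights, and the sketch perturbs one raw dense or sparse feature at a time before re-running both sub-models, which simplifies the spliced-feature perturbation described in the text.

```python
import math


def first_submodel(dense):
    """Hypothetical first sub-model: turns dense features into combined features."""
    watch_minutes, user_age = dense
    return [1.0 if watch_minutes > 20 else 0.0, 1.0 if user_age >= 18 else 0.0]


def second_submodel(combined, sparse):
    """Hypothetical second sub-model: logistic score over the spliced features."""
    weights = [0.8, 0.4, 1.0, -0.5]
    spliced = combined + sparse            # splice combined and sparse features
    score = sum(w * x for w, x in zip(weights, spliced))
    return 1.0 / (1.0 + math.exp(-score))


def predict(dense, sparse):
    """Full prediction: dense -> combined features, then combined + sparse -> probability."""
    return second_submodel(first_submodel(dense), sparse)


def perturbed_probabilities(dense, sparse, hidden_value=0.0):
    """Disturb one dense or one sparse feature at a time and re-run both sub-models."""
    results = {}
    for i in range(len(dense)):
        disturbed_dense = dense[:i] + [hidden_value] + dense[i + 1:]
        results[("dense", i)] = predict(disturbed_dense, sparse)
    for j in range(len(sparse)):
        disturbed_sparse = sparse[:j] + [hidden_value] + sparse[j + 1:]
        results[("sparse", j)] = predict(dense, disturbed_sparse)
    return results


dense = [35.0, 25.0]    # hypothetical dense features: minutes watched, user age
sparse = [1.0, 0.0]     # hypothetical sparse (one-hot style) features
initial = predict(dense, sparse)
importance = {key: initial - p for key, p in perturbed_probabilities(dense, sparse).items()}
```

Note that a sparse feature whose value is already 0 is unchanged by hiding, so its importance in this sketch is exactly 0.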

Optionally, the first sub-model is: and the second sub-model is a logistic regression LR model or a factorization machine FM model.

In order to achieve the above object, an embodiment of the present invention further discloses an apparatus for interpreting a recommendation result, including: a first obtaining module, a building module, a second obtaining module, and an interpreting module, wherein,

the first obtaining module is used for obtaining a prediction model and a target user corresponding to the recommendation result to be explained; the recommendation result to be interpreted comprises recommendation content; the target user is the user receiving the recommendation result to be explained;

the construction module is used for constructing an initial sample; the initial sample, comprising: the content characteristics of the recommended content and the user characteristics of the target user;

the second obtaining module includes: the device comprises an input submodule, a disturbance submodule, an initial probability obtaining submodule, a disturbance probability obtaining submodule and an importance value obtaining submodule;

the input submodule is used for inputting the prediction model and the initial sample into a preset local characteristic diagnosis model irrelevant to the prediction model;

the disturbance submodule is used for carrying out multiple disturbances on the content characteristics and/or the user characteristics in the initial sample to obtain multiple disturbance samples;

the initial probability obtaining submodule is used for inputting the initial sample into the prediction model to obtain a corresponding initial prediction click probability;

the disturbance probability obtaining submodule is used for respectively inputting each disturbance sample into the prediction model to obtain a plurality of corresponding post-disturbance prediction click probabilities;

the importance value obtaining submodule is used for obtaining the importance value of each characteristic to the predicted click probability according to the difference between the initial predicted click probability and the predicted click probability after each disturbance; wherein the predicted click probability is: the prediction model predicts the click probability of the target user on the recommended content according to an input sample;

and the interpretation module is used for interpreting the recommendation result to be interpreted according to the importance value of each characteristic.

Optionally, the perturbation sub-module includes:

the selection unit is used for selecting an undisturbed feature from the content features and the user features in the initial sample as the pending feature;

and the disturbance unit is used for hiding the pending feature in the initial sample, completing one disturbance, obtaining the disturbance sample corresponding to the pending feature, and triggering the selection unit.

Optionally, the initial probability obtaining submodule is specifically configured to input the content features and the user features in the initial sample into the prediction model to obtain the corresponding initial predicted click probability;

the disturbance probability obtaining submodule is specifically configured to input the content features and the user features in the disturbance sample corresponding to each pending feature into the prediction model to obtain the post-disturbance predicted click probability corresponding to each pending feature.

Optionally, the importance value obtaining submodule is specifically configured to calculate the difference between the initial predicted click probability and the post-disturbance predicted click probability corresponding to each pending feature, and to take each obtained difference as the importance value of the corresponding pending feature.

Optionally, the apparatus further includes:

the dividing module is used for dividing all the characteristics into dense characteristics and sparse characteristics according to a preset algorithm for the content characteristics and the user characteristics in the initial sample before inputting the prediction model and the initial sample into a preset local characteristic diagnosis model irrelevant to the prediction model;

the input module is used for inputting the dense features to a first sub-model to obtain combined features;

and the initial probability obtaining submodule is specifically used for inputting the combined features and the sparse features into the second submodel to obtain an initial predicted click probability.

Optionally, the input sub-module includes:

the splicing unit is used for splicing the combination features and the sparse features to obtain spliced features;

and the first input unit is used for inputting the prediction model and the spliced features into a preset local feature diagnosis model irrelevant to the prediction model.

Optionally, the perturbation sub-module is specifically configured to perform multiple perturbations on the spliced features to obtain multiple perturbation samples;

the disturbance probability obtaining submodule comprises: an obtaining unit, a second input unit, and a third input unit, or comprising: an obtaining unit, a fourth input unit, and a fifth input unit;

the obtaining unit is used for obtaining a sample after disturbing dense features or a sample after disturbing sparse features from each disturbed sample;

the second input unit is used for inputting the sample with disturbed dense features into the first sub-model to obtain first post-disturbance combined features;

the third input unit is used for inputting the first post-disturbance combined features and the sparse features in the sample with disturbed dense features into the second sub-model to obtain the post-disturbance predicted click probability corresponding to the sample with disturbed dense features;

the fourth input unit is used for inputting the dense features in the sample with disturbed sparse features into the first sub-model to obtain second post-disturbance combined features;

and the fifth input unit is used for inputting the second post-disturbance combined features and the sparse features in the sample with disturbed sparse features into the second sub-model to obtain the post-disturbance predicted click probability corresponding to the sample with disturbed sparse features.

In order to achieve the above object, an embodiment of the present invention further discloses an electronic device for interpreting a recommendation result, which includes a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement any of the above method steps for interpreting the recommendation result when executing the program stored in the memory.

In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform any one of the above-described methods of interpreting recommendation results.

In yet another aspect of the present invention, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any of the above methods for interpreting recommendation results.

As can be seen from the foregoing technical solutions, the method, the apparatus, and the electronic device for interpreting a recommendation result, provided by the embodiments of the present invention, obtain a prediction model corresponding to a recommendation result to be interpreted and a target user receiving the recommendation result to be interpreted, and construct an initial sample, where the initial sample includes: the content characteristics of the recommended content and the user characteristics of the target user included in the recommendation result to be explained; and inputting the prediction model and the initial sample into a preset local feature diagnosis model irrelevant to the prediction model to obtain an importance value of each feature for predicting the click probability, and interpreting the recommendation result to be interpreted according to the importance value. The preset local characteristic diagnosis model is a model independent of the prediction model and can be applied to any prediction model. Therefore, the versatility of the interpretation method can be improved.

Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

Fig. 1a is a schematic flowchart of a method for explaining a recommendation result according to an embodiment of the present invention;

Fig. 1b is a schematic diagram of the processing of the local feature diagnosis model in the embodiment shown in Fig. 1a;

Fig. 2 is another schematic flowchart of a method for explaining a recommendation result according to an embodiment of the present invention;

Fig. 3 is yet another schematic flowchart of a method for explaining a recommendation result according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an apparatus for explaining a recommendation result according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of an electronic device for explaining a recommendation result according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

In order to solve the problem of the prior art, embodiments of the present invention provide a method and an apparatus for interpreting a recommendation result, and an electronic device. The method and the device can be applied to various electronic equipment. First, a method for explaining a recommendation result provided in an embodiment of the present invention is described below.

Fig. 1a is a schematic flowchart of a method for explaining a recommendation result according to an embodiment of the present invention. As shown in Fig. 1a, the method may include:

S101: obtaining a prediction model and a target user corresponding to a recommendation result to be explained; the recommendation result to be explained comprises recommended content; the target user is the user who receives the recommendation result to be explained.

In practical applications, the prediction model may be a single model, such as a GBDT (Gradient Boosting Decision Tree) model or a DNN (Deep Neural Network) model, or may be a combination of multiple models, such as a combination of GBDT and LR or a combination of DNN and FM. For example, for video recommendation, the prediction model may be a model trained over multiple rounds on inputs comprising the videos that multiple users have watched, the personal information of those users, and the probability that those users clicked on the watched videos.

S102: constructing an initial sample; the initial sample, comprising: the content characteristics of the recommended content and the user characteristics of the target user.

For example: for video recommendations, the initial sample may include: characteristics of the recommended video and user characteristics of the target user.

S103: inputting the prediction model and the initial sample into a preset local characteristic diagnosis model irrelevant to the prediction model; performing multiple disturbances on the content characteristics and/or the user characteristics in the initial sample to obtain multiple disturbance samples; inputting the initial samples into a prediction model to obtain corresponding initial predicted click probabilities, and respectively inputting each disturbance sample into the prediction model to obtain corresponding predicted click probabilities after multiple disturbances; and obtaining the importance value of each characteristic to the predicted click probability according to the difference between the initial predicted click probability and the predicted click probability after each disturbance.

In S103, the importance value of each feature to the predicted click probability refers to the importance value that each feature of the initial sample has for the predicted click probability.

As shown in fig. 1b, the process of the local feature diagnosis model includes:

1) The initial sample is input into the local feature diagnosis model, which disturbs the initial sample to obtain a plurality of disturbance samples.

Specifically, the local feature diagnosis model performs multiple disturbances on the content features and/or the user features in the initial sample to obtain the plurality of disturbance samples.

2) Meanwhile, the local feature diagnosis model also directly inputs the initial sample to the prediction model to obtain the corresponding initial prediction click probability.

3) Then, the local feature diagnosis model inputs all the disturbance samples into the prediction model respectively to obtain a plurality of corresponding post-disturbance prediction click probabilities.

4) Then, the local feature diagnosis model calculates the importance value of each feature of the initial sample to the predicted click probability according to the initial predicted click probability and each post-disturbance predicted click probability.

Specifically, the local feature diagnosis model may calculate the difference between the initial predicted click probability and each post-disturbance predicted click probability, and then derive from these differences the importance value of each feature of the initial sample to the predicted click probability.

5) Finally, the local feature diagnosis model outputs the calculated importance values.

The above is a processing procedure of the local feature diagnosis model, and through the processing procedure, an importance value of each feature of the initial sample to the predicted click probability can be obtained, so that the recommendation result to be interpreted can be further interpreted by using the importance value.

S104: and interpreting the recommendation result to be interpreted according to the importance value of each feature.

The importance value of each feature of the initial sample may represent the size of the contribution of the feature to the predicted click probability.
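To illustrate how importance values might be turned into an explanation in S104, the following Python sketch ranks the features by importance value and names the top contributors. The importance values, the feature names, the `explain` helper, and the wording of the message are all hypothetical; the source does not prescribe a particular output format.

```python
def explain(importance, top_k=2):
    """Rank features by importance value and describe the top positive contributors."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    top = [name for name, value in ranked[:top_k] if value > 0]
    if not top:
        return "No feature raised the predicted click probability."
    return "Recommended mainly because of: " + ", ".join(top)


# Hypothetical importance values (differences in predicted click probability).
importance = {
    "actor_match": 0.21,
    "user_age": 0.03,
    "video_duration": -0.05,
    "sitcom_tag": 0.12,
}
message = explain(importance)
```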

By applying the embodiment shown in fig. 1a, the prediction model corresponding to the recommendation result to be explained and the constructed initial sample are input into a preset local feature diagnosis model unrelated to the prediction model, so as to obtain the importance value of each feature for predicting the click probability, and the recommendation result to be explained is explained according to the importance value. The preset local characteristic diagnosis model is a model independent of the prediction model and can be applied to any prediction model. Thus, the versatility of the interpretation method is improved.
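The model-independence claimed above can be illustrated with a short Python sketch in which the same diagnosis routine is applied to two structurally different models. Both models and the `diagnose` helper are hypothetical; the point is only that the routine calls the model as a black box and never inspects its internals.

```python
import math


def diagnose(predict, sample, hidden_value=0):
    """Model-agnostic diagnosis: only calls predict(sample), never inspects the model."""
    base = predict(sample)
    return [
        base - predict(sample[:i] + [hidden_value] + sample[i + 1:])
        for i in range(len(sample))
    ]


def logistic_model(sample):
    """Hypothetical linear model with a sigmoid output."""
    score = 0.9 * sample[0] + 0.1 * sample[1] - 0.5
    return 1.0 / (1.0 + math.exp(-score))


def rule_model(sample):
    """Hypothetical tree-like model written as nested rules."""
    if sample[0] > 0.5:
        return 0.8 if sample[1] > 0.5 else 0.6
    return 0.2


sample = [1, 1]
logistic_importance = diagnose(logistic_model, sample)
rule_importance = diagnose(rule_model, sample)
```

In both cases the first feature receives the larger importance value, even though one model is a smooth function and the other is a set of rules.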

Furthermore, in another embodiment of the method for explaining a recommendation result, the local feature diagnosis model repeatedly selects an undisturbed feature from the content features and the user features in the initial sample and hides it to obtain a disturbance sample, thereby obtaining the importance value of each feature with which to explain the recommendation result to be explained. Specifically, as shown in Fig. 2, the method may include:

S201: obtaining a prediction model and a target user corresponding to a recommendation result to be explained; the recommendation result to be explained comprises recommended content; the target user is the user who receives the recommendation result to be explained.

The recommended content may be a video, an audio, an image, or the like, and is not particularly limited. The prediction model corresponding to the recommendation result to be explained can predict the probability that the recommendation result is clicked by the target user.

S202: constructing an initial sample; the initial sample, comprising: the content characteristics of the recommended content and the user characteristics of the target user.

In one case, the recommended content is a video, and the content features of the recommended content may be labels of the video, such as an actor, movie, or situation comedy label, or may be the video duration, the video release time, and the like. The user features of the target user may be personal information of the user, such as the user's gender, the user's age, or the user's registration number in the video playing system. Constructing the initial sample thus means assembling these content features and user features into the initial sample.

In this embodiment, one or more initial samples may be constructed. Specifically, when there is one target user, an initial sample may be constructed by using the user characteristics of the target user and the content characteristics of the recommended content of the recommendation result to be interpreted received by the target user. When there are a plurality of target users, for each target user, an initial sample may be constructed by using the user characteristics of the target user and the content characteristics of the recommended content of the recommendation result to be interpreted received by the target user, so as to construct a plurality of initial samples.

S203: inputting the prediction model and the initial sample into a preset local characteristic diagnosis model irrelevant to the prediction model; performing multiple disturbances on the content characteristics and/or the user characteristics in the initial sample to obtain multiple disturbance samples; inputting the content characteristics and the user characteristics in the initial sample into a prediction model to obtain corresponding initial prediction click probability; inputting the content characteristics and the user characteristics in the disturbance sample corresponding to each characteristic to be determined into a prediction model to obtain the predicted click probability after disturbance corresponding to each characteristic to be determined; respectively calculating the difference between the initial predicted click probability and the disturbed predicted click probability corresponding to each undetermined characteristic; taking each obtained difference value as an importance value of the corresponding undetermined characteristic; wherein, the predicted click probability is as follows: and the prediction model predicts the click probability of the target user on the recommended content according to the input sample.

In one case, if one initial sample is constructed, the features of the initial sample may be formed into a vector and input into the local feature diagnosis model together with the prediction model.

Alternatively, if a plurality of initial samples are constructed, they may be input into the local feature diagnosis model in matrix form, with each row holding one initial sample, and the local feature diagnosis model may process the data in the matrix row by row. The processing here comprises the processing 1) to 5) of step S103 described above.
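The vector and matrix forms described above might look like the following Python sketch. The feature encoding, the helper names, and the layout (user features followed by content features) are assumptions for illustration; `process_row` is a placeholder for whatever per-sample processing is applied, and the example passes `len` merely to show the row-by-row traversal.

```python
def build_initial_sample(user_features, content_features):
    """One initial sample as a flat vector: user features followed by content features."""
    return list(user_features) + list(content_features)


def build_sample_matrix(recommendations):
    """One row per (target user, recommended content) pair."""
    return [build_initial_sample(user, content) for user, content in recommendations]


def process_rows(matrix, process_row):
    """Apply the per-sample processing to the matrix row by row."""
    return [process_row(row) for row in matrix]


# Hypothetical encoded features: user = [gender, age]; content = [is_sitcom, minutes].
recommendations = [
    ([1, 25], [1, 30]),
    ([0, 41], [0, 95]),
]
matrix = build_sample_matrix(recommendations)
row_lengths = process_rows(matrix, len)  # stand-in processing: count features per row
```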

In S203, performing multiple perturbations on the content features and/or the user features in the initial sample to obtain multiple perturbed samples may include the following steps:

1) Selecting an unperturbed feature from the content features and the user features in the initial sample as the undetermined feature.

2) Hiding the undetermined feature in the initial sample, thereby completing one perturbation and obtaining the perturbed sample corresponding to the undetermined feature.

Hiding the undetermined feature may consist of setting the value of the undetermined feature to 0.

3) Returning to the step of selecting an unperturbed feature from the content features and the user features in the initial sample as the undetermined feature.

Through the three steps above, each feature in the initial sample is perturbed once, so that multiple perturbed samples are obtained.

For example, assume that the content features and the user features in the initial sample are the five features A, B, C, D and E. The unperturbed feature A is selected as the undetermined feature and hidden, yielding the perturbed sample [0, B, C, D, E] corresponding to feature A; then the unperturbed feature B is selected as the undetermined feature and hidden, yielding the perturbed sample [A, 0, C, D, E] corresponding to feature B; and so on, so that multiple perturbed samples are obtained.

After obtaining a plurality of disturbance samples, inputting content characteristics and user characteristics in the initial samples into a prediction model to obtain corresponding initial prediction click probability; inputting the content characteristics and the user characteristics in the disturbance sample corresponding to each characteristic to be determined into a prediction model to obtain the predicted click probability after disturbance corresponding to each characteristic to be determined; respectively calculating the difference between the initial predicted click probability and the disturbed predicted click probability corresponding to each undetermined characteristic; and taking each obtained difference value as an importance value of the corresponding undetermined characteristic.

Continuing with the above example, assume that inputting the content features and user features A, B, C, D and E of the initial sample into the prediction model yields an initial predicted click probability of 0.6. Feature A is hidden, and the perturbed sample [0, B, C, D, E] corresponding to feature A is input into the prediction model, yielding a post-perturbation predicted click probability of 0.1 for feature A. The difference between the initial predicted click probability and the post-perturbation predicted click probability corresponding to feature A is 0.6 - 0.1 = 0.5, so 0.5 is the importance value of the undetermined feature A.

Of course, the difference may also be negative. For example, after the undetermined feature B is hidden, the perturbed sample [A, 0, C, D, E] corresponding to feature B is obtained and input into the prediction model, yielding a post-perturbation predicted click probability of 0.8 for feature B. The difference between the initial predicted click probability and the post-perturbation predicted click probability corresponding to feature B is 0.6 - 0.8 = -0.2, so -0.2 is the importance value of the undetermined feature B.
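The perturbation procedure and the worked example above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the function names `feature_importances` and `toy_predict`, the linear toy model and its coefficients are all assumptions, chosen so that the numbers match the example (initial probability 0.6, probability 0.1 with feature A hidden, 0.8 with feature B hidden).

```python
from typing import Callable, Dict, List, Sequence


def feature_importances(
    predict: Callable[[Sequence[float]], float],
    sample: Sequence[float],
    names: Sequence[str],
) -> Dict[str, float]:
    """Hide each feature once (set it to 0) and record the change in prediction."""
    base = predict(sample)  # initial predicted click probability
    importances: Dict[str, float] = {}
    for i, name in enumerate(names):
        perturbed: List[float] = list(sample)
        perturbed[i] = 0.0  # "hiding" the undetermined feature
        # importance = initial probability - post-perturbation probability
        importances[name] = base - predict(perturbed)
    return importances


def toy_predict(sample: Sequence[float]) -> float:
    """Hypothetical stand-in for the prediction model (assumed coefficients)."""
    a, b, c = sample
    return 0.1 + 0.5 * a - 0.2 * b + 0.2 * c


example = feature_importances(toy_predict, [1.0, 1.0, 1.0], ["A", "B", "C"])
for name, value in example.items():
    print(name, round(value, 6))
```

Because `feature_importances` only calls `predict`, any model that maps a feature vector to a click probability could be substituted for `toy_predict`, which is the sense in which the diagnosis step is independent of the prediction model.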

The input sample comprises the user features of the target user and the features of the candidate recommendation result. For example, the candidate recommendation result may be a candidate video.

S204: interpreting the recommendation result to be interpreted according to the importance value of each feature.

Since the difference between the initial predicted click probability and the post-perturbation predicted click probability corresponding to the undetermined feature may be a positive number, a negative number, or 0, and the difference represents an importance value, the importance value of each feature may also be a positive number, a negative number, or 0.

The importance value may represent the size of the contribution of the feature to the predicted click probability. The greater the importance value, the greater the contribution of the corresponding feature to the predicted click probability, and conversely, the smaller the importance value, the less the contribution of the corresponding feature to the predicted click probability.

For example, assuming that the importance value of feature A is 0.1 and the importance value of feature B is -0.3, feature A makes a positive contribution to the recommendation result to be interpreted and feature B makes a negative contribution to it.

In one case, interpreting the recommendation result to be interpreted according to the importance value of each feature may be: sorting the importance values of all the features by magnitude, selecting, according to the sorting result, the features corresponding to a preset number of the largest importance values, and using them to interpret the recommendation result to be interpreted.

For example, if the importance value of feature A is 0.1, the importance value of feature B is -0.3, the importance value of feature C is 0.5, and the importance value of feature D is 0.8, the two features with the largest importance values, D and C, are selected to interpret the recommendation result to be interpreted.

In another case, the contribution values of the features may be sorted by magnitude, and a preset number of features with the largest contribution values may be selected to interpret the recommendation result to be interpreted; alternatively, the features that make a positive contribution to the recommendation result to be interpreted may be selected to interpret it.

For example, continuing the above example, the feature D with the largest contribution value is selected to interpret the recommendation result to be interpreted; or the features D, C and A, which make positive contributions to the recommendation result to be interpreted, are selected to interpret it.
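The two selection strategies described above (a preset number of largest importance values, or all positively contributing features) might be sketched as follows. The helper names `top_k_features` and `positive_features` are hypothetical; the importance values are the ones used in the example for features A to D.

```python
from typing import Dict, List


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the largest importance values, largest first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def positive_features(importances: Dict[str, float]) -> List[str]:
    """Return all features with a positive importance value, largest first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked if value > 0]


importances = {"A": 0.1, "B": -0.3, "C": 0.5, "D": 0.8}
print(top_k_features(importances, 2))   # ['D', 'C']
print(positive_features(importances))   # ['D', 'C', 'A']
```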

By applying the embodiment shown in fig. 2, a prediction model corresponding to the recommendation result to be interpreted and a target user receiving that recommendation result are obtained, an initial sample is constructed, and the prediction model and the initial sample are input into the local feature diagnosis model; unperturbed features are selected one at a time from the content features and the user features in the initial sample and hidden to obtain perturbed samples, and the difference between the predicted click probability of the initial sample and that of each perturbed sample is taken as the importance value of the hidden feature, which is used to interpret the recommendation result to be interpreted. The preset local feature diagnosis model is independent of the prediction model and can be applied to any prediction model. Thus, the versatility of the interpretation method is improved.

Furthermore, another embodiment of the present invention provides a further method for interpreting a recommendation result. In this embodiment, the prediction model may include a first sub-model and a second sub-model; the features in the initial sample are combined and spliced to obtain a spliced feature, and the spliced feature is input into a preset local feature diagnosis algorithm to obtain the importance value of each feature, so as to interpret the recommendation result. Specifically, as shown in fig. 3, the method may include:

S301: obtaining a prediction model and a target user corresponding to a recommendation result to be interpreted; the recommendation result to be interpreted comprises recommended content; the target user is the user who receives the recommendation result to be interpreted.

In one case, the prediction model corresponding to the recommendation result to be interpreted may include a first sub-model and a second sub-model, where the first sub-model may be GBDT or DNN, and is used for processing dense features and does not support processing high-dimensional sparse features; the second submodel may be LR or FM for processing sparse features.

S302: constructing an initial sample; the initial sample, comprising: the content characteristics of the recommended content and the user characteristics of the target user.

S303: dividing the content features and the user features in the initial sample into dense features and sparse features according to a preset algorithm.

For example, if the initial sample includes the features of the recommended video and the user features of the target user, the features of the initial sample may include dense features such as the user age, the user gender, or the video duration, and sparse features such as the registration number or the video tag of the user.

After the initial sample is constructed, all the features in the initial sample can be numbered and divided into dense features and sparse features according to a preset algorithm. Specifically, each feature in the initial sample may be represented as a vector, and the length of the vector is used as the number of the feature; features whose numbers are smaller than a preset threshold are classified as dense features, and features whose numbers are larger than the preset threshold are classified as sparse features. For example, if the preset threshold is 100,000, the features with numbers smaller than 100,000 are determined to be dense features, and the features with numbers larger than 100,000 are determined to be sparse features.
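As a rough sketch of this division step, the following Python function splits features by the length of their vector representation. The function name `split_features`, the feature names and the tiny threshold used in the demonstration are assumptions for illustration only; the text gives 100,000 as an example threshold and does not say how a length exactly equal to the threshold is handled, so treating it as sparse here is also an assumption.

```python
from typing import Dict, Sequence, Tuple

FeatureMap = Dict[str, Sequence[float]]


def split_features(
    feature_vectors: FeatureMap, threshold: int = 100_000
) -> Tuple[FeatureMap, FeatureMap]:
    """Split features into (dense, sparse) by the length of their vector."""
    dense: FeatureMap = {}
    sparse: FeatureMap = {}
    for name, vector in feature_vectors.items():
        if len(vector) < threshold:
            dense[name] = vector
        else:
            # Assumption: a length equal to the threshold counts as sparse.
            sparse[name] = vector
    return dense, sparse


# Demonstration with a deliberately small threshold to keep the vectors short.
features = {
    "user_age": [0.3],
    "video_duration": [0.7],
    "video_tag": [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
}
dense, sparse = split_features(features, threshold=4)
print(sorted(dense), sorted(sparse))
```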

S304: the dense features are input to a first submodel, obtaining combined features.

In practical application, after the dense features are input into the first sub-model, the first sub-model automatically traverses the dense features and selects among them according to entropy gain, so that the dense features are divided into multiple classes; each class is one combined feature, and the features in each class are discretized and encoded to obtain a combined feature vector suitable for machine processing. Because the initial sample has a large number of features, the obtained combined features are high-dimensional combined features.

In one case, after obtaining the combined features, the combined features and the sparse features may be input into a second sub-model of the prediction model, and the second sub-model may calculate the input features to obtain an initial predicted click probability.

S305: splicing the combined feature and the sparse feature to obtain a spliced feature.

In one case, the combined feature and the sparse feature may be spliced by concatenating the feature vector generated from the combined feature with the high-dimensional sparse feature vector generated from the sparse feature.

For example, if the feature vector of the combined feature is [a₁, a₂, …, a_n] and the high-dimensional sparse feature vector is [b₁, 0, 0, 0, …, b₂], splicing the feature vector of the combined feature with the high-dimensional sparse feature vector yields the spliced feature vector [a₁, a₂, …, a_n, b₁, 0, 0, 0, …, b₂].

S306: inputting the prediction model and the spliced feature into a preset local feature diagnosis model that is independent of the prediction model, to obtain the importance value of each feature to the predicted click probability.

In practical application, the local feature diagnosis model performs multiple perturbations on the input spliced feature to obtain multiple perturbed samples; from each perturbed sample, either a sample with a perturbed dense feature or a sample with a perturbed sparse feature is obtained.

For example, continuing the above example, for the input spliced feature [a₁, a₂, …, a_n, b₁, 0, 0, 0, …, b₂], hiding the vector element a₁ corresponding to dense feature A₁ yields the sample [0, a₂, …, a_n, b₁, 0, 0, 0, …, b₂] in which dense feature A₁ is perturbed; alternatively, hiding the vector element b₁ corresponding to sparse feature B₁ yields the sample [a₁, a₂, …, a_n, 0, 0, 0, 0, …, b₂] in which sparse feature B₁ is perturbed.
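Splicing and element-wise hiding can be illustrated with plain Python lists. The helper names `splice` and `hide` and the numeric values are hypothetical and stand in for the symbolic vectors a₁ … a_n and b₁ … b₂ of the example.

```python
from typing import List, Sequence


def splice(combined: Sequence[float], sparse: Sequence[float]) -> List[float]:
    """Concatenate the combined-feature vector and the sparse-feature vector."""
    return list(combined) + list(sparse)


def hide(vector: Sequence[float], index: int) -> List[float]:
    """Return a copy of the vector with the element at `index` set to 0."""
    perturbed = list(vector)
    perturbed[index] = 0.0
    return perturbed


combined = [0.2, 0.7, 0.1]            # stands in for [a1, a2, ..., an]
sparse = [1.0, 0.0, 0.0, 0.0, 1.0]    # stands in for [b1, 0, 0, 0, ..., b2]
spliced = splice(combined, sparse)

print(hide(spliced, 0))               # element a1 hidden
print(hide(spliced, len(combined)))   # element b1 hidden
```

Copying the vector before zeroing an element keeps the original spliced feature intact, so every perturbation starts from the same initial sample.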

The dense features in the sample with a perturbed dense feature are input into the first sub-model to obtain first post-disturbance combination features; the first post-disturbance combination features and the sparse features in the sample with the perturbed dense feature are input into the second sub-model to obtain the post-disturbance predicted click probability corresponding to the sample with the perturbed dense feature.

Specifically, the first sub-model processes the dense features in the sample with the perturbed dense feature to obtain the first post-disturbance combination features, and the second sub-model processes the first post-disturbance combination features and the sparse features in the sample with the perturbed dense feature to obtain the post-disturbance predicted click probability corresponding to that sample.

Or inputting the dense features in the sample after the sparse features are disturbed to the first sub-model to obtain second disturbed combination features; and inputting the second post-disturbance combination characteristic and the sparse characteristic in the sample after the sparse characteristic disturbance into a second sub-model, and obtaining the post-disturbance predicted click probability corresponding to the sample after the sparse characteristic disturbance.

Specifically, the first sub-model processes dense features in the sample after the sparse features are disturbed to obtain second post-disturbance combination features, and the second sub-model processes the second post-disturbance combination features and the sparse features in the sample after the sparse features are disturbed to obtain post-disturbance predicted click probability corresponding to the sample after the sparse features are disturbed.

After the post-disturbance predicted click probability corresponding to each sample with a perturbed dense feature is obtained, the importance value of that feature to the predicted click probability is obtained from the difference between the initial predicted click probability and that post-disturbance predicted click probability. Likewise, after the post-disturbance predicted click probability corresponding to each sample with a perturbed sparse feature is obtained, the importance value of that feature is obtained from the difference between the initial predicted click probability and that post-disturbance predicted click probability. Here, the importance value of each feature to the predicted click probability refers to the importance value of each feature in the spliced feature vector to the predicted click probability.

S307: interpreting the recommendation result to be interpreted according to the importance value of each feature.
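The two-stage flow described above (dense features through the first sub-model, then combination features plus sparse features through the second sub-model, repeated for each perturbed sample) might look roughly like the sketch below. Everything model-specific here is an assumption: `first_submodel` is a toy threshold encoder standing in for a GBDT or DNN, `second_submodel` is a hand-weighted logistic regression standing in for LR or FM, and `WEIGHTS` and `BIAS` are invented values.

```python
import math
from typing import List, Sequence, Tuple

WEIGHTS = [1.0, -1.0, 0.5, -0.5, 0.8, 0.0, -0.6]  # assumed LR weights
BIAS = 0.0


def first_submodel(dense: Sequence[float]) -> List[float]:
    """Toy stand-in for GBDT/DNN: one-hot encode whether each value exceeds 0.5."""
    combined: List[float] = []
    for value in dense:
        combined.extend([1.0, 0.0] if value > 0.5 else [0.0, 1.0])
    return combined


def second_submodel(combined: Sequence[float], sparse: Sequence[float]) -> float:
    """Toy stand-in for LR: sigmoid of a weighted sum over the spliced vector."""
    spliced = list(combined) + list(sparse)
    logit = BIAS + sum(w * x for w, x in zip(WEIGHTS, spliced))
    return 1.0 / (1.0 + math.exp(-logit))


def predict(dense: Sequence[float], sparse: Sequence[float]) -> float:
    return second_submodel(first_submodel(dense), sparse)


def two_stage_importances(
    dense: Sequence[float], sparse: Sequence[float]
) -> Tuple[float, List[float], List[float]]:
    """Hide each dense and each sparse element once and difference the predictions."""
    base = predict(dense, sparse)  # initial predicted click probability
    dense_imp = []
    for i in range(len(dense)):
        perturbed = list(dense)
        perturbed[i] = 0.0  # perturbed dense feature, re-run through both sub-models
        dense_imp.append(base - predict(perturbed, sparse))
    sparse_imp = []
    for j in range(len(sparse)):
        perturbed = list(sparse)
        perturbed[j] = 0.0  # perturbed sparse feature, dense part left unchanged
        sparse_imp.append(base - predict(dense, perturbed))
    return base, dense_imp, sparse_imp


base, dense_imp, sparse_imp = two_stage_importances([0.9, 0.2], [1.0, 0.0, 1.0])
print(round(base, 4), [round(v, 4) for v in dense_imp], [round(v, 4) for v in sparse_imp])
```

In this toy setup, hiding a sparse element that is already 0 leaves the prediction unchanged, so its importance value is 0, which is consistent with the statement that an importance value may be positive, negative, or 0.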

By applying the embodiment shown in fig. 3, the features of the initial samples are combined and spliced to obtain splicing features, the splicing features are input into a preset local feature diagnosis model which is irrelevant to a prediction model, the importance value of each feature to the predicted click probability is obtained, and the recommendation result is explained according to the importance value. The preset local characteristic diagnosis model is a model independent of the prediction model and can be applied to any prediction model. Thus, the versatility of the interpretation method is improved.

Corresponding to the method embodiment shown in fig. 1a, an embodiment of the present invention further provides an apparatus for interpreting a recommendation result, and as shown in fig. 4, the apparatus may include: a first obtaining module 401, a building module 402, a second obtaining module 403 and an interpreting module 404, wherein,

the first obtaining module 401 is configured to obtain a prediction model and a target user corresponding to a recommendation result to be interpreted; the recommendation result to be interpreted comprises recommendation content; the target user is the user receiving the recommendation result to be explained;

the constructing module 402 is configured to construct an initial sample; the initial sample, comprising: the content characteristics of the recommended content and the user characteristics of the target user;

the second obtaining module 403 includes: the device comprises an input submodule, a disturbance submodule, an initial probability obtaining submodule, a disturbance probability obtaining submodule and an importance value obtaining submodule;

the interpreting module 404 is configured to interpret the recommendation result to be interpreted according to the importance value of each feature.

By applying the embodiment shown in fig. 4, a prediction model corresponding to a recommendation result to be interpreted and a target user receiving the recommendation result to be interpreted are obtained, and an initial sample is constructed, where the initial sample includes: the content characteristics of the recommended content and the user characteristics of the target user included in the recommendation result to be explained; and inputting the prediction model and the initial sample into a preset local feature diagnosis model irrelevant to the prediction model to obtain an importance value of each feature for predicting the click probability, and interpreting the recommendation result to be interpreted according to the importance value. The preset local characteristic diagnosis model is a model independent of the prediction model and can be applied to any prediction model. Thus, the versatility of the interpretation method is improved.

Specifically, in this embodiment, the perturbation sub-module may include:

Specifically, in this embodiment, the initial probability obtaining submodule is specifically configured to input content features and user features in an initial sample into the prediction model, and obtain a corresponding initial predicted click probability;

Specifically, in this embodiment, the importance value obtaining submodule is specifically configured to calculate differences between the initial predicted click probability and post-disturbance predicted click probabilities corresponding to the respective undetermined features respectively; and taking each obtained difference value as an importance value of the corresponding undetermined characteristic.

Specifically, in this embodiment, the prediction model includes: a first submodel and a second submodel;

the apparatus further includes:

a dividing module (not shown in the figure) for dividing all the characteristics into dense characteristics and sparse characteristics according to a preset algorithm for the content characteristics and the user characteristics in the initial sample before inputting the prediction model and the initial sample into a preset local characteristic diagnosis model irrelevant to the prediction model;

an input module (not shown in the figure) for inputting the dense features to a first submodel to obtain combined features;

Specifically, in this embodiment, the input sub-module includes:

Specifically, in this embodiment, the perturbation submodule is specifically configured to perform multiple perturbations on the spliced features to obtain multiple perturbation samples;

the obtaining unit is used for obtaining a sample with dense characteristics after disturbance or a sample with sparse characteristics after disturbance from each disturbance sample;

Specifically, in this embodiment, the first sub-model is a gradient boosting decision tree (GBDT) model or a deep neural network (DNN) model, and the second sub-model is a logistic regression (LR) model or a factorization machine (FM) model.

The embodiment of the present invention further provides an electronic device for interpreting a recommendation result, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 complete mutual communication through the communication bus 504,

a memory 503 for storing a computer program;

the processor 501, when executing the program stored in the memory 503, implements the following steps:

As can be seen, in the scheme provided in the embodiment of the present invention, the prediction model corresponding to the recommendation result to be interpreted and the target user receiving the recommendation result to be interpreted are obtained, and an initial sample is constructed, where the initial sample includes: the content characteristics of the recommended content and the user characteristics of the target user included in the recommendation result to be explained; inputting the characteristics of the prediction model and the initial sample into a preset local characteristic diagnosis model irrelevant to the prediction model to obtain the importance value of each characteristic on the predicted click probability, and interpreting the recommendation result to be interpreted according to the importance value. The preset local characteristic diagnosis model is a model independent of the prediction model and can be applied to any prediction model. Thus, the versatility of the interpretation method is improved.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to perform any of the above-mentioned methods for interpreting recommendation results to achieve the same technical effects.

In yet another embodiment of the present invention, there is provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for interpreting recommendation results as described in any of the above embodiments, so as to achieve the same technical effect.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, and the storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and in relation to the description, reference may be made to some portions of the description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of interpreting recommendations, the method comprising:

according to the importance value of each feature, interpreting the recommendation result to be interpreted;

the step of performing multiple perturbations on the content features and/or the user features in the initial sample to obtain multiple perturbed samples includes:

2. The method of claim 1,

3. The method of claim 2,

the step of obtaining the importance value of each feature to the predicted click probability according to the difference between the initial predicted click probability and the predicted click probability after each disturbance comprises the following steps:

4. The method of claim 1, wherein the predictive model comprises: a first submodel and a second submodel;

inputting the dense features into a first submodel to obtain combined features;

5. The method of claim 4,

the step of inputting the prediction model and the initial sample into a preset local feature diagnosis model independent of the prediction model comprises the following steps:

6. The method of claim 5,

7. The method according to any of claims 4 to 6, wherein the first sub-model is a gradient boosting decision tree (GBDT) model or a deep neural network (DNN) model, and the second sub-model is a logistic regression (LR) model or a factorization machine (FM) model.

8. An apparatus for interpreting a recommendation, the apparatus comprising: a first obtaining module, a building module, a second obtaining module, and an interpreting module, wherein,

the interpretation module is used for interpreting the recommendation result to be interpreted according to the importance value of each feature;

the perturbation sub-module comprises:

9. The apparatus of claim 8,

the initial probability obtaining submodule is specifically used for inputting content characteristics and user characteristics in an initial sample into the prediction model to obtain corresponding initial prediction click probability;

10. The apparatus of claim 9,

the importance value obtaining submodule is specifically used for respectively calculating the difference between the initial predicted click probability and the disturbed predicted click probability corresponding to each undetermined feature; and taking each obtained difference value as an importance value of the corresponding undetermined characteristic.

11. The apparatus of claim 8, wherein the predictive model comprises: a first submodel and a second submodel;

the apparatus further comprises:

12. The apparatus of claim 11, wherein the input submodule comprises:

13. The apparatus of claim 12,

the disturbance submodule is specifically used for carrying out multiple disturbances on the spliced features to obtain multiple disturbance samples;

14. The apparatus of any of claims 11 to 13, wherein the first sub-model is a gradient boosting decision tree (GBDT) model or a deep neural network (DNN) model, and the second sub-model is a logistic regression (LR) model or a factorization machine (FM) model.

15. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;

the memory is used for storing a computer program;

the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-7.