CN108647226B

CN108647226B - Hybrid recommendation method based on variational automatic encoder

Info

Publication number: CN108647226B
Application number: CN201810253803.6A
Authority: CN
Inventors: 张寅�; 林建实
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2018-03-26
Filing date: 2018-03-26
Publication date: 2021-11-02
Anticipated expiration: 2038-03-26
Also published as: CN108647226A

Abstract

The invention discloses a hybrid recommendation method based on a variational automatic encoder. The method comprises the steps of modeling scoring characteristics and content characteristics of users and articles by using a variational automatic encoder, encoding sparse characteristics by a factorization machine, and automatically performing characteristic high-order combination; meanwhile, the multi-view data characteristics of the user and the article are fused into the framework of the variational automatic encoder so as to solve the problem of cold start; and through the variation inference analysis of the hidden vector codes of the user and the article, the method provides the interpretability for the automatic encoder to generate the hidden vector codes; by inputting the characteristics corresponding to the user and the articles, the preference values of the user to the candidate article set can be obtained, and the recommendation results are obtained by sorting according to the preference values. Compared with the traditional recommendation method, the method has better recommendation effect.

Description

Hybrid recommendation method based on variational automatic encoder

Technical Field

The invention relates to a computer recommendation system, in particular to a hybrid recommendation method based on an automatic encoder.

Background

In recent years, with the continuous development of networks and information technologies, the data volume, the generation speed and the complexity of online information are rapidly increased, and the personalized recommendation system has become an important technical means for extracting information from complex data and is widely applied in the industry.

Traditional collaborative filtering-based recommendation methods, particularly matrix factorization series methods, have proven to be very effective in the industry, and although implicit feedback data such as browsing, clicking and collecting are easier to collect than explicit feedback data such as movie scoring, merchandise evaluation, etc., the cold start problem and feature sparsity problem remain important factors that limit the performance of recommendation systems.

In recent years, deep learning has been developed in the fields of graphic images, natural language processing and the like, and has proved its excellent performance in feature processing, so that the application of deep learning to recommendation systems has become an important direction in this field. However, the existing models based on the deep neural network often process the content characteristics of users and articles, and the key of the recommendation system is to describe the interaction relationship between the users and the articles, on this point, the deep learning research is not directly applied, and more hidden vector codes generated are directly utilized to be input into a matrix decomposition frame.

Disclosure of Invention

Aiming at the blank and the defects of the prior art, the invention provides a hybrid recommendation method based on a variational automatic encoder. The technical scheme adopted by the invention is as follows:

the hybrid recommendation method based on the variational automatic encoder comprises the following steps:

(1) processing log data according to a specific application configuration environment to obtain interaction relation information of a user and an article, wherein the interaction relation information comprises implicit feedback and explicit feedback; performing feature processing for different information types: for implicit feedback data, the mark of interactive behavior is 1, otherwise, the mark is 0; recording the specific score value of the display feedback data, and then carrying out normalization processing on the characteristic value;

(2) multi-view information of a user and an article is collected respectively, wherein the multi-view information comprises user portrait information and article content information, and the cold start problem is solved;

(3) collecting recommendation feedback information which is not favored by a user except for the articles with historical behaviors, generating negative samples, and enabling the number of the positive and negative samples to be the same;

(4) constructing a model based on a variational automatic encoder hybrid recommendation method, performing gradient updating of variables in an alternate iteration mode, training the model, and storing final model parameters; for the objects and users with historical interactive behaviors, corresponding hidden vector codes are reserved;

(5) in the prediction stage, the user and the article which have the hidden vector coding are directly used as the input of a generalized matrix decomposition module in the model, and the preference value of the user to the specific article is calculated; for users and articles lacking the hidden vector codes, calculating the corresponding hidden vector codes through a trained model, and calculating preference values of the hidden vector codes;

(6) for a specific user, calculating preference values of the user to the articles in the candidate article set, and sequencing the preference values to obtain a recommended article list of the user;

and (3) regularly arranging logs and repeating the calculation models from (1) to (4) in the execution process of the method, and updating the hidden vector codes of the users and the articles.

Preferably, the step (3) includes: for each user, dividing the article into a positive sample and a negative sample according to the existing interactive behavior, and screening a part of negative samples for the article without the interactive record in a sampling mode.

Preferably, the model based on the variational automatic encoder hybrid recommendation method in the step (4) is composed of three modules in total, including a variational automatic encoder on the user side, a variational automatic encoder on the article side and a generalized matrix decomposition module, wherein the variational automatic encoder is divided into a decoder and an encoder; and (4) after receiving the user and article characteristic values and the corresponding positive and negative sample preference values obtained in the steps (2), (3) and (4), training the model.

Preferably, the gradient updating formula of the variables in the step (4) is as follows:

wherein phi_u,Φ_v,Θ_u,Θ_vAnd Ψ is an encoder parameter of the user-automated encoder, an encoder parameter of the commodity-automated encoder, a decoder parameter of the user-automated encoder, a decoder parameter of the commodity-automated encoder, and a parameter of the generalized matrix decomposition module, θ and Φ are an encoder module parameter and a decoder module parameter, η_u,η_v,η_ΨRespectively, the rate of updating the parameters, Z, of the user-side autoencoder, the article-side autoencoder, and the generalized matrix decomposition module_u,Z_vRespectively, the hidden vector codes, X, being generated by an automatic user-side encoder and an automatic article-side encoder^B,U^BRespectively, a user multi-view feature of random gradient descent batch size B and a scoring feature of the user, Y^B,V^BA multi-view characteristic of the item and a scoring characteristic of the item, respectively, of batch size B, U and V are scoring characteristics of the user and the item, respectively, f_pooling(U),f_pooling(V) user and object, respectivelyAnd outputting the scoring characteristics of the products after the pooling operation.

Preferably, the step (5) includes the steps of:

1) saving the model training parameter phi obtained after the step (4) is implemented_u,Φ_v,Θ_u,Θ_vAnd Ψ for developing a prediction;

2) for users and articles with interactive behaviors, directly reading the stored hidden vector codes; for unknown users and articles, the calculation of the hidden vector coding is carried out through an encoder part;

3) for the encoder part of the user, implicit vector encoding of user i

The calculation formula is as follows:

where g (-) is the activation function of each layer, u_i,x_iRespectively the scoring feature and the multi-view feature of user i,

and

respectively, user i generates a mean vector, a variance vector and an implicit vector code through a variational automatic coder,

is the output result vector of the k hidden layer when the hidden vector of the user is encoded,

the weight vector corresponding to the k hidden layer is calculated when the hidden vector of the user is coded, and is respectively used for processing the output of the hidden layer and the multi-view characteristic input,

calculating the k hidden layer bias term corresponding to the user when encoding the hidden vector, wherein k is 2,3 …, L is the number of hidden layers, and L is the number of hidden layers

Is output aiming at mean value vector when computing implicit vector coding of user

The weight term of (a) is,

The bias term of (a) is,

is output aiming at variance vector when computing implicit vector coding of user

The weight term of (a) is,

The bias term of (d); ε is the number sampled in accordance with a normal distribution with a mean of 0 and a variance of 1;

4) for the encoder part of the article, implicit vector encoding of article i

The calculation formula is as follows:

where g (-) is the activation function of each layer, v_i,y_iRespectively the scoring feature and the multi-view feature of item i,

and

respectively, mean vector, variance vector and implicit vector codes generated by the item i through a variational automatic encoder,

is the output result vector of the k hidden layer when the hidden vector coding of the article is calculated,

the weight vector corresponding to the k hidden layer when calculating the hidden vector code of the article is respectively used for processing the output of the hidden layer and the multi-view characteristic input,

when calculating the hidden vector coding of the article, corresponding to the k hidden layer bias term, k is 2,3 …, L is the number of hidden layers, and L is the number of hidden layers

Is output relative to the mean vector when calculating the hidden vector code of the article

The weight term of (a) is,

The bias term of (a) is,

is the square difference vector output when calculating the hidden vector code of the article

The weight term of (a) is,

The weight term bias term of (1); ε is the mean of coincidence value of 0 and the variance of1 is sampled by a normal distribution;

5) calculating the scoring preference value of the user to the item, wherein the formula is as follows:

R＝f_Ψ(Z_u，Z_v)

wherein Z_uHidden vector coding for users, Z_vFor implicit vector coding of articles, f_Ψ(. h) is a function fitted to a neural network architecture with Ψ as a parameter.

According to the method, the grading feature and the content feature of the user and the article are modeled by using the variational automatic encoder, meanwhile, the multi-view data feature of the user and the article is fused into the framework of the variational automatic encoder, and the preference value of the user to the candidate article set can be obtained and the recommendation result can be obtained through the variational inference analysis of the hidden vector encoding of the user and the article. Compared with the traditional recommendation method, the method has better recommendation effect.

Drawings

FIG. 1 is an overall model diagram of a hybrid recommendation method based on a variational auto-encoder;

fig. 2 is a diagram of a user-side variational autoencoder network architecture.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.

(1) processing log data of the system according to a specific application configuration environment, and obtaining interaction relation information such as browsing, collection, clicking, comment and the like of a user and an article through construction of a data warehouse and cleaning of characteristic data, wherein the interaction relation information mainly comprises implicit feedback and explicit feedback; performing feature processing for different information types: for implicit feedback data, the mark of interactive behavior is 1, otherwise, the mark is 0; recording the specific score value of the display feedback data; then, normalizing the characteristic value;

(2) multi-view information of users and articles is collected respectively and managed in a data warehouse mode, so that the problem of cold start is solved: collecting user portrait information, such as user age, gender, school, specialty, past behavior records, etc.; collecting content information of the article, such as graphic features of pictures, features extracted by describing texts through a natural language processing method, click rate and collection rate of the article, and the like;

(3) collecting other recommendation feedback information of the user except for the articles with the historical behavior records, for example, the user presents preference to some articles, generating negative samples, and then using the negative samples to make the integral positive and negative sample numbers approximately same;

the method comprises the following steps: for each user, dividing the article into a positive sample and a negative sample according to the existing interactive behavior, and screening a part of negative samples for the article without the interactive record in a sampling mode.

in this step, the model construction process based on the variational automatic encoder hybrid recommendation method is as follows:

the model based on the variational automatic encoder hybrid recommendation method mainly comprises three parts, namely an automatic encoder framework on the left side and an automatic encoder framework on the right side and a Multi-Layer neural network framework in the middle, namely an MLP (Multi-Layer Perceptin) module in FIG. 1. Two automatic encoder frameworks are adopted to encode a user and an article respectively, and respectively generate hidden Vector representations (tension vectors) of the user and the article and a pooling layer result of a factorization machine as the input of a multilayer neural network. Multi-view feature x for simultaneous user and item_iAnd y_jConcatenating into the input of each hidden layer of the Encoder (Encoder) module to enable the hidden vector representation of the user and the article to learn information from multiple view sources, when there is no corresponding scoring information for a new user or new articleAnd generating a hidden vector by the data of other views for score estimation, thereby relieving the cold start problem.

As shown in FIG. 2, taking the user-side encoder as an example, U is defined as user rating data, X is the characteristic of multi-view data of the user, and the characteristic concatenation is performed only on the encoder side, phi_uAs parameter configuration on the encoder side, and Z_uIs the intermediate layer of the generated implicit vector code, passed through the decoder Θ_uSeparately reconstruct the scoring data

And multi-view data

The derivation is performed on the user side, the architecture on the article side is similar, and for the sake of symbolic representation, Θ is not distinguished in this section_uAnd Θ. Process conditional probability P of reconstructing back to original inputs U and X from low-dimensional hidden vector Z_θ(X, U | Z) denotes that θ is a parameter in the reconstruction process, and according to the maximum likelihood estimation, the goal is to maximize the likelihood probability P (X, U) ═ P (X, U; Z, θ), and the unknown implicit vector code Z and the reconstruction process θ are solved, so that the probability of reconstructing the original input X and U is maximized:

the posterior probability P (Z | X, U) of the hidden vector Z is not computable, and one way to vary the autoencoder is to introduce Q_φ(Z | X, U) to approximate P_θ(Z|X,U)。

Specifically, on the right side of the above equation, + logQ_φ(Z|X,U)-logQ_φ(Z | X, U) is as follows:

both sides simultaneously pair Q_φ(Z | X, U) is expected to:

the goal of maximum likelihood estimation is to make the likelihood probability P of a sample_θ(X, U) is maximum and due to Q_φ(Z | X, U) is P_θApproximate distribution of (Z | X, U), KL (Q)_φ(Z|X,U)||P_θ(Z | X, U)) > 0, so:

then

Is the Lower Bound of likelihood probability, called the Variational Lower Bound (Variational Lower Bound).

Transformed by a bayesian formula:

maximizing the lower bound of likelihood probability requires approximating distribution Q using assumptions_φUnder (Z | X, U) conditions, the expected maximum of the probabilities of X and U can be generated, while the distribution used can be such that the assumed distribution approximates the a priori distribution of Z. Only the optimal solution theta and phi of the lower bound of the maximum likelihood probability is obtained, an automatic encoder can be designed, and P is calculated_θThe process of (X, U | Z) is represented as a generator, from a given P_θ(Z) in the case of prior probability distribution, X and U, i.e., the data points in the sample, are generated by the generator with the highest probability of occurrence when the reconstruction error for the original inputs X and U is the smallest.

Similarly, the lower bound of the optimization variation of the network architecture on the item side can be obtained, and for consistency of notation, the optimization objective on the user side is re-described, as follows:

consistent with the conventional matrix decomposition-based recommendation method, the distributions of p (z) and q (z | x, u) by assuming implicit vector coding are gaussian distributions. And then expanding to respectively obtain the optimization targets of the user and the article, namely the lower variation bound is as follows:

combining the optimization objective of the generalized matrix decomposition model, the final loss function is:

the model based on the hybrid recommendation method of the variational automatic encoder consists of three modules in total, including the variational automatic encoder at one side of a user, the variational automatic encoder at one side of an article and a generalized matrix decomposition module, wherein the variational automatic encoder is divided according to a decoder and an encoder, parameters of five parts are totally included, and an updated recursion formula is as follows:

wherein phi_u,Φ_v,Θ_u,Θ_vAnd Ψ is an encoder parameter of the user-automated encoder, an encoder parameter of the commodity-automated encoder, a decoder parameter of the user-automated encoder, a decoder parameter of the commodity-automated encoder, and a parameter of the generalized matrix decomposition module, θ and Φ are general names of an encoder module parameter and a decoder module parameter, η_u,η_v,η_ΨRespectively, the rate of updating the parameters, Z, of the user-side autoencoder, the article-side autoencoder, and the generalized matrix decomposition module_u,Z_vRespectively, the hidden vector codes, X, being generated by an automatic user-side encoder and an automatic article-side encoder^B,U^BRespectively, a user multi-view feature of random gradient descent batch size B and a scoring feature of the user, Y^B,V^BA multi-view characteristic of the item and a scoring characteristic of the item, respectively, of batch size B, U and V are scoring characteristics of the user and the item, respectively, f_pooling(U),f_pooling(V) is the output of the user and item scoring features after the pooling operation, respectively. And alternately and iteratively optimizing the parameters of the five parts by an optimization mode such as gradient descent and the like. Meanwhile, the final parameters of the five parts are reserved, and the hidden vector codes of the existing users and articles can be obtained through calculation and stored, so that the existing users and articles can be directly used as the input of the generalized matrix decomposition module when the existing users and articles are encountered subsequently, and the calculation is accelerated.

the method comprises the following substeps:

1) saving the model training parameter phi obtained after the step (4) is implemented_u,Φ_v,Θ_u,Θ_vAnd Ψ, develop a prediction;

2) for users and articles with interactive behaviors, directly reading the stored hidden vector codes; for unknown users and articles, calculating through an encoder part;

3) for the encoder part of the user, implicit vector encoding of user i

The calculation formula is as follows:

and

The weight term of (a) is,

The bias term of (a) is,

The weight term of (a) is,

is to calculate the hour hand of the implicit vector coding of the userFor variance vector output

4) for the encoder part of the article, implicit vector encoding of article i

The calculation formula is as follows:

and

The weight term of (a) is,

The bias term of (a) is,

The weight term of (a) is,

The weight term bias term of (1); ε is the value sampled in accordance with a normal distribution with a mean of 0 and a variance of 1；

R＝f_Ψ(Z_u，Z_v)

wherein Z_uHidden vector coding for users, Z_vFor implicit vector coding of articles, f_Ψ(. h) is a function fitted to the neural network architecture with Ψ as a parameter, which can be chosen to correspond to the form desired.

in the whole recommendation method execution process, the system logs are generated continuously, so that the logs need to be regularly sorted, the calculation models from (1) to (4) need to be repeated, and the hidden vector codes of the users and the articles need to be updated.

Claims

1. A hybrid recommendation method based on a variational automatic encoder is characterized by comprising the following steps:

(2) collecting multi-view information of a user and an article respectively, wherein the multi-view information comprises user portrait information and article content information;

2. The variational automatic encoder-based hybrid recommendation method according to claim 1, wherein said step (3) comprises: for each user, dividing the article into a positive sample and a negative sample according to the existing interactive behavior, and screening a part of negative samples for the article without the interactive record in a sampling mode.

3. The variational automatic encoder-based hybrid recommendation method according to claim 1, wherein the variational automatic encoder-based hybrid recommendation method model in step (4) is composed of a total of three modules, including a user-side variational automatic encoder, an article-side variational automatic encoder and a generalized matrix decomposition module, wherein the variational automatic encoder is divided into a decoder and an encoder; and (4) after receiving the user and article characteristic values and the corresponding positive and negative sample preference values obtained in the steps (2), (3) and (4), training the model.

4. The method for recommending a mixture based on a variational automatic encoder according to claim 1, wherein said step (4) comprises the following gradient update formula of variables:

wherein phi_u，Φ_v，Θ_u，Θ_vAnd Ψ is an encoder parameter of the user-automated encoder, an encoder parameter of the commodity-automated encoder, a decoder parameter of the user-automated encoder, a decoder parameter of the commodity-automated encoder, and a parameter of the generalized matrix decomposition module, θ and Φ are an encoder module parameter and a decoder module parameter, η_u，η_v，η_ΨRespectively, the rate of updating the parameters, Z, of the user-side autoencoder, the article-side autoencoder, and the generalized matrix decomposition module_u，Z_vRespectively, the hidden vector codes, X, being generated by an automatic user-side encoder and an automatic article-side encoder^B，U^BRespectively, a user multi-view feature of random gradient descent batch size B and a scoring feature of the user, Y^B，V^BA multi-view characteristic of the item and a scoring characteristic of the item, respectively, of batch size B, U and V are scoring characteristics of the user and the item, respectively, f_pooling(U)，f_pooling(V) output of the user and item scoring features after pooling operations, respectively;

a parameter gradient representing a batch size B;

representing a global parameter gradient.

5. The variational automatic encoder-based hybrid recommendation method according to claim 1, characterized in that said step (5) comprises the steps of:

1) saving the model training parameter phi obtained after the step (4) is implemented_u，Φ_v，Θ_u，Θ_vAnd Ψ for developing a prediction;

3) for the encoder part of the user, implicit vector encoding of user i

The calculation formula is as follows:

where g (-) is the activation function of each layer, u_i，x_iRespectively the scoring feature and the multi-view feature of user i,

and

is the output result vector of the k-th hidden layer during the calculation of the hidden vector encoding of the user, W_k ^(en)，V_k ^(en)The weight vector corresponding to the k hidden layer is calculated when the hidden vector of the user is coded, and is respectively used for processing the output of the hidden layer and the multi-view characteristic input,

calculating a hidden vector coding of a user corresponding to a k hidden layer bias term, wherein k is 2,3, L is the number of hidden layers, and W is the number of hidden layers_L ^(μ)，V_L ^(μ)Is output aiming at mean value vector when computing implicit vector coding of user

The weight term of (a) is,

Bias term of (1), W_L ^(σ)，V_L ^(σ)Is output aiming at variance vector when computing implicit vector coding of user

The weight term of (a) is,

4) for the encoder part of the article, implicit vector encoding of article i

The calculation formula is as follows:

wherein g (. cndot.) is the excitation of each layerLive function, v_i，y_iRespectively the scoring feature and the multi-view feature of item i,

and

when the hidden vector coding of the object is calculated, corresponding to a k hidden layer bias term, k is 2,3, L is the number of hidden layers, and L is the number of hidden layers

The weight term of (a) is,

The bias term of (a) is,

The weight term of (a) is,

The weight term bias term of (1); ε is the number sampled in accordance with a normal distribution with a mean of 0 and a variance of 1;

R＝f_Ψ(Z_u，Z_v)