CN110263257B - Deep learning based recommendation method for processing multi-source heterogeneous data - Google Patents


Info

Publication number
CN110263257B
CN110263257B (application CN201910547320.1A)
Authority
CN
China
Prior art keywords
user
item
feature
text
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910547320.1A
Other languages
Chinese (zh)
Other versions
CN110263257A (en)
Inventor
冀振燕
宋晓军
赵颖斯
皮怀雨
李俊东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN201910547320.1A priority Critical patent/CN110263257B/en
Publication of CN110263257A publication Critical patent/CN110263257A/en
Application granted granted Critical
Publication of CN110263257B publication Critical patent/CN110263257B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In recent years, deep learning has been widely applied to image and audio recognition, text classification, representation learning, and related fields, and deep-learning-based recommendation systems have become a research focus for scholars. Deep learning models excel at representation learning for specific data such as images and text, avoid complex feature engineering, can obtain nonlinear multi-level abstract feature representations of heterogeneous data, and thereby overcome the heterogeneity of diverse data sources. To date, no deep learning recommendation model that fuses scores, comments, and social networks has been proposed. Based on deep learning algorithms, the recommendation method and system provide a highly extensible recommendation process, analyze the algorithms and principles suited to the different data types, derive a final loss function combining comments, scores, and social information from the loss functions of the individual data types, and improve the accuracy of recommendation results.

Description

Deep learning based recommendation method for processing multi-source heterogeneous data
Technical Field
In recent years, deep learning has been widely applied to image and audio recognition, text classification, representation learning, and related fields, and deep-learning-based recommendation systems have become a research focus for scholars. Deep learning models excel at representation learning for specific data such as images and text, avoid complex feature engineering, can obtain nonlinear multi-level abstract feature representations of heterogeneous data, and thereby overcome the heterogeneity of diverse data sources. To date, no deep learning recommendation model that fuses scores, comments, and social networks has been proposed. The patent provides a highly extensible recommendation model based on deep learning algorithms.
Background
Current deep learning models cannot make recommendations by combining score, comment, and social network information, because the feature representation of multi-source heterogeneous data is difficult and social information cannot be fused directly with other user-item interaction information. If deep learning can be used to learn representations of the different heterogeneous data and unify them within a single deep learning model, the shortcoming of prior research, namely having to select different algorithms at the fusion stage, is overcome, and deep feature representations markedly improve the accuracy of recommendation results. To fully exploit the advantages of the three data types, the method and system fuse the score and comment features and add social information to the training process, yielding a multi-source heterogeneous data recommendation model based on deep learning.
For comment data, traditional topic models cannot accurately represent the characteristics of text; the patent instead learns the feature representation of each comment document with a PV-DBOW model, which assumes independence between the words of a document and uses the document to predict each observed word. PV-DBOW represents each document by a dense vector trained to predict the words in the document. For score data, traditional matrix factorization faces data sparsity and low accuracy; the method trains on scores with a neural network, which better captures the characteristics of users and items. For social network data, the patent adds the users' social relationship information to a BPR-based pairwise learning method, making the sampling more reasonable and improving the accuracy of recommendation results.
Disclosure of Invention
Based on deep learning, a recommendation model (or recommendation method) capable of processing multi-source heterogeneous data is provided, with advantages including high accuracy and strong extensibility. The model selects a deep-learning-based method for learning text paragraph representations, designs a neural network that learns user and item features from scores, and constrains pair-based learning through the social network. Because existing deep-learning text representation methods are mature, an existing network can be used directly, and its training results can be fused with other features for joint training to obtain a more accurate fused feature representation. Score data differ from text data in that the features of users and items can be learned directly, so there is no need to learn a vector representation of the scores separately from the entity features.
The deep-learning-based multi-source heterogeneous data recommendation model involves three types of data: comments, scores, and social networks. Each data type has its own characteristics and describes users or items from a different angle. Vector representations of each data type are learned through deep models, and the fused features of a user or item are then obtained by concatenation. Comment features reflect the user's attitude toward an item and can also represent the item's attributes; the model learns feature representations of comment paragraphs with the PV-DBOW algorithm and obtains the vector representation of a user or item by weighted superposition. Score features are the user's overall evaluation of items, reflecting the user's degree of satisfaction, and BPR can be used to learn the nonlinear features of users and items. The social network embodies friend relationships between users and indirectly influences user-item interactions; exploiting social relationships strengthens the constraints on users' purchasing behavior and further improves the accuracy of recommendation results.
The method comprises the following steps:
(1) Text feature extraction: feature vector representations of text paragraphs are learned with a PV-DBOW model. The model uses a Distributed Bag-of-Words architecture, in which a paragraph vector is used to predict randomly sampled words in the paragraph.
(2) Score feature extraction: a two-layer fully connected neural network is used to learn the user's score for an item. Unlike the text feature learning model, this step directly yields the feature vector representations of the user and the item rather than extracting features of the scores themselves.
(3) User and item feature fusion: from the comment text features obtained in step (1), the feature vectors of the comments issued by each user are weighted and summed to obtain the user features, and the feature vectors of the comments received by each item are weighted and summed to obtain the item features. Finally, a fusion function combines each user's text and score features into the user's fused features, and each item's text and score features into the item's fused features.
(4) BPR-based optimization: user-preference triples are obtained by social-network-constrained sampling, and the optimal model parameters are obtained by Bayesian theory optimization.
(5) Recommendation: with the model parameters obtained in step (4), the feature vectors of users and items are fed into the model to recommend items to each user.
In step (1), text feature extraction comprises the following four sub-steps:
① Text preprocessing
Each paragraph is represented by a one-hot vector corresponding to a column of the paragraph matrix. The words in the comment texts are deduplicated and added to the lexicon, and each word is represented by a unique one-hot vector. Once constructed, each column of the comment matrix corresponds uniquely to one comment.
② Word sampling
The model predicts words in a paragraph from its paragraph vector, where the words are obtained by random sampling from the paragraph. Each word is treated as occurring independently in the paragraph, and word order does not affect the learned paragraph vector.
③ Optimization
Using the paragraph vector from sub-step ① as input and the words sampled in sub-step ② as output, the paragraph vector model is trained by repeated iteration. The model is built on a neural network with a softmax classifier, and its parameters are obtained by stochastic gradient descent.
④ Feature representation of the comment text
After training, each column of the paragraph matrix is the feature vector of the corresponding comment. Multiplying the one-hot vector of each comment defined in sub-step ① by the matrix yields the feature representation of that comment.
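The preprocessing and sampling sub-steps above can be sketched as follows. This is a minimal illustration in Python; the vocabulary construction details, region size, and sample count are hypothetical choices, not values fixed by the patent.

```python
import random

def build_lexicon(reviews):
    """Deduplicate words across all review texts into a word bank (sub-step 1).

    Each word's index serves as its one-hot position; likewise each
    review's index is its one-hot column in the paragraph matrix.
    """
    lexicon = {}
    for text in reviews:
        for word in text.split():
            lexicon.setdefault(word, len(lexicon))
    return lexicon

def sample_words(text, region_size=5, n_words=3, rng=random):
    """Randomly pick a text region, then sample words from it (sub-step 2).

    Word order inside the region is ignored, matching the PV-DBOW
    independence assumption. Region size and sample count are set manually.
    """
    words = text.split()
    start = rng.randrange(max(1, len(words) - region_size + 1))
    region = words[start:start + region_size]
    return [rng.choice(region) for _ in range(n_words)]

reviews = ["great camera easy to use", "battery life is too short"]
lex = build_lexicon(reviews)
targets = sample_words(reviews[0])
```

The sampled `targets` then serve as the classifier outputs when training the paragraph vector in sub-step ③.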
Step (2), score feature extraction, comprises the following two sub-steps:
① Construction of the neural network
The score feature extraction model is based on a two-layer fully connected neural network with an ELU activation function. The input of the network is the element-wise product of the user's features and the item's features, and its output is the user's score for the item.
② User and item feature optimization
According to the objective function, the score feature vectors of users and items are iteratively optimized by stochastic gradient descent so that the loss decreases. Training stops when the predicted scores are sufficiently close to the actual scores, yielding the score features of users and items.
In step (3), the features from steps (1) and (2) are fused to obtain new fused features.
The score features capture the user's overall evaluation of an item and are simple and clear, while the comments contain the user's different viewpoints and are more detailed. Fusing the comment and score features yields a richer, more comprehensive user feature representation. The fusion method concatenates the comment feature vector and the score feature vector in series to obtain the fused feature vector.
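The series-connection (concatenation) fusion described above is straightforward; a minimal numpy sketch follows, in which the vector names and dimensions are illustrative:

```python
import numpy as np

def fuse(text_feat, rating_feat):
    """Fuse comment and score features by concatenation (series connection).

    The fused vector keeps both views intact, so further feature sources
    could be appended the same way -- the extensibility the model relies on.
    """
    return np.concatenate([text_feat, rating_feat])

p_u = np.array([0.2, 0.5, 0.3])   # user's comment feature (topic proportions)
r_u = np.array([0.1, -0.4])       # user's score feature from the neural net
u = fuse(p_u, r_u)                # fused user feature, dimension 3 + 2 = 5
```

Item features are fused the same way with the item's comment and score vectors.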
The BPR-based optimization in step (4) mainly comprises the following sub-steps:
① Generating triples
Because users' preferences tend to resemble those of their friends, in practice a user is more likely to select items that friends have purchased or prefer. This preference similarity between users is applied to the sampling of the BPR model: constraining the sampling process more reasonably yields triples that better match user behavior, improving the accuracy of subsequent model training and recommendation.
② Model optimization
A unified objective function is provided for model optimization. A fusion function for multi-source heterogeneous data was proposed above; an objective function must now be constructed on the fused features so that, during learning, they represent the user or item features more accurately. The objective can be solved by stochastic gradient descent; mainstream deep learning frameworks integrate stochastic gradient descent, so the final feature vectors of users and items can be obtained through library calls.
In step (5), items of interest are recommended to the user.
Multiplying each user's feature vector with those of the items the user has not yet purchased or browsed yields the user's preference score for each item; the higher the score, the more likely the user is to buy or browse the item. Sorting all item scores in descending order and taking the top N items yields the user's Top-N recommendation list.
Drawings
FIG. 1 is a flow diagram of a hybrid recommendation model based on multi-source heterogeneous data.
Detailed Description
According to the method described in the specification, implementing the recommendation model based on multi-source heterogeneous data requires the following steps:
(1) text feature extraction
① Text preprocessing
Use d_uv to represent the comment text of user u on item v; the words contained in the comment text are represented by w. The feature vectors of the user and the item learned from the user's comments on items are denoted u_1 and v_1, the feature vector of a paragraph is denoted d_uv, the word vector is denoted w, and the words of all comments are stored in the lexicon V. These feature vectors all have dimension K.
② Word sampling
For each paragraph, a text region is randomly selected, and words are randomly sampled from that region as the prediction targets for training the classifier. The size of the text region and the number of words selected in the region are set manually.
③ Optimization
Each comment is mapped into a random high-dimensional semantic space, the words contained in the paragraph are then predicted, and learning optimization yields a relatively accurate paragraph feature vector representation. Under the bag-of-words assumption, the probability of each word w occurring in document d_uv is computed with softmax:

P(w | d_uv) = exp(w · d_uv) / Σ_{w'∈V} exp(w' · d_uv)

where w' ranges over all words in the lexicon V and exp denotes the exponential function with base e. The probability of any word in the document can be obtained from this formula. When the word occurrence probabilities are maximized directly, gradient computation is expensive. To reduce this overhead, a negative sampling method is commonly adopted: instead of using all words in the lexicon, a subset of the non-occurring words is sampled according to a predefined noise distribution and used as negative samples for an approximate computation. With negative sampling, the objective function of PV-DBOW is defined as:

L_1 = Σ_{d_uv} Σ_{w} n(w, d_uv) [ log σ(w · d_uv) + t · E_{w'~P_V} log σ(−w' · d_uv) ]

which sums over all combinations of words and documents, where n(w, d_uv) is the number of times word w occurs in document d_uv (0 if it does not occur), σ(·) denotes the sigmoid function, t is the number of negative samples, and E_{w'~P_V}[·] denotes the expectation under the noise distribution P_V.
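The negative-sampling term can be evaluated as in the following numpy sketch. The vector dimension, lexicon size, and uniform noise distribution are illustrative assumptions; the expectation over P_V is approximated by drawing t negative words.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 8, 50                            # feature dimension, lexicon size
word_vecs = rng.normal(scale=0.1, size=(V, K))
d_uv = rng.normal(scale=0.1, size=K)    # one paragraph (comment) vector

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_term(pos_word, t=5):
    """log-sigmoid term for one observed word plus t sampled negatives.

    Approximates log sigma(w . d_uv) + t * E_{w'~P_V}[log sigma(-w' . d_uv)]
    with a uniform noise distribution P_V over the lexicon.
    """
    pos = np.log(sigmoid(word_vecs[pos_word] @ d_uv))
    negs = rng.integers(0, V, size=t)
    neg = np.mean(np.log(sigmoid(-(word_vecs[negs] @ d_uv)))) * t
    return pos + neg

loss_term = neg_sampling_term(pos_word=3)
```

Summing such terms over all word-document pairs, weighted by the occurrence counts n(w, d_uv), gives the full PV-DBOW objective L_1, maximized by stochastic gradient ascent.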
④ Feature representation of the comment text
According to the above objective function, the feature representation d_uv of each document can be obtained. As in recommendation models based on traditional machine learning methods, the feature vectors of users and items can be expressed in terms of the feature vectors of the comments. Here, however, the feature representation of a user or item is no longer computed as the average of the comment feature vectors, but is learned through subsequent integrated model optimization.
The feature vectors of all of a user's comments are weighted, summed, and normalized to obtain the user feature factor:

p'_uk = Σ_{v∈D_u} W_uv · d_uv,k
p_uk = p'_uk / Σ_{k'=1}^{K} p'_uk'

where D_u denotes the set of the user's comments, p'_uk represents the total probability of the user on topic k, W_uv represents the weight of the comment issued by user u on item v, and p_uk is its normalized form. The feature factor of user u is:

p_u = (p_u1, ..., p_uK)

The user feature factor has dimension K. The item feature factor is computed with analogous formulas:

q'_vk = Σ_{u∈D_v} W_uv · d_uv,k
q_vk = q'_vk / Σ_{k'=1}^{K} q'_vk'

where D_v denotes the set of comments received by the item, q'_vk represents the total probability of the item on topic k, q_vk is its normalized form, and W_uv represents the weight of the comment received by item v from user u. The feature factor of item v is:

q_v = (q_v1, ..., q_vK)

K is the dimension of the item feature factor and is the same as for the user. Here W_uv is the weight of comment d_uv for user u and item v; these weights distinguish the importance of different comments, so that reasonable user and item features are constructed.
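The weighted-sum-and-normalize construction of a feature factor can be sketched as below. The comment vectors and weights are invented for illustration; in the patent the weights W_uv are learned jointly rather than fixed.

```python
import numpy as np

def feature_factor(comment_vecs, weights):
    """Weighted sum of comment feature vectors, normalized to sum to 1.

    comment_vecs: (n_comments, K) paragraph vectors d_uv of one user (or item)
    weights:      (n_comments,)   per-comment weights W_uv
    """
    raw = weights @ comment_vecs           # p'_uk = sum_v W_uv * d_uv,k
    return raw / raw.sum()                 # p_uk  = normalized form

d = np.array([[0.6, 0.4], [0.2, 0.8]])    # two comments, K = 2 topics
w = np.array([0.7, 0.3])
p_u = feature_factor(d, w)
```

The same function applied to the comments received by an item, with the corresponding weights, yields the item feature factor q_v.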
(2) Scoring feature extraction
① Construction of the neural network
Training with a two-layer fully connected neural network yields the final user-item score, and the feature vector representations of the user and the item are obtained directly. Define r_ui as the score of user u for item i; for any score r_ui there is a user vector r_u and a corresponding item vector r_i. The two-layer neural network prediction formula is then:

r̂_ui = φ(U_2 · φ(U_1 (r_u ⊙ r_i) + c_1) + c_2)

where ⊙ denotes element-wise multiplication, φ(x) is the ELU activation function, and U_1, U_2, c_1, and c_2 are the weight and bias parameters to be learned.
② User and item feature optimization
The objective function is the squared difference between the predicted score and the real score; optimizing the parameters to minimize it yields the optimal user and item representations:

L_2 = Σ_{(u,i)} (r_ui − r̂_ui)²
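The two-layer scoring network and its squared-error objective can be sketched in numpy as follows; the dimensions, random initialization, and target score are illustrative assumptions, not values from the patent.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha*(exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def predict_score(r_u, r_i, U1, c1, U2, c2):
    """r_hat = phi(U2 . phi(U1 (r_u * r_i) + c1) + c2), phi = ELU."""
    h = elu(U1 @ (r_u * r_i) + c1)      # element-wise product, first layer
    out = elu(U2 @ h + c2)              # second layer outputs a scalar score
    return float(out[0])

rng = np.random.default_rng(1)
K, H = 4, 8                             # feature dim, hidden width
U1, c1 = rng.normal(size=(H, K)), np.zeros(H)
U2, c2 = rng.normal(size=(1, H)), np.zeros(1)
r_u, r_i = rng.normal(size=K), rng.normal(size=K)

pred = predict_score(r_u, r_i, U1, c1, U2, c2)
loss = (3.5 - pred) ** 2                # squared error against a real score
```

In training, the gradient of this loss with respect to U_1, U_2, c_1, c_2, r_u, and r_i drives the stochastic gradient descent updates.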
(3) User item feature fusion
Feature vectors of users and items are constructed from the interaction information between them. A feature fusion function f(·) is proposed: assuming the features learned from the score and text data are denoted x_1 and x_2, the fused feature is obtained through the fusion function:

x = f(x_1, x_2)

where x is the fused feature. Fusion by simple concatenation enhances the extensibility of the user and item features, which matters for a model built on multi-source heterogeneous data. The feature obtained through f(·) is therefore

x = x_1 ⊕ x_2

where ⊕ denotes vector concatenation, applied to the user's text and score features and, likewise, to the item's.
(4) BPR-based optimization
① Generating triples
According to the user's purchase or browsing records and the social network, for each user u, define an item the user has purchased or browsed as i, an item the user has never interacted with as j, and an item purchased by a friend of the user as p. Let D denote the set of all items in the system, D_u the set of items user u has purchased or browsed, and D_p the set of items purchased by the user's friends. The items that best reflect the user's preferences are, first, the items D_u that the user has purchased; second, based on the similarity of friends' preferences, the user is likely to purchase items D_p \ D_u that a friend has purchased but the user has not; finally, the items the user is least likely to purchase are D \ (D_u ∪ D_p). Triples of users and items are constructed from the social network information as the training set, which may be represented as:

T := {(u, i, j) | i ∈ (D_u ∪ D_p), j ∈ D \ (D_u ∪ D_p)}

where (u, i, j) is a user-item triple indicating that user u prefers item i over item j. Item i is an item purchased by the user or by a direct friend of the user, and item j is an item purchased neither by the user nor by any direct friend. The user-item triples are thus constructed from the user's direct friend relationships and used for the subsequent training of the BPR model.
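The social-network-constrained triple construction follows directly from the set definitions; a minimal Python sketch, with invented example interaction data:

```python
def build_triples(purchased, friend_purchased, all_items):
    """Build BPR training triples (u, i, j) constrained by the social network.

    i ranges over D_u union D_p (own or direct friends' purchases),
    j over D \\ (D_u union D_p) (items neither the user nor friends touched).
    """
    triples = []
    for u in purchased:
        pos = purchased[u] | friend_purchased.get(u, set())
        neg = all_items - pos
        triples.extend((u, i, j) for i in pos for j in neg)
    return triples

D = {"a", "b", "c", "d"}          # all items in the system
D_u = {"u1": {"a"}}               # bought or browsed by user u1
D_p = {"u1": {"b"}}               # bought by u1's direct friends
T = build_triples(D_u, D_p, D)
```

Each resulting triple asserts the preference ordering i over j used by the BPR objective.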
② Model optimization
From the preceding definitions, user u prefers item i over item j. The preference difference is expressed through the user and item features by a function g(·), defined here as a sigmoid, to compute the user's differing degrees of preference for different items: g(u, i, j) = σ(u^T i − u^T j). The objective function of the recommendation model fusing multi-source heterogeneous data is therefore defined as:

L = Σ_{(u,i,j)∈T} ln g(u, i, j) + λ_1 L_1 − λ_2 L_2

where W denotes the weight parameters of each model: in the comment representation learning model the weights of the user's individual comments differ and must be obtained through learning, while in the score model the learned user and item features are direct, i.e., the weight parameter is set to 1 and need not be updated through the objective function. Θ represents the other parameters to be learned in the model, Θ = {Θ_1, Θ_2} = {{w, d_uv}, {U_1, U_2, c_1, c_2, r_u, r_i}}. λ_1 and λ_2 are the penalty parameters of each model, with values in the interval [0, 1]. The objective of the score model enters with a negative sign because it must be minimized while the overall objective is maximized.
(5) Recommending
The personalized recommendation list is obtained by multiplying the feature vectors of the user and the item:

s = u^T v

Multiplying each user's feature vector with those of the items the user has not yet purchased or browsed yields the user's preference score for each item; the higher the score, the more likely the user is to buy or browse the item. Sorting all item scores in descending order and taking the top N items yields the user's Top-N recommendation list.
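Top-N scoring by inner product, as described above, can be sketched as follows; the user and item vectors are illustrative:

```python
import numpy as np

def top_n(user_vec, item_vecs, item_ids, n=2):
    """Score items by s = u . v and return the n highest-scoring item ids."""
    scores = item_vecs @ user_vec
    order = np.argsort(scores)[::-1]          # descending by preference score
    return [item_ids[k] for k in order[:n]]

u = np.array([1.0, 0.0])                      # fused user feature
items = np.array([[0.9, 0.1],                 # "cam"
                  [0.2, 0.8],                 # "mug"
                  [0.7, 0.3]])                # "pen"
rec = top_n(u, items, ["cam", "mug", "pen"], n=2)
```

In practice the candidate set would be restricted to items the user has not yet purchased or browsed.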

Claims (8)

1. A recommendation method for processing multi-source heterogeneous data based on deep learning, comprising the following steps:
(1) text feature extraction: preprocessing the comment texts to obtain the comment feature vector of each user and the comment feature vector of each item, wherein a PV-DBOW model is used to learn feature vector representations of text paragraphs; the model adopts a distributed bag-of-words architecture and uses a paragraph vector to predict randomly sampled words in the paragraph;
(2) score feature extraction: learning the user's scores for items with a two-layer fully connected neural network, obtaining the score feature vector representations of users and items respectively;
(3) user and item feature fusion: according to the comment text features obtained in step (1), for each user, performing weighted summation on the feature vectors of the comments issued by the user to obtain the user features, and performing weighted summation on the feature vectors of the comments received by each item to obtain the item features; finally, fusing the user's text and score features with a fusion function to obtain the user's fused features, and fusing the item's text and score features to obtain the item's fused features;
(4) optimization based on Bayesian personalized ranking (BPR): obtaining triples with user preferences based on social network sampling, and obtaining the optimal model parameters according to Bayesian theory optimization, wherein each triple is a user-item triple, denoted (u, i, j), representing that user u's degree of preference for item i is greater than that for item j;
(5) recommendation: inputting the user's fused features and the items' fused feature vectors into the model according to the model parameters obtained in step (4), so as to recommend items to the user.
2. The method of claim 1, wherein in the (1) text feature extraction step, the preprocessing of the comment text uses d_uv to represent the comment text of user u on item v, and the words contained in the comment text are represented by w; the feature vectors of the user and the item learned from the user's comment text on the item are denoted u_1 and v_1, the feature vector of a paragraph is denoted d_uv, the word vector is denoted w, and the words of all comments are stored in the lexicon V; these feature vectors all have dimension K.
3. The method of claim 1, wherein in the (1) text feature extraction step, the word sampling randomly selects a text region for each paragraph and randomly samples words from the region as the prediction targets for training the classifier; the size of the text region and the number of words selected in the region are set manually.
4. The method according to claim 2, wherein in the (1) text feature extraction step, through optimization each comment is mapped into a random high-dimensional semantic space, the words contained in the paragraph are then predicted, and a paragraph feature vector representation is obtained through learning optimization; according to the assumption of the bag-of-words model, the probability of each word w occurring in document d_uv is calculated with softmax:

P(w | d_uv) = exp(w · d_uv) / Σ_{w'∈V} exp(w' · d_uv)

wherein w' ranges over all words in the lexicon V and exp denotes the exponential function with base e; the probability of any word in the document is obtained through this formula; in the calculation, a negative sampling method is adopted: a subset of the non-occurring words is sampled according to a predefined noise distribution and used as negative samples for approximate calculation, instead of using all words in the lexicon; based on the negative sampling strategy, the objective function of PV-DBOW is defined as:

L_1 = Σ_{d_uv} Σ_{w} n(w, d_uv) [ log σ(w · d_uv) + t · E_{w'~P_V} log σ(−w' · d_uv) ]

which sums over all combinations of words and documents, wherein n(w, d_uv) is the number of times word w occurs in document d_uv, the function value being 0 if it does not occur; σ(·) denotes the sigmoid function; t is the number of negative samples; and E_{w'~P_V}[·] denotes the expectation under the noise distribution P_V.
5. The method of claim 4, wherein for the (1) text feature extraction step, the feature representation of the comment text has the following characteristics: the feature representation $d_{uv}$ of each document is obtained from the above objective function; similar to recommendation models based on traditional machine learning methods, the feature vectors of users and items are expressed in terms of the feature vectors of their comments, and the user and item features are obtained by subsequent integrated model optimization.

The feature vectors of all comments of a user are weighted, summed, and normalized to obtain the user feature factor:

$$p'_{uk} = \sum_{v=1}^{D_u} W_{uv} \, d_{uv,k}, \qquad p_{uk} = \frac{p'_{uk}}{\sum_{k'=1}^{K} p'_{uk'}}$$

wherein $D_u$ denotes the number of comments issued by user u, $p'_{uk}$ represents the total probability of user u on topic k, $W_{uv}$ represents the weight of user u for the v-th issued comment, and $p_{uk}$ is its normalized representation; the feature factor of user u is:

$$p_u = (p_{u1}, \ldots, p_{uK})$$

The dimension of the user feature factor is K. The item feature factor is computed with the following formula:

$$q'_{vk} = \sum_{u=1}^{D_v} W_{uv} \, d_{uv,k}, \qquad q_{vk} = \frac{q'_{vk}}{\sum_{k'=1}^{K} q'_{vk'}}$$

wherein $D_v$ denotes the number of comments received by item v, $q'_{vk}$ represents the total probability of item v on topic k, $W_{uv}$ represents the weight of item v for the u-th received comment, and $q_{vk}$ is its normalized representation; the feature factor of item v is:

$$q_v = (q_{v1}, \ldots, q_{vK})$$

K is the dimension of the item feature factor and is consistent with that of the user. The weight $W_{uv}$ distinguishes the importance of different comments to the user, so that reasonable user and item features are constructed.
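The weighted aggregation and normalization in claim 5 can be sketched as follows. The topic vectors and per-comment weights here are synthetic toy values; the patent's actual weighting scheme $W_{uv}$ is assumed to be given.

```python
import numpy as np

def feature_factor(comment_vecs, weights):
    """Weighted sum of a user's (or item's) K-dimensional comment feature
    vectors, normalized so the K topic components sum to 1."""
    raw = np.sum(weights[:, None] * comment_vecs, axis=0)  # p'_uk
    return raw / raw.sum()                                 # p_uk

# Three comments by one user, each a K=4 topic distribution d_uv
comments = np.array([[0.1, 0.4, 0.3, 0.2],
                     [0.3, 0.3, 0.2, 0.2],
                     [0.2, 0.1, 0.5, 0.2]])
w = np.array([0.5, 0.3, 0.2])  # per-comment weights W_uv
p_u = feature_factor(comments, w)
print(p_u)  # user feature factor of dimension K, components sum to ~1
```

The same function computes the item factor $q_v$ by feeding it the comments an item has received and their weights.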
6. The method of claim 1, wherein in the (3) user feature fusion step, the feature vectors of the user and the item are constructed from the interaction information between users and items; a feature fusion function f(·) is proposed: assuming the features learned from the rating data and the text data are denoted $x_1$ and $x_2$, the fused feature is obtained by the fusion function:

$$x = f(x_1, x_2)$$

wherein x is the fused feature; the fusion is performed by simple concatenation, so the feature obtained by the function f(·) is

$$x = x_1 \oplus x_2$$

where $\oplus$ denotes vector concatenation.
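Concatenation fusion amounts to a single array operation. A toy example, with made-up feature values:

```python
import numpy as np

# Fusion by simple concatenation: f(x1, x2) = x1 (+) x2.
x1 = np.array([0.2, 0.7])        # feature learned from rating data
x2 = np.array([0.1, 0.5, 0.4])   # feature learned from review text
x = np.concatenate([x1, x2])     # fused feature
print(x.shape)  # (5,) — the dimensions simply add
```

The fused dimension is the sum of the input dimensions, so downstream layers must be sized accordingly.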
7. The method according to claim 1, wherein in the (4) optimization step based on Bayesian personalized ranking (BPR), the triplet generation is based on the user's purchase or browsing history and the social network. For each user u, the items purchased or browsed by the user are denoted i, the items never contacted by the user are denoted j, and the items purchased by the user's friends are denoted p. The set of all items in the system is defined as D, the set of items purchased or browsed by user u is defined as $D_u$, and the set of items purchased by the user's friends is defined as $D_p$. The items that best represent the user's preference are, first, the set of items $D_u$ the user has purchased; second, based on the similarity of friends' preferences, the user is likely to purchase items that his friends have purchased but the user has not, i.e., $D_p \setminus D_u$; finally, the items the user is least likely to purchase are $D \setminus (D_u \cup D_p)$. The user-item triplets constructed from the social network information form the training set, which may be represented as:

$$T := \{(u, i, j) \mid i \in (D_u \cup D_p),\; j \in D \setminus (D_u \cup D_p)\}$$

wherein (u, i, j) is a user-item triplet indicating that user u prefers item i over item j; item i is an item purchased by the user or by a direct friend of the user, and item j is an item purchased neither by the user nor by any direct friend of the user. The user-item triplets are thus constructed based on the user's direct friend relationships and are used to train the subsequent Bayesian personalized ranking (BPR) model.
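The set definition of T above translates directly into code. This is a hedged sketch for a single user; the item IDs are illustrative, and a real trainer would sample (i, j) pairs rather than enumerate them all.

```python
def build_triplets(all_items, purchased, friend_purchased):
    """Enumerate (i, j) pairs for one user u, following
    T = {(u, i, j) | i in D_u ∪ D_p, j in D \\ (D_u ∪ D_p)}."""
    positives = purchased | friend_purchased   # D_u ∪ D_p
    negatives = all_items - positives          # D \ (D_u ∪ D_p)
    return [(i, j) for i in sorted(positives) for j in sorted(negatives)]

D = {1, 2, 3, 4, 5}   # all items in the system
D_u = {1, 2}          # purchased or browsed by user u
D_p = {3}             # purchased by u's direct friends
triplets = build_triplets(D, D_u, D_p)
print(len(triplets))  # 3 positives x 2 negatives = 6 pairs
```

Items bought only by friends (here item 3) count as positives, which is exactly how the social network enlarges the training signal beyond the user's own history.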
8. The method according to claim 1, wherein in the (5) recommendation step, the personalized recommendation list is derived by multiplying the fused feature vectors of the user and the item:

$$s = u^{T} v$$

The fused feature vector of each user is multiplied with those of the items the user has not yet purchased or browsed to obtain the user's preference score for each item; the higher the score, the more likely the user is to purchase or browse the item. The Top-N recommendation list for the user is obtained by sorting the scores of all items in descending order and taking the top N items.
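The scoring and ranking step is a dot product followed by a descending sort. A minimal sketch with toy vectors (the feature values are invented for illustration):

```python
import numpy as np

def top_n(user_vec, item_vecs, n):
    """Score s = u^T v for every candidate item and return the indices
    of the n highest-scoring items in descending order."""
    scores = item_vecs @ user_vec
    return np.argsort(-scores)[:n]

u = np.array([1.0, 0.0, 0.5])          # fused user feature vector
items = np.array([[0.2, 0.9, 0.1],     # item 0 -> score 0.25
                  [0.9, 0.1, 0.8],     # item 1 -> score 1.30
                  [0.5, 0.5, 0.5]])    # item 2 -> score 0.75
rec = top_n(u, items, 2)
print(rec.tolist())  # [1, 2] — the Top-2 recommendation list
```

In practice the candidate set would exclude items the user has already purchased or browsed, as the claim specifies.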
CN201910547320.1A 2019-06-24 2019-06-24 Deep learning based recommendation method for processing multi-source heterogeneous data Active CN110263257B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910547320.1A CN110263257B (en) 2019-06-24 2019-06-24 Deep learning based recommendation method for processing multi-source heterogeneous data


Publications (2)

Publication Number Publication Date
CN110263257A CN110263257A (en) 2019-09-20
CN110263257B true CN110263257B (en) 2021-08-17

Family

ID=67920670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910547320.1A Active CN110263257B (en) 2019-06-24 2019-06-24 Deep learning based recommendation method for processing multi-source heterogeneous data

Country Status (1)

Country Link
CN (1) CN110263257B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045716B (en) * 2019-11-04 2022-02-22 中山大学 Related patch recommendation method based on heterogeneous data
CN111046672B (en) * 2019-12-11 2020-07-14 山东众阳健康科技集团有限公司 Multi-scene text abstract generation method
CN111291266B (en) * 2020-02-13 2023-03-21 深圳市雅阅科技有限公司 Artificial intelligence based recommendation method and device, electronic equipment and storage medium
CN111274406A (en) * 2020-03-02 2020-06-12 湘潭大学 Text classification method based on deep learning hybrid model
CN111612573B (en) * 2020-04-30 2023-04-25 杭州电子科技大学 Recommendation system scoring recommendation prediction method based on full Bayesian method
CN112232929A (en) * 2020-11-05 2021-01-15 南京工业大学 Multi-modal diversity recommendation list generation method for complementary articles
CN112364258B (en) * 2020-11-23 2024-02-27 北京明略软件系统有限公司 Recommendation method and system based on map, storage medium and electronic equipment
CN113064965A (en) * 2021-03-23 2021-07-02 南京航空航天大学 Intelligent recommendation method for similar cases of civil aviation unplanned events based on deep learning
CN112967101B (en) * 2021-04-07 2023-04-07 重庆大学 Collaborative filtering article recommendation method based on multi-interaction information of social users

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103399858A (en) * 2013-07-01 2013-11-20 吉林大学 Socialization collaborative filtering recommendation method based on trust
CN103778260A (en) * 2014-03-03 2014-05-07 哈尔滨工业大学 Individualized microblog information recommending system and method
CN106022869A (en) * 2016-05-12 2016-10-12 北京邮电大学 Consumption object recommending method and consumption object recommending device
CN106600482A (en) * 2016-12-30 2017-04-26 西北工业大学 Multi-source social data fusion multi-angle travel information perception and intelligent recommendation method
CN107025606A (en) * 2017-03-29 2017-08-08 西安电子科技大学 The item recommendation method of score data and trusting relationship is combined in a kind of social networks

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20080077574A1 (en) * 2006-09-22 2008-03-27 John Nicholas Gross Topic Based Recommender System & Methods
CN108595527A (en) * 2018-03-28 2018-09-28 中山大学 A kind of personalized recommendation method and system of the multi-source heterogeneous information of fusion


Non-Patent Citations (3)

Title
Recommendation Based on Review Texts and Social Communities: A Hybrid Model; Ji Z et al.; IEEE Access; 20190228; full text *
Personalized image retrieval and recommendation; Ji Zhenyan et al.; Journal of Beijing University of Posts and Telecommunications; 20170615 (No. 03); full text *
A hybrid recommendation model fusing multi-source heterogeneous data; Ji Zhenyan et al.; Journal of Beijing University of Posts and Telecommunications; 20190228; full text *


Similar Documents

Publication Publication Date Title
CN110263257B (en) Deep learning based recommendation method for processing multi-source heterogeneous data
CN108804689B (en) Question-answering platform-oriented label recommendation method integrating user hidden connection relation
CN111460130B (en) Information recommendation method, device, equipment and readable storage medium
CN111222332B (en) Commodity recommendation method combining attention network and user emotion
Sasikala et al. Sentiment analysis of online product reviews using DLMNN and future prediction of online product using IANFIS
Jain et al. A comparative study of machine learning and deep learning techniques for sentiment analysis
CN109584006B (en) Cross-platform commodity matching method based on deep matching model
CN112232929A (en) Multi-modal diversity recommendation list generation method for complementary articles
Arava et al. Sentiment Analysis using deep learning for use in recommendation systems of various public media applications
CN112364236A (en) Target object recommendation system, method and device, and data processing method and device
Mir et al. Online fake review detection using supervised machine learning and BERT model
Liu E‐Commerce Precision Marketing Model Based on Convolutional Neural Network
CN116205700A (en) Recommendation method and device for target product, computer equipment and storage medium
CN115878804A (en) E-commerce comment multi-classification emotion analysis method based on AB-CNN model
CN115935067A (en) Article recommendation method integrating semantics and structural view for socialized recommendation
Drif et al. A sentiment enhanced deep collaborative filtering recommender system
Boumhidi Mining user’s opinions and emojis for reputation generation using deep learning
Rokade et al. Forecasting movie rating using k-nearest neighbor based collaborative filtering
Hoiriyah et al. Lexicon-Based and Naive Bayes Sentiment Analysis for Recommending the Best Marketplace Selection as a Marketing Strategy for MSMEs
CN106528584A (en) An ensemble learning-based group recommendation method
KR102659929B1 (en) System for online sale
Yuyao Multi-round tag recommendation algorithm for shopping guide robots
Wei et al. Devising a Cross-Domain Model to Detect Fake Review Comments
DIVYA et al. Matrix factorization for movie recommended system using deep learning
Bhadana et al. The Sentimental Analysis of Social Media Data: A Survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190920

Assignee: Institute of Software, Chinese Academy of Sciences

Assignor: Beijing Jiaotong University

Contract record no.: X2022990000602

Denomination of invention: Recommendation method for processing multi-source heterogeneous data based on deep learning

Granted publication date: 20210817

License type: Common License

Record date: 20220905
