CN110263257B - Deep learning based recommendation method for processing multi-source heterogeneous data - Google Patents


Info

Publication number
CN110263257B
CN110263257B (application CN201910547320.1A)
Authority
CN
China
Prior art keywords
user
item
feature
text
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910547320.1A
Other languages
Chinese (zh)
Other versions
CN110263257A (en)
Inventor
冀振燕
宋晓军
赵颖斯
皮怀雨
李俊东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN201910547320.1A priority Critical patent/CN110263257B/en
Publication of CN110263257A publication Critical patent/CN110263257A/en
Application granted granted Critical
Publication of CN110263257B publication Critical patent/CN110263257B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In recent years, deep learning has been widely applied to image and audio recognition, text classification, representation learning, and related fields, and deep-learning-based recommendation systems have become a research focus for scholars. Deep learning models excel at representation learning for specific data such as images and text, avoid complex feature engineering, can obtain nonlinear multi-level abstract feature representations of heterogeneous data, and thereby overcome the heterogeneity of diverse data sources. To date, no deep learning recommendation model that fuses scores, comments, and social networks has been proposed. Based on deep learning algorithms, the recommendation method and system provide a highly extensible recommendation process, analyze the algorithms and principles suited to the different data types, derive a final loss function combining comments, scores, and social information from the loss functions of the individual data types, and improve the accuracy of recommendation results.

Description

Deep learning based recommendation method for processing multi-source heterogeneous data
Technical Field
In recent years, deep learning has been widely applied to image and audio recognition, text classification, representation learning, and related fields, and deep-learning-based recommendation systems have become a research focus for scholars. Deep learning models excel at representation learning for specific data such as images and text, avoid complex feature engineering, can obtain nonlinear multi-level abstract feature representations of heterogeneous data, and thereby overcome the heterogeneity of diverse data sources. To date, no deep learning recommendation model that fuses scores, comments, and social networks has been proposed. The patent provides a highly extensible recommendation model based on deep learning algorithms.
Background
Current deep learning models cannot make recommendations by combining score, comment, and social network information, because the feature representation of multi-source heterogeneous data is difficult and social information cannot be fused directly with other user-item interaction information. If deep learning can be used to learn representations of the different heterogeneous data and unify them within a single deep learning model, the shortcoming of prior research, namely having to select different algorithms at the fusion stage, is overcome, and deep feature representations markedly improve the accuracy of recommendation results. To fully exploit the advantages of the three data types, the method and system fuse the score and comment features and add social information to the training process, yielding a multi-source heterogeneous data recommendation model based on deep learning.
For comment data, traditional topic models cannot accurately represent the characteristics of text; the patent instead learns the feature representation of each comment document with a PV-DBOW model, which assumes independence between the words of a document and uses the document to predict each observed word. PV-DBOW represents each document by a dense vector trained to predict the words in the document. For score data, traditional matrix factorization faces data sparsity and low accuracy; the method trains on scores with a neural network, which better captures the characteristics of users and items. For social network data, the patent adds the users' social relationship information to a BPR-based pairwise learning method, making the sampling more reasonable and improving the accuracy of recommendation results.
Disclosure of Invention
Based on deep learning, a recommendation model (or recommendation method) capable of processing multi-source heterogeneous data is provided, with advantages including high accuracy and strong extensibility. The model selects a deep-learning-based method for learning text paragraph representations, designs a neural network that learns user and item features from scores, and constrains pair-based learning through the social network. Because existing deep-learning text representation methods are mature, an existing network can be used directly, and its training results can be fused with other features for joint training to obtain a more accurate fused feature representation. Score data differ from text data in that the features of users and items can be learned directly, so there is no need to learn a vector representation of the scores separately from the entity features.
The deep-learning-based multi-source heterogeneous data recommendation model involves three types of data: comments, scores, and social networks. Each data type has its own characteristics and describes users or items from a different angle. Vector representations of each data type are learned through deep models, and the fused features of a user or item are then obtained by concatenation. Comment features reflect the user's attitude toward an item and can also represent the item's attributes; the model learns feature representations of comment paragraphs with the PV-DBOW algorithm and obtains the vector representation of a user or item by weighted superposition. Score features are the user's overall evaluation of items, reflecting the user's degree of satisfaction, and BPR can be used to learn the nonlinear features of users and items. The social network embodies friend relationships between users and indirectly influences user-item interactions; exploiting social relationships strengthens the constraints on users' purchasing behavior and further improves the accuracy of recommendation results.
The method comprises the following steps:
(1) Text feature extraction: feature vector representations of text paragraphs are learned with a PV-DBOW model. The model uses a Distributed Bag-of-Words architecture, in which a paragraph vector is used to predict randomly sampled words in the paragraph.
(2) Score feature extraction: a two-layer fully connected neural network is used to learn the user's score for an item. Unlike the text feature learning model, this step directly yields the feature vector representations of the user and the item rather than extracting features of the scores themselves.
(3) User and item feature fusion: from the comment text features obtained in step (1), the feature vectors of the comments issued by each user are weighted and summed to obtain the user features, and the feature vectors of the comments received by each item are weighted and summed to obtain the item features. Finally, a fusion function combines each user's text and score features into the user's fused features, and each item's text and score features into the item's fused features.
(4) BPR-based optimization: user-preference triples are obtained by social-network-constrained sampling, and the optimal model parameters are obtained by Bayesian theory optimization.
(5) Recommendation: with the model parameters obtained in step (4), the feature vectors of users and items are fed into the model to recommend items to each user.
In step (1), text feature extraction comprises the following four sub-steps:
① Text preprocessing
Each paragraph is represented by a one-hot vector corresponding to a column of the paragraph matrix. The words in the comment texts are deduplicated and added to the lexicon, and each word is represented by a unique one-hot vector. Once constructed, each column of the comment matrix corresponds uniquely to one comment.
② Word sampling
The model predicts words in a paragraph from its paragraph vector, where the words are obtained by random sampling from the paragraph. Each word is treated as occurring independently in the paragraph, and word order does not affect the learned paragraph vector.
③ Optimization
Using the paragraph vector from sub-step ① as input and the words sampled in sub-step ② as output, the paragraph vector model is trained by repeated iteration. The model is built on a neural network with a softmax classifier, and its parameters are obtained by stochastic gradient descent.
④ Feature representation of the comment text
After training, each column of the paragraph matrix is the feature vector of the corresponding comment. Multiplying the one-hot vector of each comment defined in sub-step ① by the matrix yields the feature representation of that comment.
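The preprocessing and sampling sub-steps above can be sketched as follows. This is a minimal illustration in Python; the vocabulary construction details, region size, and sample count are hypothetical choices, not values fixed by the patent.

```python
import random

def build_lexicon(reviews):
    """Deduplicate words across all review texts into a word bank (sub-step 1).

    Each word's index serves as its one-hot position; likewise each
    review's index is its one-hot column in the paragraph matrix.
    """
    lexicon = {}
    for text in reviews:
        for word in text.split():
            lexicon.setdefault(word, len(lexicon))
    return lexicon

def sample_words(text, region_size=5, n_words=3, rng=random):
    """Randomly pick a text region, then sample words from it (sub-step 2).

    Word order inside the region is ignored, matching the PV-DBOW
    independence assumption. Region size and sample count are set manually.
    """
    words = text.split()
    start = rng.randrange(max(1, len(words) - region_size + 1))
    region = words[start:start + region_size]
    return [rng.choice(region) for _ in range(n_words)]

reviews = ["great camera easy to use", "battery life is too short"]
lex = build_lexicon(reviews)
targets = sample_words(reviews[0])
```

The sampled `targets` then serve as the classifier outputs when training the paragraph vector in sub-step ③.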
Step (2), score feature extraction, comprises the following two sub-steps:
① Construction of the neural network
The score feature extraction model is based on a two-layer fully connected neural network with an ELU activation function. The input of the network is the element-wise product of the user's features and the item's features, and its output is the user's score for the item.
② User and item feature optimization
According to the objective function, the score feature vectors of users and items are iteratively optimized by stochastic gradient descent so that the loss decreases. Training stops when the predicted scores are sufficiently close to the actual scores, yielding the score features of users and items.
In step (3), the features from steps (1) and (2) are fused to obtain new fused features.
The score features capture the user's overall evaluation of an item and are simple and clear, while the comments contain the user's different viewpoints and are more detailed. Fusing the comment and score features yields a richer, more comprehensive user feature representation. The fusion method concatenates the comment feature vector and the score feature vector in series to obtain the fused feature vector.
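The series-connection (concatenation) fusion described above is straightforward; a minimal numpy sketch follows, in which the vector names and dimensions are illustrative:

```python
import numpy as np

def fuse(text_feat, rating_feat):
    """Fuse comment and score features by concatenation (series connection).

    The fused vector keeps both views intact, so further feature sources
    could be appended the same way -- the extensibility the model relies on.
    """
    return np.concatenate([text_feat, rating_feat])

p_u = np.array([0.2, 0.5, 0.3])   # user's comment feature (topic proportions)
r_u = np.array([0.1, -0.4])       # user's score feature from the neural net
u = fuse(p_u, r_u)                # fused user feature, dimension 3 + 2 = 5
```

Item features are fused the same way with the item's comment and score vectors.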
The BPR-based optimization in step (4) mainly comprises the following sub-steps:
① Generating triples
Because users' preferences tend to resemble those of their friends, in practice a user is more likely to select items that friends have purchased or prefer. This preference similarity between users is applied to the sampling of the BPR model: constraining the sampling process more reasonably yields triples that better match user behavior, improving the accuracy of subsequent model training and recommendation.
② Model optimization
A unified objective function is provided for model optimization. A fusion function for multi-source heterogeneous data was proposed above; an objective function must now be constructed on the fused features so that, during learning, they represent the user or item features more accurately. The objective can be solved by stochastic gradient descent; mainstream deep learning frameworks integrate stochastic gradient descent, so the final feature vectors of users and items can be obtained through library calls.
In step (5), items of interest are recommended to the user.
Multiplying each user's feature vector with those of the items the user has not yet purchased or browsed yields the user's preference score for each item; the higher the score, the more likely the user is to buy or browse the item. Sorting all item scores in descending order and taking the top N items yields the user's Top-N recommendation list.
Drawings
FIG. 1 is a flow diagram of a hybrid recommendation model based on multi-source heterogeneous data.
Detailed Description
According to the method described in the specification, implementing the recommendation model based on multi-source heterogeneous data requires the following steps:
(1) text feature extraction
① Text preprocessing
Use d_uv to represent the comment text of user u on item v; the words contained in the comment text are represented by w. The feature vectors of the user and the item learned from the user's comments on items are denoted u_1 and v_1, the feature vector of a paragraph is denoted d_uv, the word vector is denoted w, and the words of all comments are stored in the lexicon V. These feature vectors all have dimension K.
② Word sampling
For each paragraph, a text region is randomly selected, and words are randomly sampled from that region as the prediction targets for training the classifier. The size of the text region and the number of words selected in the region are set manually.
③ Optimization
Each comment is mapped into a random high-dimensional semantic space, the words contained in the paragraph are then predicted, and learning optimization yields a relatively accurate paragraph feature vector representation. Under the bag-of-words assumption, the probability of each word w occurring in document d_uv is computed with softmax:

P(w | d_uv) = exp(w · d_uv) / Σ_{w'∈V} exp(w' · d_uv)

where w' ranges over all words in the lexicon V and exp denotes the exponential function with base e. The probability of any word in the document can be obtained from this formula. When the word occurrence probabilities are maximized directly, gradient computation is expensive. To reduce this overhead, a negative sampling method is commonly adopted: instead of using all words in the lexicon, a subset of the non-occurring words is sampled according to a predefined noise distribution and used as negative samples for an approximate computation. With negative sampling, the objective function of PV-DBOW is defined as:

L_1 = Σ_{d_uv} Σ_{w} n(w, d_uv) [ log σ(w · d_uv) + t · E_{w'~P_V} log σ(−w' · d_uv) ]

which sums over all combinations of words and documents, where n(w, d_uv) is the number of times word w occurs in document d_uv (0 if it does not occur), σ(·) denotes the sigmoid function, t is the number of negative samples, and E_{w'~P_V}[·] denotes the expectation under the noise distribution P_V.
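The negative-sampling term can be evaluated as in the following numpy sketch. The vector dimension, lexicon size, and uniform noise distribution are illustrative assumptions; the expectation over P_V is approximated by drawing t negative words.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 8, 50                            # feature dimension, lexicon size
word_vecs = rng.normal(scale=0.1, size=(V, K))
d_uv = rng.normal(scale=0.1, size=K)    # one paragraph (comment) vector

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_term(pos_word, t=5):
    """log-sigmoid term for one observed word plus t sampled negatives.

    Approximates log sigma(w . d_uv) + t * E_{w'~P_V}[log sigma(-w' . d_uv)]
    with a uniform noise distribution P_V over the lexicon.
    """
    pos = np.log(sigmoid(word_vecs[pos_word] @ d_uv))
    negs = rng.integers(0, V, size=t)
    neg = np.mean(np.log(sigmoid(-(word_vecs[negs] @ d_uv)))) * t
    return pos + neg

loss_term = neg_sampling_term(pos_word=3)
```

Summing such terms over all word-document pairs, weighted by the occurrence counts n(w, d_uv), gives the full PV-DBOW objective L_1, maximized by stochastic gradient ascent.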
④ Feature representation of the comment text
According to the above objective function, the feature representation d_uv of each document can be obtained. As in recommendation models based on traditional machine learning methods, the feature vectors of users and items can be expressed in terms of the feature vectors of the comments. Here, however, the feature representation of a user or item is no longer computed as the average of the comment feature vectors, but is learned through subsequent integrated model optimization.
The feature vectors of all of a user's comments are weighted, summed, and normalized to obtain the user feature factor:

p'_uk = Σ_{v∈D_u} W_uv · d_uv,k
p_uk = p'_uk / Σ_{k'=1}^{K} p'_uk'

where D_u denotes the set of the user's comments, p'_uk represents the total probability of the user on topic k, W_uv represents the weight of the comment issued by user u on item v, and p_uk is its normalized form. The feature factor of user u is:

p_u = (p_u1, ..., p_uK)

The user feature factor has dimension K. The item feature factor is computed with analogous formulas:

q'_vk = Σ_{u∈D_v} W_uv · d_uv,k
q_vk = q'_vk / Σ_{k'=1}^{K} q'_vk'

where D_v denotes the set of comments received by the item, q'_vk represents the total probability of the item on topic k, q_vk is its normalized form, and W_uv represents the weight of the comment received by item v from user u. The feature factor of item v is:

q_v = (q_v1, ..., q_vK)

K is the dimension of the item feature factor and is the same as for the user. Here W_uv is the weight of comment d_uv for user u and item v; these weights distinguish the importance of different comments, so that reasonable user and item features are constructed.
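The weighted-sum-and-normalize construction of a feature factor can be sketched as below. The comment vectors and weights are invented for illustration; in the patent the weights W_uv are learned jointly rather than fixed.

```python
import numpy as np

def feature_factor(comment_vecs, weights):
    """Weighted sum of comment feature vectors, normalized to sum to 1.

    comment_vecs: (n_comments, K) paragraph vectors d_uv of one user (or item)
    weights:      (n_comments,)   per-comment weights W_uv
    """
    raw = weights @ comment_vecs           # p'_uk = sum_v W_uv * d_uv,k
    return raw / raw.sum()                 # p_uk  = normalized form

d = np.array([[0.6, 0.4], [0.2, 0.8]])    # two comments, K = 2 topics
w = np.array([0.7, 0.3])
p_u = feature_factor(d, w)
```

The same function applied to the comments received by an item, with the corresponding weights, yields the item feature factor q_v.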
(2) Scoring feature extraction
① Construction of the neural network
Training with a two-layer fully connected neural network yields the final user-item score, and the feature vector representations of the user and the item are obtained directly. Define r_ui as the score of user u for item i; for any score r_ui there is a user vector r_u and a corresponding item vector r_i. The two-layer neural network prediction formula is then:

r̂_ui = φ(U_2 · φ(U_1 (r_u ⊙ r_i) + c_1) + c_2)

where ⊙ denotes element-wise multiplication, φ(x) is the ELU activation function, and U_1, U_2, c_1, and c_2 are the weight and bias parameters to be learned.
② User and item feature optimization
The objective function is the squared difference between the predicted score and the real score; optimizing the parameters to minimize it yields the optimal user and item representations:

L_2 = Σ_{(u,i)} (r_ui − r̂_ui)²
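The two-layer scoring network and its squared-error objective can be sketched in numpy as follows; the dimensions, random initialization, and target score are illustrative assumptions, not values from the patent.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha*(exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def predict_score(r_u, r_i, U1, c1, U2, c2):
    """r_hat = phi(U2 . phi(U1 (r_u * r_i) + c1) + c2), phi = ELU."""
    h = elu(U1 @ (r_u * r_i) + c1)      # element-wise product, first layer
    out = elu(U2 @ h + c2)              # second layer outputs a scalar score
    return float(out[0])

rng = np.random.default_rng(1)
K, H = 4, 8                             # feature dim, hidden width
U1, c1 = rng.normal(size=(H, K)), np.zeros(H)
U2, c2 = rng.normal(size=(1, H)), np.zeros(1)
r_u, r_i = rng.normal(size=K), rng.normal(size=K)

pred = predict_score(r_u, r_i, U1, c1, U2, c2)
loss = (3.5 - pred) ** 2                # squared error against a real score
```

In training, the gradient of this loss with respect to U_1, U_2, c_1, c_2, r_u, and r_i drives the stochastic gradient descent updates.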
(3) User item feature fusion
Feature vectors of users and items are constructed from the interaction information between them. A feature fusion function f(·) is proposed: assuming the features learned from the score and text data are denoted x_1 and x_2, the fused feature is obtained through the fusion function:

x = f(x_1, x_2)

where x is the fused feature. Fusion by simple concatenation enhances the extensibility of the user and item features, which matters for a model built on multi-source heterogeneous data. The feature obtained through f(·) is therefore

x = x_1 ⊕ x_2

where ⊕ denotes vector concatenation, applied to the user's text and score features and, likewise, to the item's.
(4) BPR-based optimization
① Generating triples
According to the user's purchase or browsing records and the social network, for each user u, define an item the user has purchased or browsed as i, an item the user has never interacted with as j, and an item purchased by a friend of the user as p. Let D denote the set of all items in the system, D_u the set of items user u has purchased or browsed, and D_p the set of items purchased by the user's friends. The items that best reflect the user's preferences are, first, the items D_u that the user has purchased; second, based on the similarity of friends' preferences, the user is likely to purchase items D_p \ D_u that a friend has purchased but the user has not; finally, the items the user is least likely to purchase are D \ (D_u ∪ D_p). Triples of users and items are constructed from the social network information as the training set, which may be represented as:

T := {(u, i, j) | i ∈ (D_u ∪ D_p), j ∈ D \ (D_u ∪ D_p)}

where (u, i, j) is a user-item triple indicating that user u prefers item i over item j. Item i is an item purchased by the user or by a direct friend of the user, and item j is an item purchased neither by the user nor by any direct friend. The user-item triples are thus constructed from the user's direct friend relationships and used for the subsequent training of the BPR model.
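The social-network-constrained triple construction follows directly from the set definitions; a minimal Python sketch, with invented example interaction data:

```python
def build_triples(purchased, friend_purchased, all_items):
    """Build BPR training triples (u, i, j) constrained by the social network.

    i ranges over D_u union D_p (own or direct friends' purchases),
    j over D \\ (D_u union D_p) (items neither the user nor friends touched).
    """
    triples = []
    for u in purchased:
        pos = purchased[u] | friend_purchased.get(u, set())
        neg = all_items - pos
        triples.extend((u, i, j) for i in pos for j in neg)
    return triples

D = {"a", "b", "c", "d"}          # all items in the system
D_u = {"u1": {"a"}}               # bought or browsed by user u1
D_p = {"u1": {"b"}}               # bought by u1's direct friends
T = build_triples(D_u, D_p, D)
```

Each resulting triple asserts the preference ordering i over j used by the BPR objective.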
② Model optimization
From the preceding definitions, user u prefers item i over item j. The preference difference is expressed through the user and item features by a function g(·), defined here as a sigmoid, to compute the user's differing degrees of preference for different items: g(u, i, j) = σ(u^T i − u^T j). The objective function of the recommendation model fusing multi-source heterogeneous data is therefore defined as:

L = Σ_{(u,i,j)∈T} ln g(u, i, j) + λ_1 L_1 − λ_2 L_2

where W denotes the weight parameters of each model: in the comment representation learning model the weights of the user's individual comments differ and must be obtained through learning, while in the score model the learned user and item features are direct, i.e., the weight parameter is set to 1 and need not be updated through the objective function. Θ represents the other parameters to be learned in the model, Θ = {Θ_1, Θ_2} = {{w, d_uv}, {U_1, U_2, c_1, c_2, r_u, r_i}}. λ_1 and λ_2 are the penalty parameters of each model, with values in the interval [0, 1]. The objective of the score model enters with a negative sign because it must be minimized while the overall objective is maximized.
(5) Recommending
The personalized recommendation list is obtained by multiplying the feature vectors of the user and the item:

s = u^T v

Multiplying each user's feature vector with those of the items the user has not yet purchased or browsed yields the user's preference score for each item; the higher the score, the more likely the user is to buy or browse the item. Sorting all item scores in descending order and taking the top N items yields the user's Top-N recommendation list.
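Top-N scoring by inner product, as described above, can be sketched as follows; the user and item vectors are illustrative:

```python
import numpy as np

def top_n(user_vec, item_vecs, item_ids, n=2):
    """Score items by s = u . v and return the n highest-scoring item ids."""
    scores = item_vecs @ user_vec
    order = np.argsort(scores)[::-1]          # descending by preference score
    return [item_ids[k] for k in order[:n]]

u = np.array([1.0, 0.0])                      # fused user feature
items = np.array([[0.9, 0.1],                 # "cam"
                  [0.2, 0.8],                 # "mug"
                  [0.7, 0.3]])                # "pen"
rec = top_n(u, items, ["cam", "mug", "pen"], n=2)
```

In practice the candidate set would be restricted to items the user has not yet purchased or browsed.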

Claims (8)

1. A recommendation method for processing multi-source heterogeneous data based on deep learning, comprising the following steps:
(1) text feature extraction: preprocessing the comment texts to obtain the comment feature vector of each user and the comment feature vector of each item, wherein a PV-DBOW model is used to learn feature vector representations of text paragraphs; the model adopts a distributed bag-of-words architecture and uses a paragraph vector to predict randomly sampled words in the paragraph;
(2) score feature extraction: learning the user's scores for items with a two-layer fully connected neural network, obtaining the score feature vector representations of users and items respectively;
(3) user and item feature fusion: according to the comment text features obtained in step (1), for each user, performing weighted summation on the feature vectors of the comments issued by the user to obtain the user features, and performing weighted summation on the feature vectors of the comments received by each item to obtain the item features; finally, fusing the user's text and score features with a fusion function to obtain the user's fused features, and fusing the item's text and score features to obtain the item's fused features;
(4) optimization based on Bayesian personalized ranking (BPR): obtaining triples with user preferences based on social network sampling, and obtaining the optimal model parameters according to Bayesian theory optimization, wherein each triple is a user-item triple, denoted (u, i, j), representing that user u's degree of preference for item i is greater than that for item j;
(5) recommendation: inputting the user's fused features and the items' fused feature vectors into the model according to the model parameters obtained in step (4), so as to recommend items to the user.
2. The method of claim 1, wherein in the (1) text feature extraction step, the preprocessing of the comment text uses d_uv to represent the comment text of user u on item v, and the words contained in the comment text are represented by w; the feature vectors of the user and the item learned from the user's comment text on the item are denoted u_1 and v_1, the feature vector of a paragraph is denoted d_uv, the word vector is denoted w, and the words of all comments are stored in the lexicon V; these feature vectors all have dimension K.
3. The method of claim 1, wherein in the (1) text feature extraction step, the word sampling randomly selects a text region for each paragraph and randomly samples words from the region as the prediction targets for training the classifier; the size of the text region and the number of words selected in the region are set manually.
4. The method according to claim 2, wherein in the (1) text feature extraction step, through optimization each comment is mapped into a random high-dimensional semantic space, the words contained in the paragraph are then predicted, and a paragraph feature vector representation is obtained through learning optimization; according to the assumption of the bag-of-words model, the probability of each word w occurring in document d_uv is calculated with softmax:

P(w | d_uv) = exp(w · d_uv) / Σ_{w'∈V} exp(w' · d_uv)

wherein w' ranges over all words in the lexicon V and exp denotes the exponential function with base e; the probability of any word in the document is obtained through this formula; in the calculation, a negative sampling method is adopted: a subset of the non-occurring words is sampled according to a predefined noise distribution and used as negative samples for approximate calculation, instead of using all words in the lexicon; based on the negative sampling strategy, the objective function of PV-DBOW is defined as:

L_1 = Σ_{d_uv} Σ_{w} n(w, d_uv) [ log σ(w · d_uv) + t · E_{w'~P_V} log σ(−w' · d_uv) ]

which sums over all combinations of words and documents, wherein n(w, d_uv) is the number of times word w occurs in document d_uv, the function value being 0 if it does not occur; σ(·) denotes the sigmoid function; t is the number of negative samples; and E_{w'~P_V}[·] denotes the expectation under the noise distribution P_V.
5. The method of claim 4, wherein for the (1) text feature extraction step, the feature representation of the comment text has the following characteristics: the feature representation $d_{uv}$ of each document is obtained from the above objective function; similar to recommendation models based on traditional machine learning methods, the feature vectors of users and items are expressed in terms of the feature vectors of their comments, and the user and item features are obtained by subsequent integrated model optimization.

The feature vectors of all comments of a user are weighted, summed, and normalized to obtain the user feature factor:

$$p'_{uk} = \sum_{v=1}^{D_u} W_{uv} \, d_{uv,k}, \qquad p_{uk} = \frac{p'_{uk}}{\sum_{k'=1}^{K} p'_{uk'}}$$

wherein $D_u$ denotes the number of comments issued by user u, $p'_{uk}$ represents the total probability of user u on topic k, $W_{uv}$ represents the weight of user u for the v-th issued comment, and $p_{uk}$ is its normalized representation; the feature factor of user u is:

$$p_u = (p_{u1}, \ldots, p_{uK})$$

The dimension of the user feature factor is K. The item feature factor is computed with the following formula:

$$q'_{vk} = \sum_{u=1}^{D_v} W_{uv} \, d_{uv,k}, \qquad q_{vk} = \frac{q'_{vk}}{\sum_{k'=1}^{K} q'_{vk'}}$$

wherein $D_v$ denotes the number of comments received by item v, $q'_{vk}$ represents the total probability of item v on topic k, $W_{uv}$ represents the weight of item v for the u-th received comment, and $q_{vk}$ is its normalized representation; the feature factor of item v is:

$$q_v = (q_{v1}, \ldots, q_{vK})$$

K is the dimension of the item feature factor and is consistent with that of the user. The weight $W_{uv}$ distinguishes the importance of different comments to the user, so that reasonable user and item features are constructed.
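The weighted aggregation and normalization in claim 5 can be sketched as follows. The topic vectors and per-comment weights here are synthetic toy values; the patent's actual weighting scheme $W_{uv}$ is assumed to be given.

```python
import numpy as np

def feature_factor(comment_vecs, weights):
    """Weighted sum of a user's (or item's) K-dimensional comment feature
    vectors, normalized so the K topic components sum to 1."""
    raw = np.sum(weights[:, None] * comment_vecs, axis=0)  # p'_uk
    return raw / raw.sum()                                 # p_uk

# Three comments by one user, each a K=4 topic distribution d_uv
comments = np.array([[0.1, 0.4, 0.3, 0.2],
                     [0.3, 0.3, 0.2, 0.2],
                     [0.2, 0.1, 0.5, 0.2]])
w = np.array([0.5, 0.3, 0.2])  # per-comment weights W_uv
p_u = feature_factor(comments, w)
print(p_u)  # user feature factor of dimension K, components sum to ~1
```

The same function computes the item factor $q_v$ by feeding it the comments an item has received and their weights.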
6. The method of claim 1, wherein in the (3) user feature fusion step, the feature vectors of the user and the item are constructed from the interaction information between users and items; a feature fusion function f(·) is proposed: assuming the features learned from the rating data and the text data are denoted $x_1$ and $x_2$, the fused feature is obtained by the fusion function:

$$x = f(x_1, x_2)$$

wherein x is the fused feature; the fusion is performed by simple concatenation, so the feature obtained by the function f(·) is

$$x = x_1 \oplus x_2$$

where $\oplus$ denotes vector concatenation.
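Concatenation fusion amounts to a single array operation. A toy example, with made-up feature values:

```python
import numpy as np

# Fusion by simple concatenation: f(x1, x2) = x1 (+) x2.
x1 = np.array([0.2, 0.7])        # feature learned from rating data
x2 = np.array([0.1, 0.5, 0.4])   # feature learned from review text
x = np.concatenate([x1, x2])     # fused feature
print(x.shape)  # (5,) — the dimensions simply add
```

The fused dimension is the sum of the input dimensions, so downstream layers must be sized accordingly.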
7. The method according to claim 1, wherein in the (4) optimization step based on Bayesian personalized ranking (BPR), the triplet generation is based on the user's purchase or browsing history and the social network. For each user u, the items purchased or browsed by the user are denoted i, the items never contacted by the user are denoted j, and the items purchased by the user's friends are denoted p. The set of all items in the system is defined as D, the set of items purchased or browsed by user u is defined as $D_u$, and the set of items purchased by the user's friends is defined as $D_p$. The items that best represent the user's preference are, first, the set of items $D_u$ the user has purchased; second, based on the similarity of friends' preferences, the user is likely to purchase items that his friends have purchased but the user has not, i.e., $D_p \setminus D_u$; finally, the items the user is least likely to purchase are $D \setminus (D_u \cup D_p)$. The user-item triplets constructed from the social network information form the training set, which may be represented as:

$$T := \{(u, i, j) \mid i \in (D_u \cup D_p),\; j \in D \setminus (D_u \cup D_p)\}$$

wherein (u, i, j) is a user-item triplet indicating that user u prefers item i over item j; item i is an item purchased by the user or by a direct friend of the user, and item j is an item purchased neither by the user nor by any direct friend of the user. The user-item triplets are thus constructed based on the user's direct friend relationships and are used to train the subsequent Bayesian personalized ranking (BPR) model.
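The set definition of T above translates directly into code. This is a hedged sketch for a single user; the item IDs are illustrative, and a real trainer would sample (i, j) pairs rather than enumerate them all.

```python
def build_triplets(all_items, purchased, friend_purchased):
    """Enumerate (i, j) pairs for one user u, following
    T = {(u, i, j) | i in D_u ∪ D_p, j in D \\ (D_u ∪ D_p)}."""
    positives = purchased | friend_purchased   # D_u ∪ D_p
    negatives = all_items - positives          # D \ (D_u ∪ D_p)
    return [(i, j) for i in sorted(positives) for j in sorted(negatives)]

D = {1, 2, 3, 4, 5}   # all items in the system
D_u = {1, 2}          # purchased or browsed by user u
D_p = {3}             # purchased by u's direct friends
triplets = build_triplets(D, D_u, D_p)
print(len(triplets))  # 3 positives x 2 negatives = 6 pairs
```

Items bought only by friends (here item 3) count as positives, which is exactly how the social network enlarges the training signal beyond the user's own history.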
8. The method according to claim 1, wherein in the (5) recommendation step, the personalized recommendation list is derived by multiplying the fused feature vectors of the user and the item:

$$s = u^{T} v$$

The fused feature vector of each user is multiplied with those of the items the user has not yet purchased or browsed to obtain the user's preference score for each item; the higher the score, the more likely the user is to purchase or browse the item. The Top-N recommendation list for the user is obtained by sorting the scores of all items in descending order and taking the top N items.
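The scoring and ranking step is a dot product followed by a descending sort. A minimal sketch with toy vectors (the feature values are invented for illustration):

```python
import numpy as np

def top_n(user_vec, item_vecs, n):
    """Score s = u^T v for every candidate item and return the indices
    of the n highest-scoring items in descending order."""
    scores = item_vecs @ user_vec
    return np.argsort(-scores)[:n]

u = np.array([1.0, 0.0, 0.5])          # fused user feature vector
items = np.array([[0.2, 0.9, 0.1],     # item 0 -> score 0.25
                  [0.9, 0.1, 0.8],     # item 1 -> score 1.30
                  [0.5, 0.5, 0.5]])    # item 2 -> score 0.75
rec = top_n(u, items, 2)
print(rec.tolist())  # [1, 2] — the Top-2 recommendation list
```

In practice the candidate set would exclude items the user has already purchased or browsed, as the claim specifies.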
CN201910547320.1A 2019-06-24 2019-06-24 Deep learning based recommendation method for processing multi-source heterogeneous data Active CN110263257B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910547320.1A CN110263257B (en) 2019-06-24 2019-06-24 Deep learning based recommendation method for processing multi-source heterogeneous data


Publications (2)

Publication Number Publication Date
CN110263257A CN110263257A (en) 2019-09-20
CN110263257B true CN110263257B (en) 2021-08-17

Family

ID=67920670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910547320.1A Active CN110263257B (en) 2019-06-24 2019-06-24 Deep learning based recommendation method for processing multi-source heterogeneous data

Country Status (1)

Country Link
CN (1) CN110263257B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045716B (en) * 2019-11-04 2022-02-22 中山大学 Related patch recommendation method based on heterogeneous data
CN111046672B (en) * 2019-12-11 2020-07-14 山东众阳健康科技集团有限公司 Multi-scene text abstract generation method
CN111291266B (en) * 2020-02-13 2023-03-21 深圳市雅阅科技有限公司 Artificial intelligence based recommendation method and device, electronic equipment and storage medium
CN111274406A (en) * 2020-03-02 2020-06-12 湘潭大学 Text classification method based on deep learning hybrid model
CN111612573B (en) * 2020-04-30 2023-04-25 杭州电子科技大学 Recommendation system scoring recommendation prediction method based on full Bayesian method
CN112232929A (en) * 2020-11-05 2021-01-15 南京工业大学 Multi-modal diversity recommendation list generation method for complementary articles
CN112364258B (en) * 2020-11-23 2024-02-27 北京明略软件系统有限公司 Recommendation method and system based on map, storage medium and electronic equipment
CN113064965A (en) * 2021-03-23 2021-07-02 南京航空航天大学 Intelligent recommendation method for similar cases of civil aviation unplanned events based on deep learning
CN112967101B (en) * 2021-04-07 2023-04-07 重庆大学 Collaborative filtering article recommendation method based on multi-interaction information of social users

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103399858A (en) * 2013-07-01 2013-11-20 吉林大学 Socialization collaborative filtering recommendation method based on trust
CN103778260A (en) * 2014-03-03 2014-05-07 哈尔滨工业大学 Individualized microblog information recommending system and method
CN106022869A (en) * 2016-05-12 2016-10-12 北京邮电大学 Consumption object recommending method and consumption object recommending device
CN106600482A (en) * 2016-12-30 2017-04-26 西北工业大学 Multi-source social data fusion multi-angle travel information perception and intelligent recommendation method
CN107025606A (en) * 2017-03-29 2017-08-08 西安电子科技大学 The item recommendation method of score data and trusting relationship is combined in a kind of social networks

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20080077574A1 (en) * 2006-09-22 2008-03-27 John Nicholas Gross Topic Based Recommender System & Methods
CN108595527A (en) * 2018-03-28 2018-09-28 中山大学 A kind of personalized recommendation method and system of the multi-source heterogeneous information of fusion


Non-Patent Citations (3)

Title
Recommendation Based on Review Texts and Social Communities: A Hybrid Model; Ji Z et al.; IEEE Access; 20190228; full text *
Personalized image retrieval and recommendation; Ji Zhenyan et al.; Journal of Beijing University of Posts and Telecommunications; 20170615 (No. 03); full text *
A hybrid recommendation model fusing multi-source heterogeneous data; Ji Zhenyan et al.; Journal of Beijing University of Posts and Telecommunications; 20190228; full text *


Similar Documents

Publication Publication Date Title
CN110263257B (en) Deep learning based recommendation method for processing multi-source heterogeneous data
CN108804689B (en) Question-answering platform-oriented label recommendation method integrating user hidden connection relation
CN111460130B (en) Information recommendation method, device, equipment and readable storage medium
CN111222332B (en) Commodity recommendation method combining attention network and user emotion
Sasikala et al. Sentiment analysis of online product reviews using DLMNN and future prediction of online product using IANFIS
Jain et al. A comparative study of machine learning and deep learning techniques for sentiment analysis
CN109584006B (en) Cross-platform commodity matching method based on deep matching model
CN112232929A (en) Multi-modal diversity recommendation list generation method for complementary articles
Arava et al. Sentiment Analysis using deep learning for use in recommendation systems of various public media applications
CN112364236A (en) Target object recommendation system, method and device, and data processing method and device
Mir et al. Online fake review detection using supervised machine learning and BERT model
Liu E‐Commerce Precision Marketing Model Based on Convolutional Neural Network
CN116205700A (en) Recommendation method and device for target product, computer equipment and storage medium
CN115878804A (en) E-commerce comment multi-classification emotion analysis method based on AB-CNN model
CN115935067A (en) Article recommendation method integrating semantics and structural view for socialized recommendation
Drif et al. A sentiment enhanced deep collaborative filtering recommender system
Boumhidi Mining user’s opinions and emojis for reputation generation using deep learning
Rokade et al. Forecasting movie rating using k-nearest neighbor based collaborative filtering
Hoiriyah et al. Lexicon-Based and Naive Bayes Sentiment Analysis for Recommending the Best Marketplace Selection as a Marketing Strategy for MSMEs
CN106528584A (en) An ensemble learning-based group recommendation method
KR102659929B1 (en) System for online sale
Yuyao Multi-round tag recommendation algorithm for shopping guide robots
Wei et al. Devising a Cross-Domain Model to Detect Fake Review Comments
DIVYA et al. Matrix factorization for movie recommended system using deep learning
Bhadana et al. The Sentimental Analysis of Social Media Data: A Survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190920

Assignee: Institute of Software, Chinese Academy of Sciences

Assignor: Beijing Jiaotong University

Contract record no.: X2022990000602

Denomination of invention: Recommendation method for processing multi-source heterogeneous data based on deep learning

Granted publication date: 20210817

License type: Common License

Record date: 20220905
