CN110321485A - Recommendation algorithm combining user reviews and rating information - Google Patents

Recommendation algorithm combining user reviews and rating information Download PDF

Info

Publication number
CN110321485A
CN110321485A
Authority
CN
China
Prior art keywords
user
formula
theme
parameter
comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910531413.5A
Other languages
Chinese (zh)
Inventor
李慧
张舒
刘飞
施珺
戴红伟
樊宁
杨玉
李海宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaihai Institute of Technology
Original Assignee
Huaihai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaihai Institute of Technology
Priority to CN201910531413.5A
Publication of CN110321485A
Current legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • G06F16/337Profile generation, learning or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation algorithm combining user reviews and rating information. The specific steps are: construct a generative probabilistic model for discovering latent topic dimensions in user review text; construct a recommendation objective function that combines a user rating matrix factorization model with the topic discovery model; and realize product recommendation prediction based on user review text and rating data by iteratively optimizing the objective function. The algorithm fully exploits user review information: by using the latent topic distribution in the review text, user rating data is combined with user review text, which effectively alleviates the cold-start problem in recommender systems. At the same time, it predicts ratings more accurately than methods that consider only one of the two data sources, and is especially suitable for rating prediction for new products and new users.

Description

Recommendation algorithm combining user reviews and rating information
Technical field:
The present invention relates to the field of recommendation algorithms, and in particular to a recommendation algorithm combining user reviews and rating information.
Background art:
Recommender systems are widely used on various network platforms and have changed the way users discover and evaluate products online. Existing recommendation methods fall into two broad classes: collaborative filtering methods and content-based methods. Collaborative filtering models explicit rating information; although it can achieve fairly good recommendation results, it suffers from the sparsity of rating data. Content-based methods recommend items that share identical or similar attributes, which tends to produce overly homogeneous recommendations. There has been much research on rating modeling; however, the sparsity of rating data, the cold-start problem and drift in user preferences have remained open problems that are not well solved. Meanwhile, the other feedback channel on such websites, the reviews themselves, is often ignored. Researchers have therefore recently tried to further improve recommendation quality by mining information such as user relationships, user reviews and item tags. Although much work studies both ratings and user review text, the two are almost always studied in isolation, and few studies attempt to combine both information sources in a recommendation algorithm.
Summary of the invention:
The purpose of the present invention is to address the drawbacks of the prior art by providing a recommendation algorithm combining user reviews and rating information, so as to solve the problems raised in the background above.
To achieve the above object, the invention provides the following technical scheme: a recommendation algorithm combining user reviews and rating information, comprising the following steps:
Step (1): construct a generative probabilistic model for discovering latent topic dimensions in user review text; the calculation formula is as follows:
where N_d denotes the number of words in document d;
where θ_i denotes the K-dimensional topic distribution of item i;
where z_{u,i,j} denotes the topic of the j-th word of user u's review of item i;
where w_{u,i,j} denotes the j-th word of user u's review of item i;
Step (2): construct a recommendation objective function that combines the user rating matrix factorization model with the topic discovery model; the calculation formula is as follows:
where Θ denotes the rating parameters, i.e. the parameters of the rating matrix factorization model;
where Φ = {θ, φ} denotes the topic parameters, i.e. the parameter set of the LDA topic discovery model over the reviews;
where κ denotes the weight controlling the transformation function;
where z holds the topic assignment of each word in the corpus T;
where rec(u, i) is the predicted rating of user u for item i;
where r_{u,i} denotes the rating given by user u to item i;
where λ is a hyperparameter controlling the weight of the two parts;
where logL(T | θ, φ, z) denotes the log-likelihood of the LDA model on the user review corpus;
Step (3): realize product recommendation prediction based on user review text and rating data through iterative computation of the objective function; the objective function to be minimized is:
As a preferred technical solution of the present invention, the specific process of constructing, in step (1), the generative probabilistic model for discovering latent topic dimensions in user review text is as follows:
Step (1a): each document is regarded as a sequence of N words; M is the number of documents in the document set; z denotes the topic assignment of a word; α and β are the hyperparameters of the topic distribution θ and of the word distribution φ respectively, both obeying Dirichlet priors;
Step (1b): each document d ∈ D is associated with a K-dimensional topic distribution θ_d, i.e. the words of document d discuss topic k with probability θ_{d,k};
Step (1c): the topic distribution θ_d is assumed to itself obey a Dirichlet distribution; the final model contains the word distribution φ_k of each topic, the topic distribution θ_d of each document, and the topic assignment z_{d,j} of each word;
Step (1d): the parameters Φ = {θ, φ} and the topic assignments z are updated by sampling;
Step (1e): given the word distributions φ_k and the topic assignment of each word, the probability of the particular text corpus can be constructed; the calculation formula is as follows:
where N_d denotes the number of words in document d.
As a preferred technical solution of the present invention, the specific process of constructing, in step (2), the recommendation objective function that combines the user rating matrix factorization model with the topic discovery model is as follows:
Step (2a): a "document" is defined in this model as follows:
documents are derived from the review text: the set of all reviews of a specific item i is defined as document d_i;
Step (2b): the rating parameter γ_i to be learned and the review topic parameter θ_i are linked. It is implicitly assumed here that the number K of latent factors in the rating matrix factorization equals the number K of latent topics in the review text, and that the latent factors carry the same weight. The transformation between latent factors and latent topics is constructed as follows (a reconstruction is given below):
where θ_{i,k} denotes the weight of item i on latent topic k; the exponential form used here ensures that every θ_{i,k} is positive and that Σ_k θ_{i,k} = 1; γ_{i,k} is the value of item i's latent factor vector on factor k; the parameter κ is introduced to control the "peakedness" of the transformation, in other words κ is the weight controlling the conversion. As κ → ∞, θ_i approaches a unit vector that is 1 only at the maximal entry of γ_i; as κ → 0, θ_i approaches the uniform distribution. Intuitively, a large κ means users discuss only the most important topic, while a small κ means users discuss all topics evenly;
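The transformation formula itself is reproduced only as an image in the filing; a plausible LaTeX reconstruction, offered as an assumption consistent with the properties just listed (positivity, normalization to 1, and the limiting behavior in κ), is:

\theta_{i,k} = \frac{\exp(\kappa\,\gamma_{i,k})}{\sum_{k'=1}^{K} \exp(\kappa\,\gamma_{i,k'})}, \qquad k = 1,\dots,K

Under this exponential (softmax-style) form every θ_{i,k} is positive, Σ_k θ_{i,k} = 1, θ_i tends to a one-hot vector at the largest entry of γ_i as κ → ∞, and tends to the uniform distribution as κ → 0.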
Step (2c): define the objective function over the corpus T based on user ratings and user reviews:
where Θ and Φ = {θ, φ} denote the rating parameters and the topic parameters respectively, i.e. the parameters of the rating matrix factorization model and the parameter set of the LDA topic discovery model over the reviews; κ is the weight controlling the transformation function; z holds the topic assignment of each word in the corpus T; and logL(T | θ, φ, z) is the log-likelihood of the LDA model on the user review corpus. The first part of this equation is the rating prediction error, the second part is the log-likelihood of the review-text topic model, and λ is a hyperparameter controlling the weight of the two parts. rec(u, i) is the predicted rating of user u for item i, obtained from the following formula:
rec(u, i) = α + β_u + β_i + γ_u · γ_i
where α denotes the global bias, i.e. the average of all the rating data, reflecting the influence of the particular data set on user ratings; β_u and β_i are the user bias and item bias, reflecting the influence of different users and different items on the rating; γ_u and γ_i denote the K-dimensional latent topic vectors of user u and item i respectively, where γ_i can intuitively be regarded as the "attributes" of item i and γ_u as user u's "preference" for those attributes. Meanwhile, given a training corpus of ratings T, the parameters Θ = {α, β_u, β_i, γ_u, γ_i} are usually chosen by minimizing the mean squared error (MSE), that is (a reconstruction of this minimization is given below):
where Ω(Θ) is the regularizer penalizing "complex" models.
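The minimization itself is likewise shown only as an image; a hedged reconstruction of the standard form implied by the surrounding text (squared prediction error over the training corpus T plus the regularizer Ω(Θ), whose weighting is left implicit here) is:

\hat{\Theta} = \arg\min_{\Theta} \frac{1}{|T|} \sum_{(u,i) \in T} \bigl(\mathrm{rec}(u,i) - r_{u,i}\bigr)^{2} + \Omega(\Theta)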
As a preferred technical solution of the present invention, the specific process by which step (3) realizes product recommendation prediction based on user review text and rating data through iterative computation of the objective function is as follows:
Step (3a): substitute the formula of step (1e) into the formula of step (2c) to build the objective function to be minimized; the calculation formula is as follows:
Step (3b): after fixing the topic assignments z_{d,j} of the words, gradient descent can be used to solve for the matrix factorization parameter set Θ of the TMF model, the topic discovery parameter set Φ, and the parameter κ of the latent dimension/topic transformation; the calculation formula is as follows:
the derivatives of the objective function used in the above update are obtained as follows:
where n_{d,k} denotes the number of times topic k occurs in document d;
Step (3c): iterate step (3b) and Gibbs sampling alternately until the output parameters no longer change or a certain threshold is reached, at which point the algorithm has converged; the Gibbs sampling formula is as follows (its standard form is reconstructed below):
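The Gibbs sampling formula appears only as an image in the filing. The sampler described is collapsed Gibbs sampling for LDA, whose standard update (stated here as an assumption about what the image contains) resamples the topic of word j in document d according to

p(z_{d,j} = k \mid z_{\neg(d,j)}, w) \propto \bigl(n_{d,k}^{\neg(d,j)} + \alpha\bigr) \cdot \frac{n_{k,w_{d,j}}^{\neg(d,j)} + \beta}{\sum_{v}\bigl(n_{k,v}^{\neg(d,j)} + \beta\bigr)}

where n_{d,k} counts the words of document d assigned to topic k (as defined above), n_{k,v} counts the assignments of vocabulary word v to topic k, and the superscript ¬(d,j) means the word currently being resampled is excluded from the counts.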
Beneficial effects of the present invention: the algorithm fully exploits user review information. By using the latent topic distribution in the review text, user rating data is combined with user review text, which effectively alleviates the cold-start problem in recommender systems. At the same time, it predicts ratings more accurately than methods that consider only one of the two data sources, and is especially suitable for rating prediction for new products and new users, because such new users may have too little historical rating data for their latent factors to be modeled.
Description of the drawings:
Fig. 1 is a graphical representation of the probabilistic generative model LDA of the present invention;
Fig. 2 compares the influence of different regularization parameter λ values on the mean squared error (MSE);
Fig. 3 compares the algorithm of the invention with various conventional recommendation algorithms on the MSE index;
Fig. 4 compares the algorithm of the invention with various conventional recommendation algorithms on the ACC index.
Specific embodiments:
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.
The present invention provides the following technical solution: a recommendation algorithm combining user reviews and rating information, comprising the following steps:
Step (1): construct a generative probabilistic model for discovering latent topic dimensions in user review text; the calculation formula is as follows:
where N_d denotes the number of words in document d;
where θ_i denotes the K-dimensional topic distribution of item i;
where z_{u,i,j} denotes the topic of the j-th word of user u's review of item i;
where w_{u,i,j} denotes the j-th word of user u's review of item i;
Step (2): construct a recommendation objective function that combines the user rating matrix factorization model with the topic discovery model; the calculation formula is as follows:
where Θ denotes the rating parameters, i.e. the parameters of the rating matrix factorization model;
where Φ = {θ, φ} denotes the topic parameters, i.e. the parameter set of the LDA topic discovery model over the reviews;
where κ denotes the weight controlling the transformation function;
where z holds the topic assignment of each word in the corpus T;
where rec(u, i) is the predicted rating of user u for item i;
where r_{u,i} denotes the rating given by user u to item i;
where λ is a hyperparameter controlling the weight of the two parts;
where logL(T | θ, φ, z) denotes the log-likelihood of the LDA model on the user review corpus;
Step (3): realize product recommendation prediction based on user review text and rating data through iterative computation of the objective function; the objective function to be minimized is:
The specific process of constructing, in step (1), the generative probabilistic model for discovering latent topic dimensions in user review text is as follows:
Step (1a): each document is regarded as a sequence of N words; M is the number of documents in the document set; z denotes the topic assignment of a word; α and β are the hyperparameters of the topic distribution θ and of the word distribution φ respectively, both obeying Dirichlet priors;
Step (1b): each document d ∈ D is associated with a K-dimensional topic distribution θ_d, i.e. the words of document d discuss topic k with probability θ_{d,k};
Step (1c): the topic distribution θ_d is assumed to itself obey a Dirichlet distribution; the final model contains the word distribution φ_k of each topic, the topic distribution θ_d of each document, and the topic assignment z_{d,j} of each word;
Step (1d): the parameters Φ = {θ, φ} and the topic assignments z are updated by sampling;
Step (1e): given the word distributions φ_k and the topic assignment of each word, the probability of the particular text corpus can be constructed; the calculation formula is as follows:
where N_d denotes the number of words in document d (a generative sketch is given below).
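To make the generative process of steps (1a)-(1e) concrete, the following minimal Python sketch generates one review document under the model. The vocabulary size and document length are illustrative placeholders; the Dirichlet hyperparameters use the empirical values reported later in the experiments.

import numpy as np

rng = np.random.default_rng(0)

K = 10                  # number of latent topics (the default chosen later in the experiments)
V = 5000                # vocabulary size (illustrative placeholder)
alpha, beta = 0.2, 0.1  # Dirichlet hyperparameters (empirical values used in the experiments)

# Per-topic word distributions phi_k ~ Dirichlet(beta), one row per topic
phi = rng.dirichlet([beta] * V, size=K)

def generate_review(n_words):
    """Generate one review document d with n_words words under the LDA model."""
    theta_d = rng.dirichlet([alpha] * K)            # document topic distribution theta_d
    z = rng.choice(K, size=n_words, p=theta_d)      # topic assignment z_{d,j} of each word
    words = np.array([rng.choice(V, p=phi[k]) for k in z])  # word w_{d,j} drawn from phi_{z_{d,j}}
    return theta_d, z, words

theta_d, z, words = generate_review(n_words=80)     # N_d = 80 words, illustrative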
The specific process of constructing, in step (2), the recommendation objective function that combines the user rating matrix factorization model with the topic discovery model is as follows:
Step (2a): a "document" is defined in this model as follows:
documents are derived from the review text: the set of all reviews of a specific item i is defined as document d_i;
Step (2b): the rating parameter γ_i to be learned and the review topic parameter θ_i are linked. It is implicitly assumed here that the number K of latent factors in the rating matrix factorization equals the number K of latent topics in the review text, and that the latent factors carry the same weight. The transformation between latent factors and latent topics is constructed as follows:
where θ_{i,k} denotes the weight of item i on latent topic k; the exponential form used here ensures that every θ_{i,k} is positive and that Σ_k θ_{i,k} = 1; γ_{i,k} is the value of item i's latent factor vector on factor k; the parameter κ is introduced to control the "peakedness" of the transformation, in other words κ is the weight controlling the conversion. As κ → ∞, θ_i approaches a unit vector that is 1 only at the maximal entry of γ_i; as κ → 0, θ_i approaches the uniform distribution. Intuitively, a large κ means users discuss only the most important topic, while a small κ means users discuss all topics evenly;
Step (2c): define the objective function over the corpus T based on user ratings and user reviews:
where Θ and Φ = {θ, φ} denote the rating parameters and the topic parameters respectively, i.e. the parameters of the rating matrix factorization model and the parameter set of the LDA topic discovery model over the reviews; κ is the weight controlling the transformation function; z holds the topic assignment of each word in the corpus T; and logL(T | θ, φ, z) is the log-likelihood of the LDA model on the user review corpus. The first part of this equation is the rating prediction error, the second part is the log-likelihood of the review-text topic model, and λ is a hyperparameter controlling the weight of the two parts. rec(u, i) is the predicted rating of user u for item i, obtained from the following formula:
rec(u, i) = α + β_u + β_i + γ_u · γ_i
where α denotes the global bias, i.e. the average of all the rating data, reflecting the influence of the particular data set on user ratings; β_u and β_i are the user bias and item bias, reflecting the influence of different users and different items on the rating; γ_u and γ_i denote the K-dimensional latent topic vectors of user u and item i respectively, where γ_i can intuitively be regarded as the "attributes" of item i and γ_u as user u's "preference" for those attributes. Meanwhile, given a training corpus of ratings T, the parameters Θ = {α, β_u, β_i, γ_u, γ_i} are usually chosen by minimizing the mean squared error (MSE), that is:
where Ω(Θ) is the regularizer penalizing "complex" models (a sketch evaluating the combined objective is given below).
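As an illustration of step (2c), the following Python sketch evaluates rec(u, i) and the combined objective. The sign convention (subtracting λ times the review-corpus log-likelihood, so that minimizing the objective maximizes the likelihood) and the array shapes are assumptions made for the sketch, not prescriptions of the filing.

import numpy as np

def rec(alpha, beta_u, beta_i, gamma_u, gamma_i, u, i):
    """Predicted rating rec(u, i) = alpha + beta_u + beta_i + gamma_u . gamma_i."""
    return alpha + beta_u[u] + beta_i[i] + gamma_u[u] @ gamma_i[i]

def combined_objective(ratings, alpha, beta_u, beta_i, gamma_u, gamma_i,
                       log_likelihood_T, lam):
    """Squared rating error over the training triples minus lam * logL(T | theta, phi, z).

    ratings is an iterable of (u, i, r_ui) triples; log_likelihood_T is the LDA
    log-likelihood of the review corpus under the current topic assignments,
    computed elsewhere and passed in as a number.
    """
    err = sum((rec(alpha, beta_u, beta_i, gamma_u, gamma_i, u, i) - r) ** 2
              for u, i, r in ratings)
    return err - lam * log_likelihood_T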
The specific process by which step (3) realizes product recommendation prediction based on user review text and rating data through iterative computation of the objective function is as follows:
Step (3a): substitute the formula of step (1e) into the formula of step (2c) to build the objective function to be minimized; the calculation formula is as follows:
Step (3b): after fixing the topic assignments z_{d,j} of the words, gradient descent can be used to solve for the matrix factorization parameter set Θ of the TMF model, the topic discovery parameter set Φ, and the parameter κ of the latent dimension/topic transformation; the calculation formula is as follows:
the derivatives of the objective function used in the above update are obtained as follows:
where n_{d,k} denotes the number of times topic k occurs in document d;
Step (3c): iterate step (3b) and Gibbs sampling alternately until the output parameters no longer change or a certain threshold is reached, at which point the algorithm has converged; the Gibbs sampling formula is as follows (a sketch of the overall alternating procedure is also given below):
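Step (3) alternates gradient descent on the continuous parameters with Gibbs resampling of the word topics. The following Python sketch shows only the shape of that loop: the three routines it takes as arguments (initializing topic assignments, one gradient-descent step on Θ, Φ and κ, and one Gibbs sweep over z) are assumed to be supplied by the implementer and are not defined in the filing.

import numpy as np

def train_tmf(ratings, corpus, params, kappa, lam,
              init_assignments, gradient_step, gibbs_sweep,
              max_iters=100, tol=1e-4):
    """Alternating optimization sketch for the TMF model.

    params maps parameter names (alpha, beta_u, beta_i, gamma_u, gamma_i,
    theta, phi) to numpy arrays; the three callables are placeholders.
    """
    z = init_assignments(corpus)                     # initial topic assignments z_{d,j}
    for _ in range(max_iters):
        old = {name: np.copy(value) for name, value in params.items()}
        # step (3b): with z fixed, gradient-descent update of Theta, Phi and kappa
        params, kappa = gradient_step(params, kappa, ratings, corpus, z, lam)
        # step (3c): with the continuous parameters fixed, resample topics by Gibbs sampling
        z = gibbs_sweep(corpus, params, z)
        # stop once the output parameters no longer change beyond a small threshold
        change = max(np.max(np.abs(params[name] - old[name])) for name in params)
        if change < tol:
            break
    return params, kappa, z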
To verify the performance of the optimized algorithm of the present invention, the following conventional product recommendation methods are chosen for comparative experiments: Offset, LFM (Latent Factor Model), the SVD++ model, and SlopeOne.
Offset method: a collaborative filtering model based on a global bias. In building the model, the average of all ratings of an item is used as the item's predicted value, i.e. the average rating of an item serves as the predicted rating of any user for that item.
LFM (Latent Factor Model): predicts ratings for unseen items through matrix factorization (SVD). This model only considers users' rating information and does not consider users' review text.
SVD++ model: adds neighborhood item information to the SVD model, using the accumulated latent factors of the items a user has historically reviewed as the neighborhood item information.
SlopeOne: a widely used item-based collaborative filtering method; the algorithm is efficient and simple and is available in open-source tools.
To quantify performance, the present invention adopts the mean squared error (MSE) commonly used to evaluate recommender systems as the evaluation index, defined as MSE = (1/M) Σ (r̂_{u,i} − r_{u,i})²,
where M denotes the total number of predicted ratings, r̂_{u,i} denotes the predicted rating of user u for item i, and r_{u,i} denotes the actual rating of user u for item i.
Besides MSE, the experiments also introduce accuracy (ACC) as a second index measuring prediction accuracy, defined as ACC = m / M,
where m denotes the number of times the predicted rating equals the user's actual rating (a sketch of both indices is given below).
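A minimal Python sketch of the two evaluation indices, assuming the 1-5 integer rating scale and the nearest-integer rounding of decimal predictions described for Fig. 4:

import numpy as np

def mse(pred, actual):
    """Mean squared error over the M predicted ratings."""
    pred, actual = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.mean((pred - actual) ** 2))

def acc(pred, actual):
    """Share of predictions matching the actual rating after rounding to the nearest integer in 1..5."""
    rounded = np.clip(np.rint(np.asarray(pred, dtype=float)), 1, 5)
    return float(np.mean(rounded == np.asarray(actual, dtype=float)))

# Illustrative values only
print(mse([4.2, 3.8, 5.0, 2.1], [4, 4, 5, 3]))   # 0.2225
print(acc([4.2, 3.8, 5.0, 2.1], [4, 4, 5, 3]))   # 0.75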
For the hyperparameters α and β of the LDA model, empirical values are used in the experiments, namely α = 0.2 and β = 0.1.
The statistics of the data set used in the present invention (Table 1) come from user review data collected from various public resources. The main source is Amazon, from which about 35 million user reviews were obtained. To obtain these data, a list of 75 million ASIN-like strings (Amazon's own product numbers) was first compiled from the Internet Archive; among these, about 2.5 million products have at least one user review. The data set is further divided into 26 parts according to the top-level category of each product (such as books or movies). This data set is a superset of the existing publicly available Amazon data set. In total, 42 million user reviews were obtained from 10 million users and 3 million items, covering 5.1 billion words.
The parameter K validation experiment (Table 2) shows the MSE and accuracy ACC of the proposed algorithm under different numbers of topics. Comparing the MSE and ACC results for different values of K, it can be seen that the topic partition is clearest when K = 10. As the value of K increases, system performance keeps improving; however, when K increases from 10 to 20 the improvement is small, so the experiments finally set K = 10 as the default number of topics. In order for the LDA model to converge quickly on the review data, the number of iterations is set to 100.
Influence of different regularization parameter λ values on the MSE (Fig. 2). The regularization parameter λ controls the regularization weight of the topic preferences; the regularization term is one of the effective means in machine learning for preventing the model from over-fitting. The experimental results in Fig. 2 show the behavior of the system under different values of the parameter; from the data in the figure, the MSE of the system levels off around λ = 0.5, so the experiments of the present invention finally set the regularization parameter λ to 0.5.
Comparison of the algorithm of the invention with various conventional recommendation algorithms on the MSE index (Fig. 3). From the experimental results it can be seen that latent factor matrix factorization effectively improves recommendation quality: LFM performs better than the global-bias-based Offset method, and SVD++ performs better than LFM. The TMF method, because it considers both user ratings and review information, obtains the best recommendation performance.
Comparison of the algorithm of the invention with various conventional recommendation algorithms on the MSE index after randomly selecting 10 product categories (Table 3). To verify the effectiveness of topic discovery, the experiment randomly selected 10 of the 28 Amazon sub-category data sets, including first-level categories such as mother-and-baby, food, audio-video and cosmetics. Looking at all data subsets together, the TMF model outperforms the conventional models in all 10 sub-categories; in several categories such as "clothing" and "shoes and hats", the improvement of the TMF algorithm over the other algorithms is most obvious. The categories where TMF performs best are all relatively subjective ones: for these categories users mention numerous aspects of the product in their reviews, so by using the review text TMF can better "separate" the objective properties of a product from the reviewers' subjective opinions about it.
Comparison of the algorithm of the invention with various conventional recommendation algorithms on the ACC index (Fig. 4). The actual ratings are integers from 1 to 5; when the predicted rating produced by an algorithm is a decimal, it is rounded to the nearest integer before computing ACC. From the results, the TMF algorithm performs best. The test set is split in two ways, random assignment and chronological assignment; on the whole, rating prediction performs better under random assignment than under chronological assignment.
Table 1
Table 2
Table 3
The embodiments described above express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept, and all of these fall within the protection scope of the invention.

Claims (4)

1. A recommendation algorithm combining user reviews and rating information, characterized by comprising the following steps:
Step (1): construct a generative probabilistic model for discovering latent topic dimensions in user review text; the calculation formula is as follows:
where N_d denotes the number of words in document d;
where θ_i denotes the K-dimensional topic distribution of item i;
where z_{u,i,j} denotes the topic of the j-th word of user u's review of item i;
where w_{u,i,j} denotes the j-th word of user u's review of item i;
Step (2): construct a recommendation objective function that combines the user rating matrix factorization model with the topic discovery model; the calculation formula is as follows:
where Θ denotes the rating parameters, i.e. the parameters of the rating matrix factorization model;
where Φ = {θ, φ} denotes the topic parameters, i.e. the parameter set of the LDA topic discovery model over the reviews;
where κ denotes the weight controlling the transformation function;
where z holds the topic assignment of each word in the corpus T;
where rec(u, i) is the predicted rating of user u for item i;
where r_{u,i} denotes the rating given by user u to item i;
where λ is a hyperparameter controlling the weight of the two parts;
where logL(T | θ, φ, z) denotes the log-likelihood of the LDA model on the user review corpus;
Step (3): realize product recommendation prediction based on user review text and rating data through iterative computation of the objective function; the objective function to be minimized is:
2. The recommendation algorithm combining user reviews and rating information according to claim 1, characterized in that the specific process of constructing, in step (1), the generative probabilistic model for discovering latent topic dimensions in user review text is as follows:
Step (1a): each document is regarded as a sequence of N words; M is the number of documents in the document set; z denotes the topic assignment of a word; α and β are the hyperparameters of the topic distribution θ and of the word distribution φ respectively, both obeying Dirichlet priors;
Step (1b): each document d ∈ D is associated with a K-dimensional topic distribution θ_d, i.e. the words of document d discuss topic k with probability θ_{d,k};
Step (1c): the topic distribution θ_d is assumed to itself obey a Dirichlet distribution; the final model contains the word distribution φ_k of each topic, the topic distribution θ_d of each document, and the topic assignment z_{d,j} of each word;
Step (1d): the parameters Φ = {θ, φ} and the topic assignments z are updated by sampling;
Step (1e): given the word distributions φ_k and the topic assignment of each word, the probability of the particular text corpus can be constructed; the calculation formula is as follows:
where N_d denotes the number of words in document d.
3. The recommendation algorithm combining user reviews and rating information according to claim 1, characterized in that the specific process of constructing, in step (2), the recommendation objective function that combines the user rating matrix factorization model with the topic discovery model is as follows:
Step (2a): a "document" is defined in this model as follows:
documents are derived from the review text: the set of all reviews of a specific item i is defined as document d_i;
Step (2b): the rating parameter γ_i to be learned and the review topic parameter θ_i are linked. It is implicitly assumed here that the number K of latent factors in the rating matrix factorization equals the number K of latent topics in the review text, and that the latent factors carry the same weight. The transformation between latent factors and latent topics is constructed as follows:
where θ_{i,k} denotes the weight of item i on latent topic k; the exponential form used here ensures that every θ_{i,k} is positive and that Σ_k θ_{i,k} = 1; γ_{i,k} is the value of item i's latent factor vector on factor k; the parameter κ is introduced to control the "peakedness" of the transformation, in other words κ is the weight controlling the conversion. As κ → ∞, θ_i approaches a unit vector that is 1 only at the maximal entry of γ_i; as κ → 0, θ_i approaches the uniform distribution. Intuitively, a large κ means users discuss only the most important topic, while a small κ means users discuss all topics evenly;
Step (2c): define the objective function over the corpus T based on user ratings and user reviews:
where Θ and Φ = {θ, φ} denote the rating parameters and the topic parameters respectively, i.e. the parameters of the rating matrix factorization model and the parameter set of the LDA topic discovery model over the reviews; κ is the weight controlling the transformation function; z holds the topic assignment of each word in the corpus T; and logL(T | θ, φ, z) is the log-likelihood of the LDA model on the user review corpus. The first part of this equation is the rating prediction error, the second part is the log-likelihood of the review-text topic model, and λ is a hyperparameter controlling the weight of the two parts. rec(u, i) is the predicted rating of user u for item i, obtained from the following formula:
rec(u, i) = α + β_u + β_i + γ_u · γ_i
where α denotes the global bias, i.e. the average of all the rating data, reflecting the influence of the particular data set on user ratings; β_u and β_i are the user bias and item bias, reflecting the influence of different users and different items on the rating; γ_u and γ_i denote the K-dimensional latent topic vectors of user u and item i respectively, where γ_i can intuitively be regarded as the "attributes" of item i and γ_u as user u's "preference" for those attributes. Meanwhile, given a training corpus of ratings T, the parameters Θ = {α, β_u, β_i, γ_u, γ_i} are usually chosen by minimizing the mean squared error (MSE), that is:
where Ω(Θ) is the regularizer penalizing "complex" models.
4. The recommendation algorithm combining user reviews and rating information according to claim 1, characterized in that the specific process by which step (3) realizes product recommendation prediction based on user review text and rating data through iterative computation of the objective function is as follows:
Step (3a): substitute the formula of step (1e) into the formula of step (2c) to build the objective function to be minimized; the calculation formula is as follows:
Step (3b): after fixing the topic assignments z_{d,j} of the words, gradient descent can be used to solve for the matrix factorization parameter set Θ of the TMF model, the topic discovery parameter set Φ, and the parameter κ of the latent dimension/topic transformation; the calculation formula is as follows:
the derivatives of the objective function used in the above update are obtained as follows:
where n_{d,k} denotes the number of times topic k occurs in document d;
Step (3c): iterate step (3b) and Gibbs sampling alternately until the output parameters no longer change or a certain threshold is reached, at which point the algorithm has converged; the Gibbs sampling formula is as follows:
CN201910531413.5A 2019-06-19 2019-06-19 Recommendation algorithm combining user reviews and rating information Pending CN110321485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910531413.5A CN110321485A (en) 2019-06-19 2019-06-19 A kind of proposed algorithm of combination user comment and score information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910531413.5A CN110321485A (en) 2019-06-19 2019-06-19 A kind of proposed algorithm of combination user comment and score information

Publications (1)

Publication Number Publication Date
CN110321485A true CN110321485A (en) 2019-10-11

Family

ID=68119820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910531413.5A Pending CN110321485A (en) 2019-06-19 2019-06-19 A kind of proposed algorithm of combination user comment and score information

Country Status (1)

Country Link
CN (1) CN110321485A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061962A (en) * 2019-11-25 2020-04-24 上海海事大学 Recommendation method based on user score analysis
CN111259238A (en) * 2020-01-13 2020-06-09 山西大学 Post-interpretable recommendation method and device based on matrix decomposition
CN111563787A (en) * 2020-03-19 2020-08-21 天津大学 Recommendation system and method based on user comments and scores
CN111667344A (en) * 2020-06-08 2020-09-15 中森云链(成都)科技有限责任公司 Personalized recommendation method integrating comments and scores
CN111899063A (en) * 2020-06-17 2020-11-06 东南大学 Fresh agricultural product online recommendation method considering customer consumption behaviors and preference
CN112905908A (en) * 2021-03-04 2021-06-04 浙江机电职业技术学院 Collaborative filtering algorithm based on score LDA
CN112966203A (en) * 2021-03-12 2021-06-15 杨虡 Grade determination method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202519A (en) * 2016-07-22 2016-12-07 桂林电子科技大学 A kind of combination user comment content and the item recommendation method of scoring
CN109903099A (en) * 2019-03-12 2019-06-18 合肥工业大学 Model building method and system for score in predicting
CN109902229A (en) * 2019-02-01 2019-06-18 中森云链(成都)科技有限责任公司 A kind of interpretable recommended method based on comment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202519A (en) * 2016-07-22 2016-12-07 桂林电子科技大学 A kind of combination user comment content and the item recommendation method of scoring
CN109902229A (en) * 2019-02-01 2019-06-18 中森云链(成都)科技有限责任公司 A kind of interpretable recommended method based on comment
CN109903099A (en) * 2019-03-12 2019-06-18 合肥工业大学 Model building method and system for score in predicting

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JULIAN MCAULEY et al.: "Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text", RecSys '13: Proceedings of the 7th ACM Conference on Recommender Systems *
李琳 et al.: "融合评分矩阵与评论文本的商品推荐模型" (A product recommendation model fusing the rating matrix and review text), 《计算机学报》 (Chinese Journal of Computers) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061962A (en) * 2019-11-25 2020-04-24 上海海事大学 Recommendation method based on user score analysis
CN111061962B (en) * 2019-11-25 2023-09-29 上海海事大学 Recommendation method based on user scoring analysis
CN111259238A (en) * 2020-01-13 2020-06-09 山西大学 Post-interpretable recommendation method and device based on matrix decomposition
CN111259238B (en) * 2020-01-13 2023-04-14 山西大学 Post-interpretable recommendation method and device based on matrix decomposition
CN111563787A (en) * 2020-03-19 2020-08-21 天津大学 Recommendation system and method based on user comments and scores
CN111667344A (en) * 2020-06-08 2020-09-15 中森云链(成都)科技有限责任公司 Personalized recommendation method integrating comments and scores
CN111899063A (en) * 2020-06-17 2020-11-06 东南大学 Fresh agricultural product online recommendation method considering customer consumption behaviors and preference
CN112905908A (en) * 2021-03-04 2021-06-04 浙江机电职业技术学院 Collaborative filtering algorithm based on score LDA
CN112966203A (en) * 2021-03-12 2021-06-15 杨虡 Grade determination method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110321485A (en) A kind of proposed algorithm of combination user comment and score information
CN108959603B (en) Personalized recommendation system and method based on deep neural network
Jacobs et al. Model-based purchase predictions for large assortments
Zhang et al. Taxonomy discovery for personalized recommendation
CN103164463B (en) Method and device for recommending labels
Koren et al. Ordrec: an ordinal model for predicting personalized item rating distributions
CN107451894B (en) Data processing method, device and computer readable storage medium
Shams et al. A non-parametric LDA-based induction method for sentiment analysis
Li et al. Personalization recommendation algorithm based on trust correlation degree and matrix factorization
Liu et al. Towards a dynamic top-n recommendation framework
Zhang et al. Recommender systems based on ranking performance optimization
CN113420221B (en) Interpretable recommendation method integrating implicit article preference and explicit feature preference of user
Hazrati et al. Simulating the impact of recommender systems on the evolution of collective users' choices
Sridhar et al. Content-Based Movie Recommendation System Using MBO with DBN.
Du et al. Personalized product service scheme recommendation based on trust and cloud model
Sang et al. A ranking based recommender system for cold start & data sparsity problem
CN108182264B (en) Ranking recommendation method based on cross-domain ranking recommendation model
Claeys et al. Dynamic allocation optimization in a/b-tests using classification-based preprocessing
Krasnoshchok et al. Extended content-boosted matrix factorization algorithm for recommender systems
Mathur et al. A graph-based recommender system for food products
Lv et al. Supplier recommendation based on knowledge graph embedding
Yan et al. Tackling the achilles heel of social networks: Influence propagation based language model smoothing
Aslanyan et al. Utilizing textual reviews in latent factor models for recommender systems
Duan et al. An adaptive dirichlet multinomial mixture model for short text streaming clustering
Abbas et al. A deep learning approach for context-aware citation recommendation using rhetorical zone classification and similarity to overcome cold-start problem

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination