CN110321485A - A recommendation algorithm combining user reviews and rating information - Google Patents
A recommendation algorithm combining user reviews and rating information
- Publication number: CN110321485A (application CN201910531413.5A)
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention discloses a recommendation algorithm combining user reviews and rating information. The concrete steps are: construct a generative probabilistic model for discovering latent topic dimensions in user review text; construct a recommendation objective function that couples a user rating-matrix factorization model with the topic-discovery model; and realize product recommendation prediction based on user review text and rating data by iteratively optimizing the objective function. The algorithm makes full use of user review information: by exploiting the latent topic distribution in review text, it combines user rating data with user review text and effectively alleviates the cold-start problem in recommender systems. At the same time, it predicts ratings more accurately than methods that consider either data source alone, and is especially suitable for rating prediction for new products and new users.
Description
Technical field:
The present invention relates to the field of recommendation algorithms, in particular to a recommendation algorithm combining user reviews and rating information.
Background technique:
Recommender systems are widely used across online platforms and have changed how users discover and evaluate products online. Existing recommendation methods fall into two broad classes: collaborative filtering and content-based recommendation. Collaborative filtering models explicit rating information; although it can achieve fairly good recommendation results, it suffers from the sparsity of rating data. Content-based methods recommend by mining items with identical or similar attributes, which tends to produce homogeneous recommendation results. There has been much research on modeling ratings, yet the sparsity of rating data, the cold-start problem, and drift in user preferences have long remained open problems without good solutions. At the same time, another feedback channel on these sites, the reviews themselves, is often ignored. Researchers have therefore recently tried to mine information such as user relationships, user reviews, and item tags to further improve recommendation quality. Although much work studies both ratings and user review text, it studies the two in isolation; few studies attempt to combine the two information sources in a recommendation algorithm.
Summary of the invention:
The purpose of the present invention is to address the drawbacks of the prior art by providing a recommendation algorithm combining user reviews and rating information, so as to solve the problems raised in the background above.
To achieve the above object, the invention provides the following technical scheme: a recommendation algorithm combining user reviews and rating information, comprising the following steps:
Step (1): constructing a generative probabilistic model for discovering latent topic dimensions in user review text, with the following notation:
- N_d denotes the number of words in document d;
- θ_i denotes the K-dimensional topic distribution of item i;
- z_{u,i,j} denotes the topic of the j-th word of user u's review of item i;
- w_{u,i,j} denotes the j-th word of user u's review of item i;
Step (2): constructing a recommendation objective function that couples the user rating-matrix factorization model with the topic-discovery model, with the following notation:
- Θ denotes the rating parameters, i.e. the parameters of the rating-matrix factorization model;
- Φ = {θ, φ} denotes the topic parameters, i.e. the parameter set of the review LDA topic-discovery model;
- κ denotes the weight controlling the transfer function;
- z denotes the topic assignment of each word in the corpus T;
- rec(u, i) denotes user u's predicted rating of item i;
- r_{u,i} denotes user u's actual rating of item i;
- λ is a hyperparameter controlling the relative weight of the two parts of the objective;
- log L(T | θ, φ, z) denotes the LDA log-likelihood of the user review corpus;
Step (3): realizing product recommendation prediction based on user review text and rating data by iteratively optimizing the objective function; the objective to be minimized is
f(T | Θ, Φ, κ, z) = Σ_{(u,i)} (rec(u, i) − r_{u,i})² − λ · log L(T | θ, φ, z).
As a preferred technical solution of the present invention, the generative probabilistic model of step (1) for discovering latent topic dimensions in user review text is implemented as follows:
Step (1a): each document is treated as a sequence of N words; M is the number of documents in the corpus; z denotes the topic assignments of the words; α and β are the hyperparameters of the topic distribution θ and the word distribution φ respectively, each obeying a Dirichlet prior;
Step (1b): each document d ∈ D is associated with a K-dimensional topic distribution θ_d; that is, the words of document d discuss topic k with probability θ_{d,k};
Step (1c): the topic distribution θ_d is assumed to itself obey a Dirichlet distribution; the final model consists of the word distribution φ_k of each topic, the topic distribution θ_d of each document, and the topic assignment z_{d,j} of each word;
Step (1d): the parameters Φ = {θ, φ} and the topic assignments z are updated by sampling;
Step (1e): given the word distributions φ_k and the topic assignment of each word, the likelihood of a particular text corpus can be constructed, where N_d denotes the number of words in document d.
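The generative process of steps (1a)-(1c) can be sketched in a few lines of Python. The corpus sizes below are illustrative assumptions; the hyperparameter values α = 0.2 and β = 0.1 follow the empirical values used later in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, M = 10, 1000, 5        # topics, vocabulary size, documents (illustrative sizes)
alpha, beta = 0.2, 0.1       # Dirichlet hyperparameters for theta and phi

# Word distribution phi_k of each topic, drawn from a Dirichlet prior (step 1a)
phi = rng.dirichlet(beta * np.ones(V), size=K)

docs = []
for d in range(M):
    N_d = int(rng.integers(20, 50))                 # number of words N_d of document d
    theta_d = rng.dirichlet(alpha * np.ones(K))     # topic distribution theta_d (steps 1b, 1c)
    z = rng.choice(K, size=N_d, p=theta_d)          # topic assignment z_{d,j} of each word
    words = np.array([rng.choice(V, p=phi[k]) for k in z])
    docs.append(words)
```

Each document is thus a word sequence whose topics are governed by θ_d and whose words are governed by the per-topic distributions φ_k, matching the model the steps describe.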
As a preferred technical solution of the present invention, the recommendation objective of step (2), coupling the user rating-matrix factorization model with the topic-discovery model, is implemented as follows:
Step (2a): a "document" is defined in this model as follows: documents are derived from the review text, and the set of all reviews of a particular item i is defined as document d_i;
Step (2b): the rating parameters γ_i and the review parameters θ_i are linked. It is implicitly assumed here that the number K of latent factors in the rating-matrix factorization equals the number K of latent topics in the review text, and that the latent factors carry equal weight. The transformation between latent factors and latent topics is constructed as
θ_{i,k} = exp(κ γ_{i,k}) / Σ_{k'} exp(κ γ_{i,k'}),
where θ_{i,k} is the weight of topic k for item i; the exponential form ensures that every θ_{i,k} is positive and that Σ_k θ_{i,k} = 1; and γ_{i,k} is the value of item i's latent-factor vector on factor k. The parameter κ is introduced to control the "peakedness" of the transformation, in other words the weight of the conversion: as κ → ∞, θ_i approaches the unit vector that places weight 1 on the largest entry of γ_i; as κ → 0, θ_i approaches the uniform distribution. Intuitively, a large κ means users discuss only the most important topic, while a small κ means users discuss all topics evenly.
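The transformation of step (2b) is a κ-scaled softmax over the latent-factor vector; a minimal sketch (the example values of γ_i and κ are illustrative assumptions):

```python
import numpy as np

def topic_dist_from_factors(gamma_i, kappa):
    """theta_{i,k} = exp(kappa * gamma_{i,k}) / sum_k' exp(kappa * gamma_{i,k'}):
    the exponential keeps every theta_{i,k} positive, the normalisation makes
    them sum to 1, and kappa controls the 'peakedness' of the transform."""
    e = np.exp(kappa * (gamma_i - gamma_i.max()))   # shift by the max for numerical stability
    return e / e.sum()

gamma = np.array([0.9, 0.1, -0.3])
peaked  = topic_dist_from_factors(gamma, kappa=50.0)   # large kappa: near one-hot on the argmax
uniform = topic_dist_from_factors(gamma, kappa=1e-6)   # kappa -> 0: near uniform
```

The two calls illustrate the limits described above: a large κ concentrates θ_i on the strongest factor, a vanishing κ flattens it.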
Step (2c): the objective function over the corpus T, based on user ratings and user reviews, is defined as
f(T | Θ, Φ, κ, z) = Σ_{(u,i)} (rec(u, i) − r_{u,i})² − λ · log L(T | θ, φ, z),
where Θ and Φ = {θ, φ} denote the rating and topic parameters respectively, i.e. the parameters of the rating-matrix factorization model and the parameter set of the review LDA topic-discovery model; κ is the weight of the transfer function; z gives the topic of each word in the corpus T; and log L(T | θ, φ, z) is the LDA log-likelihood of the user review corpus. The first part of this equation is the rating-prediction error and the second part is the log-likelihood of the review-text topic model; λ is a hyperparameter controlling the relative weight of the two parts. rec(u, i), user u's predicted rating of item i, is obtained from
rec(u, i) = α + β_u + β_i + γ_u · γ_i,
where α is the global bias, i.e. the mean of all rating data, reflecting the influence of the dataset on user ratings; β_u and β_i are the user and item biases, reflecting the influence of individual users and items on ratings; and γ_u and γ_i are the K-dimensional latent topic vectors of user u and item i. Intuitively, γ_i can be read as the "attributes" of item i and γ_u as user u's "preferences" over those attributes. Given a training corpus T of ratings, the parameters Θ = {α, β_u, β_i, γ_u, γ_i} are usually chosen to minimize the mean squared error (MSE), i.e.
Θ = argmin_Θ Σ_{(u,i)∈T} (rec(u, i) − r_{u,i})² + Ω(Θ),
where Ω(Θ) is a regularizer that penalizes "complex" models.
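The prediction rule and the combined objective of step (2c) can be written directly; the parameter shapes and example values below are illustrative assumptions:

```python
import numpy as np

def rec(alpha, beta_u, beta_i, gamma_u, gamma_i):
    """Predicted rating: rec(u, i) = alpha + beta_u + beta_i + gamma_u . gamma_i."""
    return alpha + beta_u + beta_i + float(np.dot(gamma_u, gamma_i))

def objective(ratings, alpha, beta_u, beta_i, gamma_u, gamma_i, review_loglik, lam):
    """f = sum of squared rating errors minus lam times the review-corpus
    LDA log-likelihood (passed in here as a precomputed number).
    `ratings` is a list of (u, i, r_ui) triples."""
    err = sum((rec(alpha, beta_u[u], beta_i[i], gamma_u[u], gamma_i[i]) - r) ** 2
              for u, i, r in ratings)
    return err - lam * review_loglik

# One user, one item, K = 2 latent dimensions:
p_hat = rec(3.5, 0.2, -0.1, np.array([1.0, 0.0]), np.array([0.5, 2.0]))  # 3.5+0.2-0.1+0.5 = 4.1
```

The regularizer Ω(Θ) is omitted from this sketch for brevity; in practice it would be added to the squared-error term.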
As a preferred technical solution of the present invention, step (3), realizing product recommendation prediction based on user review text and rating data by iterative optimization of the objective function, is implemented as follows:
Step (3a): the formula of step (1e) is substituted into the formula of step (2c) to construct the objective function to be minimized;
Step (3b): with the word topics z_{d,j} fixed, gradient descent is used to solve for the matrix-factorization parameter set Θ of the TMF model, the topic-discovery parameter set Φ, and the transform parameter κ; in the derivative of the objective, n_{d,k} denotes the number of occurrences of topic k in document d;
Step (3c): step (3b) and Gibbs sampling are iterated alternately until the output parameters no longer change, or their change falls below a given threshold, at which point the algorithm has converged.
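The Gibbs-sampling update of step (3c) resamples the topic of each word. The patent's own sampling formula is given only as an image, so the sketch below uses the standard LDA-style update it describes, drawing a new topic in proportion to θ_{d,k} · φ_{k,w}; the toy values of θ_d and φ are assumptions:

```python
import numpy as np

def sample_word_topic(theta_d, phi, w, rng):
    """Draw a new topic z_{d,j} for word w of document d with
    p(z = k) proportional to theta_{d,k} * phi_{k,w}."""
    p = theta_d * phi[:, w]
    p = p / p.sum()
    return int(rng.choice(len(theta_d), p=p))

rng = np.random.default_rng(0)
theta_d = np.array([0.7, 0.2, 0.1])      # topic distribution of document d (3 topics)
phi = np.array([[0.9, 0.1],              # word distributions of the 3 topics
                [0.5, 0.5],              # over a 2-word vocabulary
                [0.1, 0.9]])
z_new = sample_word_topic(theta_d, phi, w=0, rng=rng)
```

Alternating such resampling with the gradient step of (3b), and stopping once the parameters stop changing, yields the convergence loop the step describes.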
Beneficial effects of the present invention: the algorithm fully exploits users' review information. By using the latent topic distribution in review text, it combines user rating data with user review text and effectively alleviates the cold-start problem in recommender systems. At the same time, it predicts ratings more accurately than methods that consider either of the two data sources alone, and is especially suitable for rating prediction for new products and new users, since such new users may possess too little historical rating data for their latent factors to be modeled.
Description of the drawings:
Fig. 1 is a graphical representation of the LDA probabilistic generative model of the present invention;
Fig. 2 compares the influence of different regularization-parameter λ values on the mean squared error (MSE);
Fig. 3 compares the algorithm of the invention with various conventional recommendation algorithms on the MSE metric;
Fig. 4 compares the algorithm of the invention with various conventional recommendation algorithms on the ACC metric.
Specific embodiments:
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be understood more easily by those skilled in the art and the protection scope of the invention defined more clearly.
The present invention provides a technical solution: a recommendation algorithm combining user reviews and rating information, comprising the following steps:
Step (1): constructing a generative probabilistic model for discovering latent topic dimensions in user review text, with the following notation:
- N_d denotes the number of words in document d;
- θ_i denotes the K-dimensional topic distribution of item i;
- z_{u,i,j} denotes the topic of the j-th word of user u's review of item i;
- w_{u,i,j} denotes the j-th word of user u's review of item i;
Step (2): constructing a recommendation objective function that couples the user rating-matrix factorization model with the topic-discovery model, with the following notation:
- Θ denotes the rating parameters, i.e. the parameters of the rating-matrix factorization model;
- Φ = {θ, φ} denotes the topic parameters, i.e. the parameter set of the review LDA topic-discovery model;
- κ denotes the weight controlling the transfer function;
- z denotes the topic assignment of each word in the corpus T;
- rec(u, i) denotes user u's predicted rating of item i;
- r_{u,i} denotes user u's actual rating of item i;
- λ is a hyperparameter controlling the relative weight of the two parts of the objective;
- log L(T | θ, φ, z) denotes the LDA log-likelihood of the user review corpus;
Step (3): realizing product recommendation prediction based on user review text and rating data by iteratively optimizing the objective function; the objective to be minimized is
f(T | Θ, Φ, κ, z) = Σ_{(u,i)} (rec(u, i) − r_{u,i})² − λ · log L(T | θ, φ, z).
The generative probabilistic model of step (1) for discovering latent topic dimensions in user review text is implemented as follows:
Step (1a): each document is treated as a sequence of N words; M is the number of documents in the corpus; z denotes the topic assignments of the words; α and β are the hyperparameters of the topic distribution θ and the word distribution φ respectively, each obeying a Dirichlet prior;
Step (1b): each document d ∈ D is associated with a K-dimensional topic distribution θ_d; that is, the words of document d discuss topic k with probability θ_{d,k};
Step (1c): the topic distribution θ_d is assumed to itself obey a Dirichlet distribution; the final model consists of the word distribution φ_k of each topic, the topic distribution θ_d of each document, and the topic assignment z_{d,j} of each word;
Step (1d): the parameters Φ = {θ, φ} and the topic assignments z are updated by sampling;
Step (1e): given the word distributions φ_k and the topic assignment of each word, the likelihood of a particular text corpus can be constructed, where N_d denotes the number of words in document d.
The recommendation objective of step (2), coupling the user rating-matrix factorization model with the topic-discovery model, is implemented as follows:
Step (2a): a "document" is defined in this model as follows: documents are derived from the review text, and the set of all reviews of a particular item i is defined as document d_i;
Step (2b): the rating parameters γ_i and the review parameters θ_i are linked. It is implicitly assumed here that the number K of latent factors in the rating-matrix factorization equals the number K of latent topics in the review text, and that the latent factors carry equal weight. The transformation between latent factors and latent topics is constructed as
θ_{i,k} = exp(κ γ_{i,k}) / Σ_{k'} exp(κ γ_{i,k'}),
where θ_{i,k} is the weight of topic k for item i; the exponential form ensures that every θ_{i,k} is positive and that Σ_k θ_{i,k} = 1; and γ_{i,k} is the value of item i's latent-factor vector on factor k. The parameter κ is introduced to control the "peakedness" of the transformation, in other words the weight of the conversion: as κ → ∞, θ_i approaches the unit vector that places weight 1 on the largest entry of γ_i; as κ → 0, θ_i approaches the uniform distribution. Intuitively, a large κ means users discuss only the most important topic, while a small κ means users discuss all topics evenly.
Step (2c): the objective function over the corpus T, based on user ratings and user reviews, is defined as
f(T | Θ, Φ, κ, z) = Σ_{(u,i)} (rec(u, i) − r_{u,i})² − λ · log L(T | θ, φ, z),
where Θ and Φ = {θ, φ} denote the rating and topic parameters respectively, i.e. the parameters of the rating-matrix factorization model and the parameter set of the review LDA topic-discovery model; κ is the weight of the transfer function; z gives the topic of each word in the corpus T; and log L(T | θ, φ, z) is the LDA log-likelihood of the user review corpus. The first part of this equation is the rating-prediction error and the second part is the log-likelihood of the review-text topic model; λ is a hyperparameter controlling the relative weight of the two parts. rec(u, i), user u's predicted rating of item i, is obtained from
rec(u, i) = α + β_u + β_i + γ_u · γ_i,
where α is the global bias, i.e. the mean of all rating data, reflecting the influence of the dataset on user ratings; β_u and β_i are the user and item biases, reflecting the influence of individual users and items on ratings; and γ_u and γ_i are the K-dimensional latent topic vectors of user u and item i. Intuitively, γ_i can be read as the "attributes" of item i and γ_u as user u's "preferences" over those attributes. Given a training corpus T of ratings, the parameters Θ = {α, β_u, β_i, γ_u, γ_i} are usually chosen to minimize the mean squared error (MSE), i.e.
Θ = argmin_Θ Σ_{(u,i)∈T} (rec(u, i) − r_{u,i})² + Ω(Θ),
where Ω(Θ) is a regularizer that penalizes "complex" models.
Step (3), realizing product recommendation prediction based on user review text and rating data by iterative optimization of the objective function, is implemented as follows:
Step (3a): the formula of step (1e) is substituted into the formula of step (2c) to construct the objective function to be minimized;
Step (3b): with the word topics z_{d,j} fixed, gradient descent is used to solve for the matrix-factorization parameter set Θ of the TMF model, the topic-discovery parameter set Φ, and the transform parameter κ; in the derivative of the objective, n_{d,k} denotes the number of occurrences of topic k in document d;
Step (3c): step (3b) and Gibbs sampling are iterated alternately until the output parameters no longer change, or their change falls below a given threshold, at which point the algorithm has converged.
To verify the performance of the optimized algorithm, the invention is compared against conventional product-recommendation methods: Offset, LFM (Latent Factor Model), the SVD++ model, and SlopeOne.
Offset: a collaborative filtering model based on global bias. In model construction, the mean of all ratings of an item is used as its predicted value, i.e. as every user's predicted rating for that item.
LFM (Latent Factor Model): predicts ratings of unseen items by matrix factorization (SVD). This model considers only the users' rating information and ignores their review text.
SVD++: extends the SVD model with neighborhood item information, accumulating the latent factors of the items in a user's review history as the neighborhood information.
SlopeOne: a widely used item-based collaborative filtering method; the algorithm is efficient and simple, and is available in open-source tools.
For quantitative evaluation, the invention uses the mean squared error (MSE), commonly used for evaluating recommender systems, as an evaluation metric, defined as
MSE = (1/M) Σ_{(u,i)} (r̂_{u,i} − r_{u,i})²,
where M is the total number of predicted ratings, r̂_{u,i} is user u's predicted rating of item i, and r_{u,i} is user u's actual rating of item i.
Besides MSE, the experiments also introduce accuracy (ACC) as a second measure of rating-prediction accuracy, defined as
ACC = m / M,
where m is the number of times the system's predicted rating agrees with the user's actual rating.
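The two metrics are straightforward to compute; the sample predictions below are assumptions for illustration only:

```python
import numpy as np

def mse(pred, actual):
    """Mean squared error: (1/M) * sum over the M predictions of (pred - actual)^2."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return float(np.mean((pred - actual) ** 2))

def acc(pred, actual):
    """Accuracy: fraction of predictions that match the actual rating after
    rounding to the nearest integer on the 1-5 rating scale."""
    rounded = np.clip(np.rint(np.asarray(pred, float)), 1, 5)
    return float(np.mean(rounded == np.asarray(actual, float)))

preds  = [4.2, 3.6, 1.1, 5.0]
actual = [4,   4,   2,   5]
print(mse(preds, actual))   # 0.2525
print(acc(preds, actual))   # 0.75
```

The rounding in `acc` mirrors the experimental protocol below, where fractional predicted ratings are rounded to the nearest integer before ACC is computed.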
For the hyperparameters α and β of the LDA model, the experiments use the empirical values α = 0.2 and β = 0.1.
The dataset used in the invention (statistics in Table 1) was collected from user review data in various public sources, mainly Amazon, from which about 35 million user reviews were obtained. To collect these data, 75 million ASIN-like strings (Amazon's own product numbers) were first listed, obtained from the Internet Archive; of these, about 2.5 million products have at least one user review. The dataset is further divided into 26 parts according to each product's top-level category (e.g. books, films), and is a superset of the existing publicly available Amazon dataset. In total, 42 million user reviews were obtained from 10 million users and 3 million items, covering 5.1 billion words of review text.
The verification test for the parameter K (Table 2) reports the MSE and accuracy ACC of the proposed algorithm under different numbers of topics. Comparing the MSE and ACC obtained for different values of K in Table 2, the topic partition is clearest when K = 10. Performance keeps improving as K grows, but the gain from K = 10 to K = 20 is small, so the experiments finally set K = 10 as the default number of topics. To let the LDA model converge quickly on the review data, the number of iterations is set to 100.
The influence of different regularization-parameter λ values on the MSE is tested in Fig. 2. The parameter λ controls the regularization weight of the topic preferences; the regularization term is one of the effective means in machine learning of preventing a model from overfitting the training results. The experimental results of Fig. 2 show the behavior of the system under different values: the MSE levels off near λ = 0.5, so the experiments finally set the regularization parameter λ to 0.5.
Comparing the algorithm of the invention with various conventional recommendation algorithms on the MSE metric (Fig. 3), the experimental results show that latent-factor matrix factorization effectively improves recommendation quality: LFM performs better than the global-bias-based Offset method, and SVD++ in turn performs better than LFM. The TMF method, which considers both user ratings and review information, achieves the best recommendation performance.
After randomly selecting 10 product categories, the invention's algorithm is further compared with the conventional algorithms on MSE (Table 3). To verify the effectiveness of topic discovery, the experiment randomly selected 10 of the 28 Amazon subcategories, including top-level categories such as mother-and-baby, food, audio-visual, and cosmetics. Taking all data subsets together, the TMF model outperforms the conventional models on the 10 subcategories; in several categories such as "clothing" and "shoes and hats", TMF's improvement over the other algorithms is the most pronounced. The categories where TMF performs best are evidently the more subjective ones: in such categories users touch on many aspects of a product in their reviews, so by using the review text TMF can better "separate" a product's objective properties from reviewers' subjective opinions about it.
Comparing the algorithms on the ACC metric (Fig. 4): actual ratings are integers from 1 to 5, so when an algorithm's predicted rating is fractional it is rounded to the nearest integer before ACC is computed. The TMF algorithm again performs best. The test set is split both randomly and by time; overall, rating prediction performs better under random splitting than under temporal splitting.
Table 1
Table 2
Table 3
The embodiments described above express only several embodiments of the present invention, and although their description is specific and detailed, they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept, and all of these belong to the protection scope of the invention.
Claims (4)
1. A recommendation algorithm combining user reviews and rating information, characterized by comprising the following steps:
Step (1): constructing a generative probabilistic model for discovering latent topic dimensions in user review text, with the following notation:
- N_d denotes the number of words in document d;
- θ_i denotes the K-dimensional topic distribution of item i;
- z_{u,i,j} denotes the topic of the j-th word of user u's review of item i;
- w_{u,i,j} denotes the j-th word of user u's review of item i;
Step (2): constructing a recommendation objective function that couples the user rating-matrix factorization model with the topic-discovery model, with the following notation:
- Θ denotes the rating parameters, i.e. the parameters of the rating-matrix factorization model;
- Φ = {θ, φ} denotes the topic parameters, i.e. the parameter set of the review LDA topic-discovery model;
- κ denotes the weight controlling the transfer function;
- z denotes the topic assignment of each word in the corpus T;
- rec(u, i) denotes user u's predicted rating of item i;
- r_{u,i} denotes user u's actual rating of item i;
- λ is a hyperparameter controlling the relative weight of the two parts of the objective;
- log L(T | θ, φ, z) denotes the LDA log-likelihood of the user review corpus;
Step (3): realizing product recommendation prediction based on user review text and rating data by iteratively optimizing the objective function; the objective to be minimized is
f(T | Θ, Φ, κ, z) = Σ_{(u,i)} (rec(u, i) − r_{u,i})² − λ · log L(T | θ, φ, z).
2. The recommendation algorithm combining user reviews and rating information according to claim 1, characterized in that the generative probabilistic model of step (1) for discovering latent topic dimensions in user review text is implemented as follows:
Step (1a): each document is treated as a sequence of N words; M is the number of documents in the corpus; z denotes the topic assignments of the words; α and β are the hyperparameters of the topic distribution θ and the word distribution φ respectively, each obeying a Dirichlet prior;
Step (1b): each document d ∈ D is associated with a K-dimensional topic distribution θ_d, i.e. the words of document d discuss topic k with probability θ_{d,k};
Step (1c): the topic distribution θ_d is assumed to itself obey a Dirichlet distribution; the final model consists of the word distribution φ_k of each topic, the topic distribution θ_d of each document, and the topic assignment z_{d,j} of each word;
Step (1d): the parameters Φ = {θ, φ} and the topic assignments z are updated by sampling;
Step (1e): given the word distributions φ_k and the topic assignment of each word, the likelihood of a particular text corpus can be constructed, where N_d denotes the number of words in document d.
3. the proposed algorithm of a kind of combination user comment and score information according to claim 1, it is characterised in that: described
The recommendation function specific implementation stream of step (2) building combined based on user's rating matrix decomposition model with motif discovery model
Journey are as follows:
Step (2a): being defined " document " in this model, and it is as follows to define method:
Document is exported from comment text, is that all comment set of specific project i are defined as document di;
Step (2b): the rating parameter γ_i to be learned and the user-review parameter θ_i are linked; it is implicitly assumed here that the number K of latent factors in the rating matrix factorization is identical to the number K of latent topics in the review text, and that the latent factors share the same weights; the transformation between latent factors and latent topics is constructed as:

θ_{i,k} = exp(κ·γ_{i,k}) / Σ_{k'} exp(κ·γ_{i,k'})

where θ_{i,k} denotes the topic weight of item i on latent feature k; the exponential form ensures that every θ_{i,k} is positive and that Σ_k θ_{i,k} = 1; γ_{i,k} is the value of item i's latent-factor vector on feature k; the parameter κ is introduced to control the "peakedness" of the transformation, in other words κ controls the weight of the conversion: as κ → ∞, θ_i approaches a unit vector that is 1 only at the maximal index of γ_i; as κ → 0, θ_i approaches the uniform distribution; intuitively, a large κ indicates that users discuss only the most important topic, while a small κ indicates that users discuss all topics evenly;
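For illustration, the transformation of step (2b) is an exponential (softmax-style) normalization; a minimal sketch follows, with the max-subtraction added only for numerical stability (it does not change the result):

```python
import numpy as np

def gamma_to_theta(gamma_i, kappa):
    """theta_{i,k} = exp(kappa * gamma_{i,k}) / sum_k' exp(kappa * gamma_{i,k'}).
    kappa controls the 'peakedness' of the resulting topic distribution."""
    e = np.exp(kappa * (gamma_i - gamma_i.max()))  # shift by max for stability
    return e / e.sum()

gamma = np.array([0.2, 1.5, -0.3])
theta_mid = gamma_to_theta(gamma, kappa=1.0)    # moderately peaked
theta_peak = gamma_to_theta(gamma, kappa=50.0)  # near a unit vector at the argmax
theta_flat = gamma_to_theta(gamma, kappa=1e-8)  # near uniform
```

This makes the two limiting behaviors described above directly observable: `theta_peak` concentrates almost all mass on the maximal index of γ, while `theta_flat` is nearly uniform.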
Step (2c): the objective function f over the corpus T of user ratings and user reviews is defined as:

f(T | Θ, Φ, κ, z) = Σ_{(u,i) ∈ T} (rec(u, i) − r_{u,i})² − λ·logL(T | θ, φ, z)

where Θ and Φ = {θ, φ} denote the rating parameters and the topic parameters respectively, i.e. the parameter set of the rating-matrix factorization model and the parameter set of the review-based LDA topic-discovery model; κ denotes the weight controlling the transformation function; z is the topic assignment of each word in the corpus T; and logL(T | θ, φ, z) denotes the log-likelihood of the LDA model on the user-review corpus; the first part of this equation is the rating prediction error, the second part is the log-likelihood of the review-text topic model, and λ is a hyperparameter controlling the relative weight of the two parts; rec(u, i) is user u's predicted rating of item i, obtained by the following formula:

rec(u, i) = α + β_u + β_i + γ_u · γ_i

where α denotes the global bias, i.e. the mean of all rating data, reflecting the influence of the dataset on user ratings; β_u and β_i are the user bias and item bias, reflecting the influence of different users and different items on the rating; γ_u and γ_i denote the K-dimensional latent topic vectors of user u and item i respectively; γ_i can intuitively be regarded as the "attributes" of item i, and γ_u as user u's "preferences" for those attributes; meanwhile, given a training corpus of ratings T, the parameters Θ = {α, β_u, β_i, γ_u, γ_i} are usually chosen by minimizing the mean squared error (MSE), that is:

Θ̂ = argmin_Θ (1/|T|) Σ_{(u,i) ∈ T} (rec(u, i) − r_{u,i})² + λ·Ω(Θ)

where Ω(Θ) is a regularizer that penalizes "complex" models.
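A minimal sketch of the prediction rule rec(u, i) and the regularized MSE of step (2c); the toy data and the choice of a squared-L2 regularizer for Ω(Θ) are assumptions for illustration:

```python
import numpy as np

def rec(alpha, bu, bi, gu, gi):
    """Predicted rating: global bias + user bias + item bias + latent dot product."""
    return alpha + bu + bi + gu @ gi

def mse_objective(ratings, alpha, beta_u, beta_i, gamma_u, gamma_i, lam):
    """(1/|T|) * sum (rec(u,i) - r_ui)^2 + lam * Omega(Theta),
    with Omega taken here as the squared L2 norm of the parameters."""
    err = sum(
        (rec(alpha, beta_u[u], beta_i[i], gamma_u[u], gamma_i[i]) - r) ** 2
        for u, i, r in ratings
    ) / len(ratings)
    omega = (np.sum(beta_u ** 2) + np.sum(beta_i ** 2)
             + np.sum(gamma_u ** 2) + np.sum(gamma_i ** 2))
    return err + lam * omega

# toy corpus: 2 users, 2 items, K = 2 latent factors
alpha = 3.0
beta_u = np.array([0.1, -0.2])
beta_i = np.array([0.3, 0.0])
gamma_u = np.array([[0.5, 0.1], [0.2, 0.4]])
gamma_i = np.array([[0.3, 0.2], [0.1, 0.6]])
ratings = [(0, 0, 4.0), (1, 1, 3.0)]  # (user, item, observed rating)
loss = mse_objective(ratings, alpha, beta_u, beta_i, gamma_u, gamma_i, lam=0.01)
```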
4. The recommendation algorithm combining user comments and rating information according to claim 1, characterized in that the specific process of step (3), realizing product recommendation prediction from user review text and rating data by iterative optimization of the objective function, is as follows:
Step (3a): the formula of step (1e) is substituted into the formula of step (2c) to build the minimization objective; the calculation formula is as follows:

(Θ̂, Φ̂, κ̂, ẑ) = argmin_{Θ, Φ, κ, z} Σ_{(u,i) ∈ T} (rec(u, i) − r_{u,i})² − λ Σ_{d ∈ T} Σ_{j=1}^{N_d} log(θ_{d, z_{d,j}} · φ_{z_{d,j}, w_{d,j}})
Step (3b): with the topic assignment z_{d,j} of each word fixed, gradient descent can be used to solve for the matrix-factorization parameter set Θ, the topic-discovery parameter set Φ, and the transformation parameter κ; the calculation formula is as follows:

(Θ̂, Φ̂, κ̂) = argmin_{Θ, Φ, κ} f(T | Θ, Φ, κ, z)

wherein, for the derivation of the objective function, the log-likelihood term expands as:

logL(T | θ, φ, z) = Σ_d Σ_k n_{d,k}·log θ_{d,k} + Σ_k Σ_w n_{k,w}·log φ_{k,w}

where n_{d,k} denotes the number of times topic k occurs in document d, and n_{k,w} the number of times word w is assigned to topic k;
Step (3c): step (3b) and Gibbs sampling are iterated alternately until the output parameters no longer change or a given threshold is reached and the algorithm converges, wherein the Gibbs sampling formula is as follows:

p(z_{d,j} = k | θ, φ) ∝ θ_{d,k} · φ_{k, w_{d,j}}
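The resampling half of the alternation in step (3c) can be sketched as follows; the toy data are assumptions, and in the full algorithm this resampling would alternate with the gradient-descent update of (Θ, Φ, κ) from step (3b):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_resample(docs, theta, phi):
    """With theta and phi held fixed, resample each word's topic with
    p(z_{d,j} = k) proportional to theta_{d,k} * phi_{k, w_{d,j}}."""
    z = []
    for d, words in enumerate(docs):
        zd = []
        for w in words:
            p = theta[d] * phi[:, w]   # unnormalized posterior over topics
            p = p / p.sum()
            zd.append(int(rng.choice(len(p), p=p)))
        z.append(zd)
    return z

# one resampling pass on toy data (K = 2 topics, V = 3 vocabulary words)
theta = np.array([[0.7, 0.3], [0.4, 0.6]])
phi = np.array([[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]])
docs = [[0, 2], [1]]
z = gibbs_resample(docs, theta, phi)
```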
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910531413.5A CN110321485A (en) | 2019-06-19 | 2019-06-19 | A kind of proposed algorithm of combination user comment and score information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110321485A true CN110321485A (en) | 2019-10-11 |
Family
ID=68119820
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061962A (en) * | 2019-11-25 | 2020-04-24 | 上海海事大学 | Recommendation method based on user score analysis |
CN111259238A (en) * | 2020-01-13 | 2020-06-09 | 山西大学 | Post-interpretable recommendation method and device based on matrix decomposition |
CN111563787A (en) * | 2020-03-19 | 2020-08-21 | 天津大学 | Recommendation system and method based on user comments and scores |
CN111667344A (en) * | 2020-06-08 | 2020-09-15 | 中森云链(成都)科技有限责任公司 | Personalized recommendation method integrating comments and scores |
CN111899063A (en) * | 2020-06-17 | 2020-11-06 | 东南大学 | Fresh agricultural product online recommendation method considering customer consumption behaviors and preference |
CN112905908A (en) * | 2021-03-04 | 2021-06-04 | 浙江机电职业技术学院 | Collaborative filtering algorithm based on score LDA |
CN112966203A (en) * | 2021-03-12 | 2021-06-15 | 杨虡 | Grade determination method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202519A (en) * | 2016-07-22 | 2016-12-07 | 桂林电子科技大学 | A kind of combination user comment content and the item recommendation method of scoring |
CN109903099A (en) * | 2019-03-12 | 2019-06-18 | 合肥工业大学 | Model building method and system for score in predicting |
CN109902229A (en) * | 2019-02-01 | 2019-06-18 | 中森云链(成都)科技有限责任公司 | A kind of interpretable recommended method based on comment |
Non-Patent Citations (2)
Title |
---|
JULIAN MCAULEY et al.: "Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text", RecSys '13: Proceedings of the 7th ACM Conference on Recommender Systems * 
LI Lin et al.: "A product recommendation model fusing the rating matrix and review text", Chinese Journal of Computers * 
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321485A (en) | A kind of proposed algorithm of combination user comment and score information | |
CN108959603B (en) | Personalized recommendation system and method based on deep neural network | |
Jacobs et al. | Model-based purchase predictions for large assortments | |
Zhang et al. | Taxonomy discovery for personalized recommendation | |
CN103164463B (en) | Method and device for recommending labels | |
Koren et al. | Ordrec: an ordinal model for predicting personalized item rating distributions | |
CN107451894B (en) | Data processing method, device and computer readable storage medium | |
Shams et al. | A non-parametric LDA-based induction method for sentiment analysis | |
Li et al. | Personalization recommendation algorithm based on trust correlation degree and matrix factorization | |
Liu et al. | Towards a dynamic top-n recommendation framework | |
Zhang et al. | Recommender systems based on ranking performance optimization | |
CN113420221B (en) | Interpretable recommendation method integrating implicit article preference and explicit feature preference of user | |
Hazrati et al. | Simulating the impact of recommender systems on the evolution of collective users' choices | |
Sridhar et al. | Content-Based Movie Recommendation System Using MBO with DBN. | |
Du et al. | Personalized product service scheme recommendation based on trust and cloud model | |
Sang et al. | A ranking based recommender system for cold start & data sparsity problem | |
CN108182264B (en) | Ranking recommendation method based on cross-domain ranking recommendation model | |
Claeys et al. | Dynamic allocation optimization in a/b-tests using classification-based preprocessing | |
Krasnoshchok et al. | Extended content-boosted matrix factorization algorithm for recommender systems | |
Mathur et al. | A graph-based recommender system for food products | |
Lv et al. | Supplier recommendation based on knowledge graph embedding | |
Yan et al. | Tackling the achilles heel of social networks: Influence propagation based language model smoothing | |
Aslanyan et al. | Utilizing textual reviews in latent factor models for recommender systems | |
Duan et al. | An adaptive dirichlet multinomial mixture model for short text streaming clustering | |
Abbas et al. | A deep learning approach for context-aware citation recommendation using rhetorical zone classification and similarity to overcome cold-start problem |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||