CN111666413A - Commodity comment recommendation method based on reviewer reliability regression prediction - Google Patents
- Publication number
- CN111666413A (application CN202010516638.6A)
- Authority
- CN
- China
- Prior art keywords
- reviewer
- comment
- comments
- commodity
- reliability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Item investigation
- G06Q30/0625—Directed, with specific intent or strategy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to the technical field of data mining and recommendation, and in particular to a commodity comment recommendation method based on regression prediction of reviewer reliability. The method comprises the following steps: extracting attribute features related to reviewer reliability and calculating each reviewer's attribute feature values; constructing a model that predicts reviewer reliability scores using a regression algorithm and calculating each reviewer's reliability score; extracting the effective indexes related to comment ranking and calculating the four effective index values for the comments on each commodity; and constructing a comment ranking model for each commodity using LambdaMART, calculating the ranking scores of all comments on each commodity according to that model, and recommending comments according to their ranking scores. The method addresses the problem that, on many websites, users cannot trust unknown users and therefore cannot make sound judgments based on other users' comments.
Description
Technical Field
The invention relates to the technical field of data mining and recommendation, in particular to a commodity comment recommendation method based on reviewer reliability regression prediction.
Background
When purchasing a commodity on a shopping site, a consumer usually reads reviewers' comments before deciding whether to buy. Reviews therefore play a critical role in purchase decisions. However, different consumers apply their own judgment and standards to the same commodity, so one reviewer's opinion of a commodity does not imply that other reviewers share it. In addition, some merchants use various means to encourage users to write false comments in order to boost sales, which seriously harms consumers' interests. As a result, a consumer facing many reviews cannot tell which ones are trustworthy. Moreover, shopping websites mostly order product reviews by most recent review time, so some good reviews are ranked far down the list and the consumer may never get a chance to read them. Since most credible comments come from credible reviewers, how to find genuinely credible reviewers among a large number of reviewers, and how to recommend their genuine, reference-worthy comments to consumers, is a problem that urgently needs to be solved.
PageRank is a core algorithm of the Google search engine, designed to rank web pages. Today, PageRank has been widely extended to user ranking, where it is used to compute the authority and influence of users in social networks. Shen et al. propose several methods of ranking users in a web community or blog using the PageRank algorithm, for example by defining several characteristics of social users and differentiating reputation from sociability. Weng et al. propose an algorithm, extended from PageRank, to measure the influence of social users. Zhao et al. propose a motif-based PageRank that provides a basic framework for ranking users in a social network, and identify ranking users by their content as a task for further study. However, user ranking with PageRank mainly targets social networks. In some systems, the results of PageRank may be less than ideal if the user's social graph is unclear or the data is insufficient to construct an input-output social graph. Likewise, if the reviewers of each item on a merchandise website have few connections to one another, it may be difficult to construct a social relationship graph for the reviewers.
Some existing studies address reviews and review ranking. Hsu et al. propose a comment ranking method for social networking sites in order to present high-quality comments to users; they build ranking models based on features of the content and features of the user. Northcutt et al. propose a comment-diversification ranking scheme based on maximal marginal relevance. Ahmad proposes summarizing comments using machine learning and natural language processing. Swapna and Jiang propose a learning model that predicts thoughtful comments from text features, discourse relations and associated features. Hu and Liu propose summarizing reviews using review features and sentiment analysis. Samuel proposes a method that summarizes user opinions from their comments using a neural network. Current research on comments thus mainly applies machine learning, natural language processing, sentiment analysis and similar techniques to the content of comments, but ignores the reliability of reviewers and comments. Consumers want to read trustworthy comments when deciding whether to purchase a product. Typically, trustworthy reviews come from trustworthy reviewers, and these reviews give the user true information about various aspects of the commodity. However, current research on reviewer credibility and on ranking reviews by reference value is insufficient.
Disclosure of Invention
In order to solve the problems that users cannot trust unknown user comments and cannot make correct judgment according to comments of other users in a plurality of websites, the invention provides a commodity comment recommendation method based on reviewer reliability regression prediction.
A commodity comment recommendation method based on reviewer reliability regression prediction comprises the following steps:
extracting relevant attribute characteristics of the credibility of the reviewers, and calculating attribute characteristic values of the reviewers according to a calculation formula of the attribute characteristics of each reviewer;
constructing a credible scoring model of the predicted reviewer by using a regression algorithm according to the calculated attribute characteristic value of the reviewer, and substituting the calculated attribute characteristic value of the reviewer into the credible scoring model of the predicted reviewer to obtain the credible score of the reviewer;
extracting effective indexes related to comment sequencing, and calculating effective index values of comments according to a calculation formula of each effective index of the comments;
according to the calculated effective index value of the comments and the reliability score of the reviewer, constructing a comment sorting model by using LambdaMART, and calculating and obtaining the final ranking of the comments according to the comment sorting model;
and arranging the comments in order of their final ranking, and preferentially recommending the top-ranked comments.
Further, the attribute features related to reviewer reliability include: the length difference dif_len between every two comments of a reviewer; the number num_same_com of identical comments by the same reviewer; the number num_same_star of comments with the same star rating; the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website; the star-rating score difference dif_star_score; the useful words useful_word in each comment i of the reviewer; the different keywords dif_keyword; the number num_com of comments a reviewer writes on one commodity; the number num_star of times the reviewer rates the same commodity; and the number num_img of pictures uploaded by the reviewer.
Further, the calculation method of the length difference dif_len between every two comments of the reviewer includes:
wherein (i, j) represents a comment pair, len(i) represents the length of comment i, len(j) represents the length of comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
Further, the calculation method of the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website includes:
wherein n represents the total number of comments by the reviewer, $\text{num\_key\_word}_i - \text{num\_tag}_p$ indicates that the keywords of each comment i are compared with the keyword tags provided on the website, and p indicates the item.
Further, the calculation method of the star-rating score difference dif_star_score includes:
where n is the total number of comments by the reviewer, $score_i$ represents the score given in comment i, and $score_p$ represents the score of item p provided by the website.
Further, the useful words in each comment i of the reviewer r are calculated in a manner that includes:
wherein $\text{num\_useful\_word}_i$ represents the number of useful words in each comment i of the reviewer r, and n represents the total number of comments by the reviewer.
Further, the calculation method of the different keywords dif_keyword includes:
wherein num_keyword(i, j) represents the number of keywords shared by comment i and comment j, max_num_keyword(i, j) is the larger of the keyword counts of comment i and comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
Further, the effective indexes related to comment ranking comprise: the reliability of the reviewer, the similarity between the keywords in the comment and the basic information of the commodity, the information in the comment that differs from other comments, and the comment date.
Further, the step of constructing the model that predicts the reviewer reliability score using the regression algorithm comprises:
$$\text{r\_score}_i = \beta_0 + \beta_1 f_{i1} + \beta_2 f_{i2} + \dots + \beta_p f_{ip} + e_i$$
$$i = 1, 2, \dots, k$$
wherein $\beta_0$ is a constant term; $\beta_1, \beta_2, \dots, \beta_p$ are the slope coefficients of the explanatory variables; $f_{i1}, f_{i2}, \dots, f_{ip}$ respectively represent the values of the feature attributes $f_1, f_2, \dots, f_p$ of reviewer i; $e_i$ indicates the deviation; and k indicates the number of reviewers.
Furthermore, the comment ranking model is constructed using LambdaMART: if a commodity has n comments in total, any two comments $c_i$ and $c_j$ are drawn from all the comments to form a comment pair, giving $C_n^2$ possible combinations in total; for each comment pair $c_{ij}$, the probability $P_{ij}$ that comment $c_i$ is ranked ahead of comment $c_j$ is calculated and used as the comment ranking model; the output of the comment ranking model is the ranking score $s_i$ of comment $c_i$ and the ranking score $s_j$ of comment $c_j$; and the expression of the comment ranking model is as follows:
$$P_{ij} = P(c_i > c_j) = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}$$
wherein $P_{ij} = P(c_i > c_j)$ expresses the probability that comment $c_i$ is ranked ahead of comment $c_j$, and the parameter $\sigma$ determines the shape of the sigmoid function.
The beneficial effects of the invention are as follows: the method screens reliable reviewers out of a massive number of reviewers, lets the user promptly read genuine comments that have reference value, and helps the user better decide whether to purchase a commodity. The invention extracts features of reviewers and of comments and designs an algorithm around them, so that the user reads reliable comments first; this supports the user's decision and reduces the influence of bad reviewers and comments.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a general flowchart of a commodity review recommendation method based on reviewer reliability regression prediction according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating specific steps of a commodity review recommendation method based on reviewer reliability regression prediction according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flow chart of a commodity review recommendation method based on reviewer reliability regression prediction, which can be used to predict the reliability of reviewers and provide real reviews with reference values, so as to help consumers to better purchase commodities, and includes, but is not limited to, the following steps:
extracting relevant attribute characteristics of the credibility of the reviewers, and calculating attribute characteristic values of the reviewers according to a calculation formula of the attribute characteristics of each reviewer;
constructing a credible scoring model of the predicted reviewer by using a regression algorithm according to the calculated attribute characteristic value of the reviewer, and substituting the calculated attribute characteristic value of the reviewer into the credible scoring model of the predicted reviewer to obtain the credible score of the reviewer;
extracting effective indexes related to comment sequencing, and calculating effective index values of comments according to a calculation formula of each effective index of the comments;
according to the calculated effective index value of the comments and the reliability score of the reviewer, constructing a comment sorting model by using LambdaMART, and calculating and obtaining the final ranking of the comments according to the comment sorting model;
and arranging the comments in order of their final ranking, and preferentially recommending the top-ranked comments to the user.
In order to make the technical solution of the present invention clearer and more complete, each step of the method of the present invention is described in detail below.
In order to predict the reliability of a reviewer, the attribute features related to reviewer reliability are first extracted. The extracted features comprise:
dif_len: represents the length difference between every two comments of the reviewer. A comment written by a reviewer may cover not only the basic characteristics of the commodity but also its details; in addition, some reviewers describe different commodities with generic phrases such as "good", "fast delivery" or "not bad", i.e., they reuse the same comment for different commodities. The length difference between every two comments is therefore calculated for each reviewer.
wherein (i, j) represents a comment pair, len(i) represents the length of comment i, len(j) represents the length of comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
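The pairwise feature described above can be sketched in Python. This is only an illustrative reading, assuming dif_len is the mean absolute length difference over all $C_n^2$ comment pairs; the function name `dif_len`, the use of character count as comment length, and the absolute value are assumptions rather than details confirmed by the text.

```python
from itertools import combinations


def dif_len(comments):
    """Mean absolute length difference over all comment pairs (assumed form)."""
    pairs = list(combinations(comments, 2))  # the C(n, 2) comment pairs (i, j)
    if not pairs:  # fewer than two comments: no pair to compare
        return 0.0
    return sum(abs(len(a) - len(b)) for a, b in pairs) / len(pairs)


# A reviewer who reuses short generic phrases gets a small value.
print(dif_len(["good", "not bad", "good"]))
```

Under this reading, a reviewer whose comments are all the same short phrase scores close to zero, while a reviewer who mixes brief and detailed comments scores higher.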
num_same_com: represents the number of identical comments by the same reviewer. It counts the identical comments of each reviewer and thereby assesses how often the reviewer writes the same comment for different commodities.
num_same_star: represents the number of comments with the same star rating. It counts how many comments with the same star rating each reviewer gives to a commodity and thereby assesses how often the reviewer gives the same star rating to the same commodity.
dif_tag: represents the difference between each reviewer's comment keywords and the keyword tags provided by the website. For each commodity, the website generally provides some tags that mark the main features of the commodity, derived from reviewers' wording and the like. However, because the tag display space on the website is limited, keywords that describe more detailed and diverse aspects cannot be displayed. The following equation applies to a reviewer r with n comments: the feature counts the keywords in each comment, compares that count with the tags provided on the website for item p, and then sums and averages over the comments.
wherein n represents the total number of comments by the reviewer, $\text{num\_key\_word}_i - \text{num\_tag}_p$ indicates that the keywords of each comment i are compared with the keyword tags provided on the website, and p indicates the item.
dif_star_score: represents the star-rating score difference. In a similar manner, the difference between the score given in comment i and the score of item p provided by the website is calculated, as in equation (3).
where n is the total number of comments by the reviewer, $score_i$ represents the score given in comment i, and $score_p$ represents the score of item p provided by the website.
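A minimal sketch of the two averaged-difference features, dif_tag and dif_star_score, assuming each is the mean over the reviewer's n comments of the per-comment difference from the website-provided value. The helper `mean_difference` and the sample numbers are hypothetical. Whether a signed or an absolute difference is intended is not recoverable from the text, so the signed form is used here as an assumption.

```python
def mean_difference(per_comment_values, site_values):
    """Average of (comment value - website value) over a reviewer's comments."""
    if len(per_comment_values) != len(site_values):
        raise ValueError("one website value is needed per comment")
    n = len(per_comment_values)
    if n == 0:
        return 0.0
    return sum(c - s for c, s in zip(per_comment_values, site_values)) / n


# dif_tag: keyword count of each comment vs. tag count of the reviewed item.
dif_tag = mean_difference([5, 3, 4], [2, 2, 3])
# dif_star_score: score in each comment vs. the site's score for that item.
dif_star_score = mean_difference([5.0, 4.0, 3.0], [4.5, 4.0, 4.0])
print(dif_tag, dif_star_score)
```

Each comment is paired with the value for the item it reviews, since a reviewer's comments usually refer to different items p.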
useful_word: represents the useful words in each comment i of the reviewer r, calculated as follows:
wherein $\text{num\_useful\_word}_i$ represents the number of useful words in each comment i of the reviewer r, and n represents the total number of comments by the reviewer.
dif_keyword: represents the different keywords. When a trusted reviewer writes a comment, the comment typically contains not only basic information about the commodity but also describes its distinctive or diverse aspects; good reviewers write comments with varied wording. dif_keyword is therefore defined to observe the reviewer's writing ability. As shown in equation (5), general keywords describing the basic characteristics of each commodity are extracted. Then the number of identical keywords in each comment pair (i, j) is calculated, from which the number of different keywords between the two comments follows. The calculation is as follows:
wherein num_keyword(i, j) represents the number of keywords shared by comment i and comment j, max_num_keyword(i, j) is the larger of the keyword counts of comment i and comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
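One possible reading of dif_keyword, sketched in Python: for each comment pair, the share of keywords not in common is taken as 1 minus num_keyword(i, j) / max_num_keyword(i, j), and the result is averaged over the $C_n^2$ pairs. The "1 minus" form, the function name and the representation of keywords as sets are all assumptions made for illustration.

```python
from itertools import combinations


def dif_keyword(keyword_sets):
    """Average share of non-shared keywords over all comment pairs (assumed form)."""
    pairs = list(combinations(keyword_sets, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for ki, kj in pairs:
        shared = len(ki & kj)             # num_keyword(i, j)
        largest = max(len(ki), len(kj))   # max_num_keyword(i, j)
        total += (1.0 - shared / largest) if largest else 0.0
    return total / len(pairs)


reviews = [
    {"taste", "price"},
    {"taste", "service", "parking"},
    {"taste", "price"},
]
print(dif_keyword(reviews))
```

A reviewer who repeats the same keywords in every comment scores near 0 under this reading, and one who varies vocabulary scores closer to 1.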
num_com: indicates the number of comments a reviewer writes on one commodity. A reviewer may comment on and rate a commodity multiple times, and some reviewers may advertise and rate their own goods in their comments. This feature is therefore defined to see how many comments each reviewer writes for each item.
num_star: indicates the number of times the reviewer rates the same item.
num_img: indicates the number of pictures uploaded by the reviewer. When writing a comment, the reviewer may use not only text but also take or select a picture and upload it with the text. The number of pictures uploaded by the reviewer is calculated as follows:
$$\text{num\_img} = \sum_{i=1}^{n} \text{num\_img}_i$$
wherein $\text{num\_img}_i$ indicates the number of pictures contained in comment i, n indicates the total number of comments by the reviewer, and num_img indicates the total number of all pictures uploaded by the reviewer.
According to the calculated values of the attribute features related to reviewer reliability, a model for predicting reviewer reliability scores is constructed using a regression algorithm, and the reviewer's attribute feature values are substituted into the model to calculate the reviewer's reliability score. The higher the score, the more reliable the reviewer, and vice versa.
Further, a concrete implementation of constructing the reviewer reliability score prediction model with the regression algorithm is:
$$\text{r\_score}_i = \beta_0 + \beta_1 f_{i1} + \beta_2 f_{i2} + \dots + \beta_p f_{ip} + e_i$$
$$i = 1, 2, \dots, k$$
wherein $\beta_0$ is a constant term; $\beta_1, \beta_2, \dots, \beta_p$ are the slope coefficients of the explanatory variables; $f_{i1}, f_{i2}, \dots, f_{ip}$ respectively represent the values of the feature attributes $f_1, f_2, \dots, f_p$ of reviewer i; $e_i$ indicates the deviation; and k indicates the number of reviewers.
The above-described predictive reviewer reliability score model is based on the following assumptions:
1. a linear relationship exists between the dependent variable $\text{r\_score}_i$ and the independent variables $f_p$;
2. there is no high correlation between the independent variables;
3. the observations are randomly selected;
4. the residuals are normally distributed with zero mean.
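The regression step can be sketched with ordinary least squares. This is a minimal illustration, assuming labelled reliability scores are available for training and that NumPy is used; the feature matrix, the labels and the function names `fit_reliability_model` and `predict_reliability` are made up for the example, and the text does not state which regression solver is actually used.

```python
import numpy as np


def fit_reliability_model(features, scores):
    """Fit r_score = b0 + b1*f1 + ... + bp*fp by least squares; returns [b0..bp]."""
    X = np.column_stack([np.ones(len(features)), features])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return beta


def predict_reliability(beta, features):
    """Apply the fitted coefficients to (normalized) reviewer feature rows."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ beta


# Hypothetical training data: 5 reviewers, 2 normalized features each.
F = np.array([[0.1, 0.9], [0.4, 0.5], [0.8, 0.2], [0.3, 0.7], [0.6, 0.1]])
y = 1.0 + 2.0 * F[:, 0] - 0.5 * F[:, 1]  # noise-free labels for the demo
beta = fit_reliability_model(F, y)
print(np.round(beta, 3))
```

Because the demo labels are generated without noise, the fit recovers the generating coefficients; with real data the deviation term $e_i$ absorbs whatever the linear model cannot explain.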
In order to rank the comments, four effective indexes related to comment ranking are extracted, and the effective index values of the comments are calculated according to the calculation formula of each index.
Users need highly reliable comments as a reference when purchasing a commodity, and such comments usually come from highly reliable reviewers, so identifying highly reliable reviewers is important. People read comments to obtain helpful information beyond the product introduction, such as basic information about the product and other users' experience, and this content helps them decide whether to purchase. The comment time also matters: from it one can see whether the quality of the commodity has changed, for example how the commodity was before and how it is now. Therefore, the invention extracts the effective indexes related to comment ranking and calculates the effective index values of the comments.
Further, the extracted four effective indexes of the comment include:
1. Reliability of the reviewer: given by the reviewer's predicted reliability score, which reflects whether the reviewer is trustworthy.
2. Similarity between the keywords in the comment and the basic information of the product: the Jaccard similarity between each comment and the commodity information is used, calculated as follows:
$$\text{jac\_sim}(c, p) = \frac{|word_c \cap word_p|}{|word_c \cup word_p|}$$
where jac_sim(c, p) represents the Jaccard similarity between comment c and item p, $word_c \cap word_p$ represents the keywords shared by comment c and item p, and $word_c \cup word_p$ represents all the keywords in comment c and item p.
3. Information in the comment that differs from other comments: this factor captures the information the comment provides beyond the basic information, and is calculated as follows:
$$\text{dis}(c, p) = 1 - \text{jac\_sim}(c, p)$$
where dis(c, p) represents the difference between comment c and item p.
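The Jaccard similarity and its complement can be sketched as follows. The keyword lists are hypothetical, and returning 0 when both keyword sets are empty is a convention chosen for this sketch, not something stated in the text.

```python
def jac_sim(comment_words, item_words):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two keyword collections."""
    a, b = set(comment_words), set(item_words)
    union = a | b
    if not union:  # both empty: define similarity as 0 to avoid division by zero
        return 0.0
    return len(a & b) / len(union)


def dis(comment_words, item_words):
    """Complement of the Jaccard similarity: 1 - jac_sim(c, p)."""
    return 1.0 - jac_sim(comment_words, item_words)


comment = ["spicy", "hotpot", "queue", "friendly"]
item = ["hotpot", "spicy", "sichuan"]
print(jac_sim(comment, item), dis(comment, item))
```

Here two of the five distinct keywords are shared, so the similarity is 2/5 and the difference is 3/5.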
4. Date of the comment: some users want to see recent comments rather than old ones, but some good comments may have been written earlier. Therefore, the time difference between the comment time and the current time is calculated and denoted diff_date(c, t), as follows:
$$\text{diff\_date}(c, t) = date_t - date_c$$
wherein $date_t$ represents the current date and $date_c$ represents the date of the comment.
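A short sketch of the date index, assuming the difference is measured in whole days (the unit is not stated in the text). The function name and both dates are hypothetical.

```python
from datetime import date


def diff_date(comment_date, current_date):
    """Days elapsed between the comment date and the current date."""
    return (current_date - comment_date).days


print(diff_date(date(2018, 9, 3), date(2019, 5, 1)))
```

A larger value means an older comment; how the ranking model weighs recency against the other indexes is left to the trained model.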
In order to recommend the credible comments to the consumers, a comment ranking model of each commodity is constructed by using LambdaMART according to the calculated effective index value of the comments and the credibility score of the reviewer, and the final ranking of the comments is calculated and obtained according to the comment ranking model.
Further, in one embodiment, the comment ranking model of each commodity is constructed using LambdaMART, specifically as follows: let a commodity have n comments; any two comments $c_i$ and $c_j$ are drawn from all the comments to form a comment pair $c_{ij}$, giving $C_n^2$ possible combinations in total; for each comment pair $c_{ij}$, the probability $P_{ij}$ that comment $c_i$ is ranked ahead of comment $c_j$ is calculated and used as the comment ranking model; the output of the model is the ranking score $s_i$ of comment $c_i$ and the ranking score $s_j$ of comment $c_j$; and the expression of the comment ranking model is as follows:
$$P_{ij} = P(c_i > c_j) = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}$$
wherein $P_{ij} = P(c_i > c_j)$ expresses the probability that comment $c_i$ is ranked ahead of comment $c_j$; the parameter $\sigma$ determines the shape of the sigmoid function and has little influence on the final result. When the loss function is constructed, gradient descent is used and ranking metrics (NDCG, ERR and the like) are incorporated, so that the ranking problem is solved directly, avoiding the drawbacks of traditional methods that solve ranking through classification or regression.
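The pairwise probability can be illustrated with a few lines of Python. This sketch shows only the RankNet-style sigmoid probability that LambdaMART builds on, together with the cross-entropy loss for a pair whose correct order is known; it is not a LambdaMART implementation, which additionally fits gradient-boosted trees to gradients weighted by the change in a ranking metric such as NDCG. The function names and the default `sigma=1.0` are assumptions.

```python
import math


def pairwise_probability(s_i, s_j, sigma=1.0):
    """P(c_i ranked ahead of c_j) = 1 / (1 + exp(-sigma * (s_i - s_j)))."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def pairwise_loss(s_i, s_j, sigma=1.0):
    """Cross-entropy loss when c_i is known to be the better comment."""
    return -math.log(pairwise_probability(s_i, s_j, sigma))


print(pairwise_probability(2.0, 2.0))  # equal scores: neither comment is preferred
print(pairwise_loss(3.0, 1.0) < pairwise_loss(1.0, 3.0))  # correct order costs less
```

Larger values of sigma make the probability switch more sharply around equal scores, which is the sense in which the parameter sets the shape of the sigmoid.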
With the above calculation, a rank is obtained for each comment of the commodity; the comments are arranged in ranking order, i.e., the top-ranked comment is placed first, and so on, and are recommended in that order, with the top-ranked comments recommended preferentially.
As an optional implementation, the comment ordering model of each commodity may also adopt the following implementation:
$$\text{c\_score}_j = \lambda_1 c_{j1} + \lambda_2 c_{j2} + \lambda_3 c_{j3} + \lambda_4 c_{j4}$$
wherein $\text{c\_score}_j$ represents the ranking score of the j-th comment $c_j$; $j = 1, 2, \dots, m$, where m represents the total number of comments on a commodity; $c_{j1}$, $c_{j2}$, $c_{j3}$, $c_{j4}$ represent the four factors required for ranking comment $c_j$; and $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ respectively represent the weights of the four factors.
By adopting the above calculation mode, each comment of the commodity is calculated to obtain the comment ranking score of each comment, the comments are arranged according to the sequence of the comment ranking scores from high to low, namely, the comment with the highest comment ranking score is placed at the first position, and so on, the comments are recommended according to the sequence of the comments, and the comments in the front are preferentially recommended.
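The weighted alternative can be sketched as a weighted sum followed by a descending sort. The weights, the comment ids and the indicator values below are hypothetical; only the form of the score, a weighted sum of four factors, comes from the text.

```python
def c_score(indicators, weights):
    """Weighted sum lambda_1*c_j1 + ... + lambda_4*c_j4 for one comment."""
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    return sum(w * c for w, c in zip(weights, indicators))


def rank_comments(comments, weights):
    """Return comment ids ordered from highest to lowest c_score."""
    return sorted(comments, key=lambda cid: c_score(comments[cid], weights), reverse=True)


weights = [0.4, 0.3, 0.2, 0.1]  # hypothetical weights
comments = {
    "c1": [0.9, 0.2, 0.8, 0.1],
    "c2": [0.3, 0.6, 0.4, 0.9],
    "c3": [0.7, 0.5, 0.5, 0.5],
}
print(rank_comments(comments, weights))
```

Unlike the LambdaMART model, this form needs no training, but the weights must be chosen by hand or tuned against observed results.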
In order to make the technical solution of the present invention clearer and more complete, comment data from Meituan restaurants is taken below as a practical application example to further explain the concept, specific structure and technical effects of the invention.
As of May 2019, a Meituan reviewer had written 64 comments on 64 restaurants in total and uploaded 24 pictures. The method provided by the technical solution of the invention can be run automatically using computer software and database statements. With reference to the flowchart of specific steps in Fig. 2, the specific steps for this reviewer include:
extracting the attribute features related to reviewer reliability and substituting them into the calculation formula of each attribute feature to calculate the reviewer's attribute feature values, specifically:
num_same_com=19
num_same_star=12
num_com=64
num_star=65
num_img=24
normalizing the calculated attribute feature values related to reviewer reliability, and inputting the normalized values into the trained regression model for predicting reviewer reliability scores to calculate the reviewer's reliability score:
$$\text{r\_score}_i = \beta_0 + \beta_1 \cdot 0.458 + \beta_2 \cdot 0.19 + \dots + \beta_9 \cdot 0.24 + e_i$$
Because every parameter of the trained model is known, the reviewer's reliability score can be obtained directly. The reliability scores of other reviewers are calculated in the same way. A reviewer's reliability score reflects the reviewer's reliability: the higher the score, the more reliable the reviewer and the greater the reviewer's priority weight in the recommendation process.
Extracting the effective indexes related to comment ranking, and calculating the effective index values of the comment according to the calculation formula of each index, as follows:
$$\text{dis}(c, p) = 1 - \text{jac\_sim}(c, p) = 1 - 0.0645 = 0.9355$$
$$\text{diff\_date}(c, t) = date_t - date_c = 240$$
and substituting the calculated effective index values of the comment into a pre-trained LambdaMART comment ranking model to calculate the rank of the comment. Each comment of the commodity is processed in this way to obtain its rank; reviewers and comments are then recommended in ranking order, with the top-ranked comments and reviewers recommended preferentially.
Optionally, in an embodiment, the comment ranking score may be calculated by substituting the calculated comment effective index value into the comment ranking model according to an actual effect:
$c\_score_j = \lambda_1 c_{j1} + \lambda_2 c_{j2} + \lambda_3 c_{j3} + \lambda_4 c_{j4}$
$= 0.31 \cdot 1.775 + 0.3 \cdot 0.0645 + 0.2 \cdot (1 - 0.0645) + 0.2 \cdot 0.24$
With this calculation, every comment on each commodity is assigned a comment ranking score, and the comments are arranged from the highest score to the lowest, so that the comment with the highest score is placed first, and so on. Comments and reviewers are recommended in this order, with those ranked at the front recommended first.
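The weighted-sum variant can be checked with a few lines of Python. The weights and the first indicator tuple are the values quoted in the embodiment; the other two comments are hypothetical, added only to show the descending sort.

```python
# lambda_1 .. lambda_4 as quoted in the embodiment.
WEIGHTS = (0.31, 0.3, 0.2, 0.2)


def comment_score(indicators, weights=WEIGHTS):
    """c_score_j = sum over k of lambda_k * c_jk."""
    return sum(w * c for w, c in zip(weights, indicators))


# Indicator values quoted in the embodiment.
example = (1.775, 0.0645, 1 - 0.0645, 0.24)
score = comment_score(example)
print(round(score, 4))

# "c2" and "c3" are hypothetical comments used only to illustrate ranking.
comments = {
    "c1": example,
    "c2": (1.2, 0.10, 0.90, 0.50),
    "c3": (0.9, 0.30, 0.70, 0.10),
}
ranked = sorted(comments, key=lambda cid: comment_score(comments[cid]), reverse=True)
print(ranked)
```

Evaluating the quoted numbers gives a score of about 0.8047 for the example comment.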
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A commodity comment recommendation method based on reviewer reliability regression prediction, characterized by comprising the following steps:
extracting the attribute features related to reviewer reliability, and calculating the attribute feature values of the reviewers according to the calculation formula of each reviewer attribute feature;
constructing a model for predicting reviewer credibility scores by using a regression algorithm according to the calculated attribute feature values of the reviewers, and substituting the calculated attribute feature values into the model to obtain the credibility score of each reviewer;
extracting the effective indicators related to comment ranking, and calculating the effective indicator values of the comments according to the calculation formula of each effective indicator;
constructing a comment ranking model by using LambdaMART according to the calculated effective indicator values of the comments and the credibility scores of the reviewers, and obtaining the final ranking of the comments from the comment ranking model;
and arranging the comments according to their final ranking, and preferentially recommending the comments ranked at the top.
2. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein the attribute features related to reviewer reliability include: the length difference dif_len between every two comments of a reviewer, the number num_same_com of identical comments by the same reviewer, the number num_same_star of comments with the same star rating, the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website, the star-rating score difference dif_star_score, the useful words useful_word and the different keywords dif_keyword in each comment i of the reviewer, the number num_com of comments by one reviewer on one commodity, the number num_star of scores given by the reviewer to the same commodity, and the number num_img of pictures uploaded by the reviewer.
3. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the length difference dif_len between every two comments of a reviewer is calculated as follows:
4. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website is calculated as follows:
wherein n denotes the total number of comments of the reviewer, $num\_key\_word_i - num\_tag_p$ denotes the comparison of the keywords of each comment with the keyword tags provided on the website, and p denotes the commodity.
5. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the star-rating score difference dif_star_score is calculated as follows:
wherein n is the total number of comments of the reviewer, $score_i$ denotes the score of comment i, and $score_p$ denotes the score of commodity p provided by the website.
6. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the useful words in each comment i of a reviewer r are calculated as follows:
wherein $num\_useful\_word_i$ denotes the number of useful words in comment i of reviewer r, and n denotes the total number of comments of the reviewer.
7. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the different keywords dif_keyword are calculated as follows:
wherein num_keyword(i, j) denotes the number of keywords shared by comment i and comment j, max_num_keyword(i, j) is the larger of the keyword counts of comment i and comment j, $C_n^2$ denotes the number of pairwise combinations of the reviewer's n comments, and n denotes the total number of comments of the reviewer.
8. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein the effective indicators related to comment ranking comprise: credibility of the reviewer, similarity between keywords in the review and basic information of the commodity, different information in the review and other reviews, and review date.
9. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein building the model for predicting reviewer credibility scores using a regression algorithm comprises:
$r\_score_i = \beta_0 + \beta_1 f_{i1} + \beta_2 f_{i2} + \dots + \beta_p f_{ip} + e_i$
$i = 1, 2, \dots, k$
wherein $\beta_0$ is a constant term, $\beta_1, \beta_2, \dots, \beta_p$ are the slope coefficients of the explanatory variables, $f_{i1}, f_{i2}, \dots, f_{ip}$ are the values of the feature attributes $f_1, f_2, \dots, f_p$ of reviewer i, $e_i$ denotes the deviation, and k denotes the number of reviewers.
10. The commodity comment recommendation method based on reviewer reliability regression prediction as claimed in claim 1, wherein a comment ranking model is constructed by using LambdaMART: given n comments in total on a commodity, any two comments $c_i$, $c_j$ are extracted from all comments to form a comment pair, giving $C_n^2$ possible combinations; for each comment pair $c_{ij}$, the probability $P_{ij}$ that comment $c_i$ is ranked before comment $c_j$ is calculated as the comment ranking model; the outputs of the comment ranking model are the rank $s_i$ of comment $c_i$ and the rank $s_j$ of comment $c_j$; and the expression of the comment ranking model is as follows:
wherein $P_{ij} = P(c_i > c_j)$ denotes the probability that comment $c_i$ is ranked before comment $c_j$, and the parameter $\sigma$ determines the shape of the sigmoid function.
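The model expression of claim 10 is not reproduced in this text. As a hedged illustration only, the sketch below assumes the logistic pairwise form commonly used in RankNet and LambdaMART, P_ij = 1 / (1 + exp(-σ(s_i - s_j))), which is consistent with the description of σ as shaping a sigmoid. The function name and the example scores are hypothetical.

```python
import math
from itertools import combinations


def pairwise_probability(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """Assumed RankNet-style probability that comment i ranks before comment j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


# Hypothetical model outputs s for four comments of one commodity.
scores = {"c1": 2.0, "c2": 1.0, "c3": 1.0, "c4": -0.5}

# All unordered comment pairs: n * (n - 1) / 2 of them for n comments.
pairs = list(combinations(scores, 2))
for ci, cj in pairs:
    p_ij = pairwise_probability(scores[ci], scores[cj])
    print(ci, cj, round(p_ij, 3))
```

Equal scores give a probability of 0.5, and a larger σ makes the transition between 0 and 1 steeper.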
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010516638.6A CN111666413B (en) | 2020-06-09 | 2020-06-09 | Commodity comment recommendation method based on reviewer reliability regression prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111666413A (en) | 2020-09-15 |
CN111666413B (en) | 2023-04-07 |
Family
ID=72386072
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010516638.6A Active CN111666413B (en) | 2020-06-09 | 2020-06-09 | Commodity comment recommendation method based on reviewer reliability regression prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666413B (en) |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090150166A1 (en) * | 2007-12-05 | 2009-06-11 | International Business Machines Corporation | Hiring process by using social networking techniques to verify job seeker information |
CN104160414A (en) * | 2011-12-28 | 2014-11-19 | 英特尔公司 | System and method for identifying reviewers with incentives |
CN106233316A (en) * | 2014-03-05 | 2016-12-14 | 电子湾有限公司 | Products & services are utilized to comment on |
CN106537901A (en) * | 2014-03-26 | 2017-03-22 | 马克·W·帕布利科弗 | Computerized method and system for providing customized entertainment content |
CN104462333A (en) * | 2014-12-03 | 2015-03-25 | 上海耀肖电子商务有限公司 | Shopping search recommending and alarming method and system |
CN108292995A (en) * | 2015-08-13 | 2018-07-17 | 聚集股份有限公司 | Method and system for characterizing user's prestige |
CN106484679A (en) * | 2016-10-20 | 2017-03-08 | 北京邮电大学 | A kind of false review information recognition methodss being applied on consumption platform and device |
US20180143975A1 (en) * | 2016-11-18 | 2018-05-24 | Lionbridge Technologies, Inc. | Collection strategies that facilitate arranging portions of documents into content collections |
CN107577759A (en) * | 2017-09-01 | 2018-01-12 | 安徽广播电视大学 | User comment auto recommending method |
CN108470046A (en) * | 2018-03-07 | 2018-08-31 | 中国科学院自动化研究所 | Media event sort method and system based on media event search statement |
CN110489616A (en) * | 2019-07-19 | 2019-11-22 | 南京邮电大学 | A kind of search ordering method based on Ranknet and Lambdamart algorithm |
CN110827118A (en) * | 2019-10-18 | 2020-02-21 | 天津大学 | Method for automatically analyzing user comments in application store and recommending user comments to developer |
Non-Patent Citations (2)
Title |
---|
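JAMES HARTY et al.: "Trust and Risk in Collaborative Environments" * |
LI Zehua (李泽华): "Research on moderating variables in the influence of earlier hotel online reviews on subsequent reviews" (酒店先期在线评论对后续评论影响的调节变量研究) * |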
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112801745A (en) * | 2021-02-02 | 2021-05-14 | 李海涛 | Big data platform based online comment validity recommendation method |
CN114282106A (en) * | 2021-12-22 | 2022-04-05 | 北京网聘咨询有限公司 | Method for quickly delivering position information |
CN114282106B (en) * | 2021-12-22 | 2023-07-25 | 北京网聘咨询有限公司 | Quick delivering method for position information |
CN117094856B (en) * | 2023-08-24 | 2024-04-30 | 哈尔滨工业大学 | Prediction method for user evaluation behavior after embedding OTA website based on panel logic model |
Also Published As
Publication number | Publication date |
---|---|
CN111666413B (en) | 2023-04-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||