CN111666413A - Commodity comment recommendation method based on reviewer reliability regression prediction

Info

Publication number
CN111666413A
Authority
CN
China
Prior art keywords
reviewer
comment
comments
commodity
reliability
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010516638.6A
Other languages
Chinese (zh)
Other versions
CN111666413B (en)
Inventor
陈贤
王豪
夏英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010516638.6A
Publication of CN111666413A
Application granted
Publication of CN111666413B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention relates to the technical field of data mining and recommendation, and in particular to a commodity comment recommendation method based on reviewer reliability regression prediction, which comprises the following steps: extracting the attribute features related to reviewer credibility and calculating the attribute feature values of each reviewer; constructing a reviewer credibility score prediction model with a regression algorithm and calculating the credibility score of each reviewer; extracting the effective indexes related to comment ranking and calculating the four effective index values of each comment on a commodity; and constructing a comment ranking model for each commodity with LambdaMART, calculating the ranking scores of all comments on each commodity according to the comment ranking model, and recommending comments according to their ranking scores. The method addresses the problem that, on many websites, users cannot trust comments from unknown users and therefore cannot make sound judgments based on other users' comments.

Description

Commodity comment recommendation method based on reviewer reliability regression prediction
Technical Field
The invention relates to the technical field of data mining and recommendation, in particular to a commodity comment recommendation method based on reviewer reliability regression prediction.
Background
When purchasing a commodity on a shopping site, a consumer usually reads reviewers' comments before deciding whether to buy. Reviews therefore play a critical role in purchase decisions. However, different consumers apply their own judgment and standards to the same commodity, and one reviewer's opinion of a commodity does not mean that other reviewers share it. In addition, some merchants use various means to encourage users to write false comments in order to boost sales, which seriously harms the interests of consumers. As a result, consumers cannot tell which of the many reviews are trustworthy. Moreover, most shopping websites order product reviews by posting time, newest first, so some good reviews are ranked far down and consumers may never get to read them. Since most credible comments come from credible reviewers, how to find genuinely credible reviewers among a large number of reviewers, and how to recommend their genuine, valuable comments to consumers, is a problem that urgently needs to be solved.
PageRank is a core algorithm of the Google search engine, designed to rank web pages. Today PageRank has been widely extended to user ranking, to compute the authority and influence of users in social networks. Shen et al. propose several methods of ranking users in a web community or blog using the PageRank algorithm, for example by defining several characteristics of social users and distinguishing reputation from sociability. Weng et al. propose an algorithm extended from PageRank to measure the influence of social users. Zhao et al. propose a motif-based PageRank that provides a basic framework for ranking users in a social network, and identify ranking users by their content as a task for future study. However, user ranking with PageRank mainly targets social networks. In some systems, the results of PageRank may be less than ideal if the users' social graph is unclear or the data is insufficient to construct a social graph with incoming and outgoing links. Likewise, if there are few connections between the reviewers of each item on a shopping website, it may be difficult to construct a social relationship graph for the reviewers.
There are currently some studies on reviews and review ranking. Hsu et al. propose a comment ranking method for social networking sites in order to present high-quality comments to users; they build ranking models from content-based features and user-based features. Northcutt et al. propose a comment diversification ranking scheme based on maximal marginal relevance. Ahmad proposes to summarize comments using machine learning and natural language processing. Swapna and Jiang propose a learning model that predicts thoughtful comments from text features, discourse relations and relevance features. Hu and Liu propose to summarize reviews using review features and sentiment analysis. Samuel proposes a method that summarizes user opinions from their comments using a neural network. Current research on comments thus mainly applies machine learning, natural language processing, sentiment analysis and the like to the content of comments, but ignores the reliability of reviewers and comments. Consumers always want to read trustworthy comments when deciding whether to purchase a product. Typically, trustworthy comments come from trustworthy reviewers, and such comments give users true information about various aspects of the commodity. However, current research on reviewer credibility and on ranking comments by reference value is insufficient.
Disclosure of Invention
In order to solve the problem that, on many websites, users cannot trust comments from unknown users and cannot make sound judgments based on other users' comments, the invention provides a commodity comment recommendation method based on reviewer reliability regression prediction.
A commodity comment recommendation method based on reviewer reliability regression prediction comprises the following steps:
extracting the attribute features related to reviewer credibility, and calculating the attribute feature values of each reviewer according to the calculation formula of each attribute feature;
constructing a reviewer credibility score prediction model with a regression algorithm according to the calculated attribute feature values, and substituting the calculated attribute feature values of a reviewer into the model to obtain the credibility score of that reviewer;
extracting the effective indexes related to comment ranking, and calculating the effective index values of each comment according to the calculation formula of each effective index;
constructing a comment ranking model with LambdaMART according to the calculated effective index values of the comments and the credibility scores of the reviewers, and calculating the final ranking of the comments according to the comment ranking model;
and arranging the comments in the order of their final ranking, and preferentially recommending the top-ranked comments.
Further, the attribute features related to reviewer credibility include: the length difference dif_len between every two comments of a reviewer; the number num_same_com of identical comments by the same reviewer; the number num_same_star of comments with the same star rating; the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website; the star-rating score difference dif_star_score; the useful words useful_word and the different keywords dif_keyword in each comment i of the reviewer; the number num_com of comments by one reviewer on one commodity; the number num_star of times the reviewer scores the same commodity; and the number num_img of pictures uploaded by the reviewer.
Further, the length difference dif_len between every two comments of the reviewer is calculated as:
Figure BDA0002530367320000031
wherein (i, j) represents a comment pair, len(i) represents the length of comment i, len(j) represents the length of comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
Further, the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website is calculated as:
Figure BDA0002530367320000033
wherein n represents the total number of comments by the reviewer, num_key_word_i - num_tag_p indicates that the keywords of each comment are compared with the keyword tags provided on the website, and p indicates the item.
Further, the star-rating score difference dif_star_score is calculated as:
Figure BDA0002530367320000034
wherein n is the total number of comments by the reviewer, score_i represents the score given in comment i, and score_p represents the score of item p provided by the website.
Further, the useful words in each comment i of reviewer r are calculated as:
Figure BDA0002530367320000041
wherein num_useful_word_i represents the number of useful words in comment i of reviewer r, and n represents the total number of comments by the reviewer.
Further, the different keywords dif_keyword are calculated as:
Figure BDA0002530367320000042
wherein num_keyword(i, j) represents the number of keywords shared by comment i and comment j, max_num_keyword(i, j) is the larger of the keyword counts of comment i and comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
Further, the effective indexes related to comment ranking comprise: the credibility of the reviewer, the similarity between the keywords in the comment and the basic information of the commodity, the information in the comment that differs from other comments, and the comment date.
Further, the reviewer credibility score prediction model constructed with the regression algorithm is:
$$r\_score_i = \beta_0 + \beta_1 f_{i1} + \beta_2 f_{i2} + \dots + \beta_p f_{ip} + e_i, \quad i = 1, 2, \dots, k$$
wherein β_0 is the constant term, β_1, β_2, ..., β_p are the slope coefficients of the explanatory variables, f_i1, f_i2, ..., f_ip are the values of the feature attributes f_1, f_2, ..., f_p for reviewer i, e_i is the error term, and k is the number of reviewers.
Furthermore, the comment ranking model is constructed with LambdaMART. Suppose a commodity has n comments. Any two comments c_i and c_j are drawn from all the comments to form a comment pair c_ij, giving $C_n^2$ possible pairs in total. For each comment pair c_ij, the probability P_ij that comment c_i is ranked ahead of comment c_j is calculated and used as the comment ranking model. The model outputs a ranking score s_i for comment c_i and a ranking score s_j for comment c_j, and its expression is as follows:
$$P_{ij} = P(c_i > c_j) = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}$$
wherein P_ij = P(c_i > c_j) represents the probability that comment c_i is ranked ahead of comment c_j, and the parameter σ determines the shape of the sigmoid function.
The beneficial effects of the invention are as follows. The method screens reliable reviewers out of a massive number of comments so that users can promptly read genuine comments of reference value, helping them decide whether to purchase a commodity. The invention extracts features of both reviewers and comments and designs an algorithm on them, so that users read reliable comments first; this supports the users' decisions and reduces the influence of bad reviewers and comments.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a general flowchart of a commodity review recommendation method based on reviewer reliability regression prediction according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating specific steps of a commodity review recommendation method based on reviewer reliability regression prediction according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of the commodity comment recommendation method based on reviewer reliability regression prediction, which can be used to predict the credibility of reviewers and provide genuine comments of reference value, helping consumers make better purchase decisions. The method includes, but is not limited to, the following steps:
extracting the attribute features related to reviewer credibility, and calculating the attribute feature values of each reviewer according to the calculation formula of each attribute feature;
constructing a reviewer credibility score prediction model with a regression algorithm according to the calculated attribute feature values, and substituting the calculated attribute feature values of a reviewer into the model to obtain the credibility score of that reviewer;
extracting the effective indexes related to comment ranking, and calculating the effective index values of each comment according to the calculation formula of each effective index;
constructing a comment ranking model with LambdaMART according to the calculated effective index values of the comments and the credibility scores of the reviewers, and calculating the final ranking of the comments according to the comment ranking model;
and arranging the comments in the order of their final ranking, and preferentially recommending the top-ranked comments to the user.
In order to make the technical solution of the present invention clearer and more complete, each step of the method of the present invention is described in detail below.
In order to predict the credibility of a reviewer, the attribute features related to reviewer credibility are first extracted. The extracted features include:
dif_len: the length difference between every two comments of the reviewer. Comments written by a reviewer may cover not only the basic characteristics of a commodity but also its details. In addition, some reviewers describe different commodities in generic terms such as "good", "fast delivery" or "not bad"; that is, they post the same comment on different commodities. The length difference between every two comments is therefore calculated for each reviewer.
Figure BDA0002530367320000061
wherein (i, j) represents a comment pair, len(i) represents the length of comment i, len(j) represents the length of comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
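The pairwise feature above can be sketched in Python. Since the formula image is not reproduced in this text, the sketch assumes that dif_len is the mean absolute length difference over all $C_n^2$ comment pairs and that length is counted in characters; both choices, and the function name, are assumptions made for illustration.

```python
from itertools import combinations


def dif_len(comments):
    """Mean absolute length difference over all comment pairs (assumed form)."""
    pairs = list(combinations(comments, 2))  # the C(n, 2) unordered pairs (i, j)
    if not pairs:  # fewer than two comments: there is no pair to compare
        return 0.0
    return sum(abs(len(a) - len(b)) for a, b in pairs) / len(pairs)


# Hypothetical comments by one reviewer.
print(dif_len(["good", "fast delivery", "not bad"]))
```

A reviewer who pastes the same generic phrase everywhere would score close to zero under this reading, which matches the motivation given for the feature.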
num_same_com: the number of identical comments by the same reviewer. This feature counts the identical comments of each reviewer and thereby measures how often the reviewer writes the same comment on different commodities.
num_same_star: the number of comments with the same star rating. This feature counts how many of each reviewer's comments give a commodity the same star rating and thereby measures how often the reviewer gives the same star rating to the same commodity.
dif_tag: the difference between each reviewer's comment keywords and the keyword tags provided by the website. For each commodity, the website generally derives some tags from reviewers' wording and similar sources to mark the main features of the commodity. However, because the tag display space on the website is limited, other keywords that describe more detailed and more diverse aspects cannot be displayed. For a reviewer r with n comments, this feature counts the keywords in each comment, compares the count with the tags provided on the website for item p, and then sums and averages the result, as in the following equation.
Figure BDA0002530367320000071
wherein n represents the total number of comments by the reviewer, num_key_word_i - num_tag_p indicates that the keywords of each comment are compared with the keyword tags provided on the website, and p indicates the item.
dif_star_score: the star-rating score difference. In a similar manner, the difference between the score given in comment i and the score of item p provided by the website is calculated, as in equation (3).
Figure BDA0002530367320000072
wherein n is the total number of comments by the reviewer, score_i represents the score given in comment i, and score_p represents the score of item p provided by the website.
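The two "sum and average" features, dif_tag and dif_star_score, can be sketched together. The formula images are not reproduced in this text, so the sketch assumes each feature is the plain mean of the signed per-comment differences (num_key_word_i - num_tag_p, and score_i - score_p); whether the original takes absolute values is not recoverable here. The helper name mean_difference is hypothetical.

```python
def mean_difference(values, references):
    """Average of (value_i - reference_i) over a reviewer's n comments."""
    if len(values) != len(references):
        raise ValueError("one reference value is needed per comment")
    if not values:
        return 0.0
    return sum(v - r for v, r in zip(values, references)) / len(values)


def dif_tag(num_key_words, num_tags):
    # num_key_words[i]: keyword count of comment i
    # num_tags[i]: tag count the website shows for the item reviewed in comment i
    return mean_difference(num_key_words, num_tags)


def dif_star_score(comment_scores, item_scores):
    # comment_scores[i]: stars given in comment i
    # item_scores[i]: website score of the item reviewed in comment i
    return mean_difference(comment_scores, item_scores)


# Hypothetical data for a reviewer with three comments.
print(dif_tag([5, 3, 4], [2, 2, 3]))
print(dif_star_score([5, 3, 4], [4.5, 4.0, 4.0]))
```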
useful_word: the useful words in each comment i of reviewer r, calculated as follows:
Figure BDA0002530367320000073
wherein num_useful_word_i represents the number of useful words in comment i of reviewer r, and n represents the total number of comments by the reviewer.
dif_keyword: the different keywords. When a trusted reviewer writes a comment, the comment typically contains not only basic information about the commodity but also its distinctive or varied aspects. Good reviewers use different words in different comments, so dif_keyword is defined to observe the reviewer's writing ability. As shown in equation (5), general keywords describing the basic characteristics of each commodity are first extracted. The number of identical keywords in each comment pair (i, j) is then calculated, from which the number of different keywords between the pair follows. The calculation is as follows:
Figure BDA0002530367320000081
wherein num_keyword(i, j) represents the number of keywords shared by comment i and comment j, max_num_keyword(i, j) is the larger of the keyword counts of comment i and comment j, $C_n^2$ represents the number of pairwise combinations of the reviewer's n comments, and n represents the total number of comments by the reviewer.
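A possible reading of dif_keyword is sketched below. The formula image is not reproduced in this text, so the sketch assumes that each pair contributes 1 - num_keyword(i, j) / max_num_keyword(i, j) and that the contributions are averaged over the $C_n^2$ pairs; this form is an assumption inferred from the symbol definitions, not the confirmed formula.

```python
from itertools import combinations


def dif_keyword(keyword_sets):
    """Mean share of non-overlapping keywords over all comment pairs (assumed form)."""
    pairs = list(combinations(keyword_sets, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for k_i, k_j in pairs:
        shared = len(k_i & k_j)             # num_keyword(i, j)
        largest = max(len(k_i), len(k_j))   # max_num_keyword(i, j)
        if largest:                         # two empty sets contribute no difference
            total += 1.0 - shared / largest
    return total / len(pairs)


# Hypothetical keyword sets extracted from three comments of one reviewer.
sets = [{"taste", "price", "service"}, {"taste", "price"}, {"parking"}]
print(dif_keyword(sets))
```

Under this reading a value near 1 means the reviewer's comments rarely reuse keywords, and a value near 0 means the comments are nearly interchangeable.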
num_com: the number of comments a reviewer writes on one commodity. A reviewer may comment on and rate a commodity multiple times, and some reviewers may advertise and score their own commodities in their comments. This feature is therefore defined to observe how many comments each reviewer writes for each item.
num_star: the number of times the reviewer scores the same item.
num_img: the number of pictures uploaded by the reviewer. When writing a comment, a reviewer may not only use text but also take or select pictures and upload them together with the text. The number of pictures uploaded by the reviewer is calculated as:
$$num\_img = \sum_{i=1}^{n} num\_img_i$$
wherein num_img_i indicates the number of pictures contained in comment i, n is the total number of comments by the reviewer, and num_img indicates the total number of pictures uploaded by the reviewer.
A reviewer credibility score prediction model is constructed with a regression algorithm from the calculated credibility-related attribute feature values, and the attribute feature values of a reviewer are substituted into the model to calculate the reviewer's credibility score. The higher the score, the more credible the reviewer, and vice versa.
Further, a specific implementation of the reviewer credibility score prediction model constructed with the regression algorithm is:
$$r\_score_i = \beta_0 + \beta_1 f_{i1} + \beta_2 f_{i2} + \dots + \beta_p f_{ip} + e_i, \quad i = 1, 2, \dots, k$$
wherein β_0 is the constant term, β_1, β_2, ..., β_p are the slope coefficients of the explanatory variables, f_i1, f_i2, ..., f_ip are the values of the feature attributes f_1, f_2, ..., f_p for reviewer i, e_i is the error term, and k is the number of reviewers.
The above reviewer credibility score prediction model is based on the following assumptions:
1. a linear regression relationship exists between the dependent variable r_score_i and the independent variables f_p;
2. the independent variables are not highly correlated with one another;
3. the observations are selected at random;
4. the residuals are normally distributed with zero mean.
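The regression model above can be illustrated with a small ordinary least squares fit. This is a minimal sketch under stated assumptions: the text does not say which regression solver is used, so the sketch solves the normal equations with Gauss-Jordan elimination in pure Python, and the training data, `fit_ols` and `predict` are hypothetical names and values.

```python
def fit_ols(features, scores):
    """Estimate [beta_0, beta_1, ..., beta_p] by solving (X^T X) beta = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # the leading 1 carries beta_0
    m = len(rows[0])
    aug = []  # augmented matrix [X^T X | X^T y]
    for a in range(m):
        row = [sum(r[a] * r[b] for r in rows) for b in range(m)]
        row.append(sum(r[a] * y for r, y in zip(rows, scores)))
        aug.append(row)
    for col in range(m):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear (assumption 2 is violated)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def predict(beta, feature_row):
    """r_score = beta_0 + beta_1 * f_1 + ... + beta_p * f_p (error term omitted)."""
    return beta[0] + sum(b * f for b, f in zip(beta[1:], feature_row))


# Hypothetical training set: two features per reviewer and known credibility scores.
train_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
train_y = [1, 3, 4, 6, 8]
beta = fit_ols(train_x, train_y)
print(beta, predict(beta, (0.5, 0.5)))
```

In practice a library solver would normally replace the hand-written elimination; it is written out here only to keep the sketch free of dependencies.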
In order to rank the comments, four effective indexes related to comment ranking are extracted, and the effective index values of each comment are calculated according to the calculation formula of each index.
Users often need highly credible comments as a reference when purchasing a commodity, and such comments usually come from highly credible reviewers, so identifying highly credible reviewers is important. People read comments to obtain helpful information, such as basic information about the product and, beyond the product description, the experience of other users; this content helps them decide whether to purchase the product. The comment time is also important: from it one can see whether the quality of the commodity has changed, for example what the commodity was like before and what it is like now. The invention therefore extracts the effective indexes related to comment ranking and calculates their values for each comment.
Further, the four extracted effective indexes of a comment include:
1. Credibility of the reviewer: obtained from the predicted credibility score of the reviewer, reflecting whether the reviewer is trustworthy.
2. Similarity between the keywords in the comment and the basic information of the commodity: the Jaccard similarity between each comment and the commodity information is used, calculated as follows:
$$jac\_sim(c, p) = \frac{|word_c \cap word_p|}{|word_c \cup word_p|}$$
where jac_sim(c, p) represents the Jaccard similarity between comment c and item p, word_c ∩ word_p represents the keywords shared by comment c and item p, and word_c ∪ word_p represents all keywords appearing in comment c or item p.
3. Information in the comment that differs from other comments: this index captures the information a comment provides beyond the basic information, and is calculated as follows:
dis(c, p) = 1 - jac_sim(c, p)
where dis(c, p) represents the difference between comment c and item p.
4. Date of the comment: some users want to see recent comments rather than old ones, but some good comments may have been written earlier. The time difference between the comment time and the current time is therefore calculated and denoted diff_date(c, t), as follows:
diff_date(c, t) = date_t - date_c
where date_t represents the current date and date_c represents the date of the comment.
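The three computed indexes can be sketched as follows. Jaccard similarity is the standard set ratio of intersection size to union size; the keyword sets, the dates, and the choice to express diff_date in days are assumptions made for illustration.

```python
from datetime import date


def jac_sim(comment_words, product_words):
    """Jaccard similarity: shared keywords divided by all distinct keywords."""
    c, p = set(comment_words), set(product_words)
    union = c | p
    return len(c & p) / len(union) if union else 0.0


def dis(comment_words, product_words):
    """Share of keywords that comment and item do not have in common."""
    return 1.0 - jac_sim(comment_words, product_words)


def diff_date(comment_date, current_date):
    """Age of the comment in days (date_t - date_c)."""
    return (current_date - comment_date).days


# Hypothetical keywords for one comment and one item.
comment = {"spicy", "noodles", "cheap"}
item = {"noodles", "cheap", "delivery", "sichuan"}
print(jac_sim(comment, item), dis(comment, item))
print(diff_date(date(2019, 5, 1), date(2019, 5, 31)))
```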
In order to recommend credible comments to consumers, a comment ranking model is constructed for each commodity with LambdaMART from the calculated effective index values of the comments and the credibility scores of the reviewers, and the final ranking of the comments is calculated according to the comment ranking model.
Further, in one embodiment, the comment ranking model of each commodity is constructed with LambdaMART as follows. Suppose a commodity has n comments. Any two comments c_i and c_j are drawn from all the comments to form a comment pair c_ij, giving $C_n^2$ possible pairs in total. For each comment pair c_ij, the probability P_ij that comment c_i is ranked ahead of comment c_j is calculated and used as the comment ranking model. The model outputs a ranking score s_i for comment c_i and a ranking score s_j for comment c_j, and its expression is as follows:
$$P_{ij} = P(c_i > c_j) = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}$$
wherein P_ij = P(c_i > c_j) represents the probability that comment c_i is ranked ahead of comment c_j; the parameter σ determines the shape of the sigmoid function and has little influence on the final result. When the loss function is constructed, gradient descent is used and ranking metrics (NDCG, ERR, etc.) are incorporated, so the ranking problem is solved directly, avoiding the drawbacks of traditional methods that solve it through classification or regression.
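The pairwise probability that LambdaMART inherits from RankNet can be sketched as below. The sketch assumes the usual logistic form 1 / (1 + exp(-σ(s_i - s_j))) and adds the matching cross-entropy loss for a pair whose correct order is known; it does not train the boosted trees or apply the NDCG-based weighting that a full LambdaMART implementation would, and the function names are hypothetical.

```python
import math


def pair_probability(s_i, s_j, sigma=1.0):
    """P_ij: probability that comment i ranks ahead of comment j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def pair_loss(s_i, s_j, sigma=1.0):
    """Cross-entropy loss when comment i is known to rank ahead of comment j."""
    return -math.log(pair_probability(s_i, s_j, sigma))


# Hypothetical model scores for two comments.
print(pair_probability(2.0, 1.0))  # above 0.5: i is favoured
print(pair_loss(2.0, 1.0), pair_loss(1.0, 2.0))  # the wrong order costs more
```

Equal scores give a probability of exactly 0.5, and a larger σ makes the curve steeper around that point.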
With the above calculation, the ranking of each comment on the commodity is obtained. The comments are arranged in ranking order, with the highest-ranked comment in first place and so on, and are recommended in that order, with top-ranked comments recommended first.
As an optional implementation, the comment ranking model of each commodity may also take the following form:
$$c\_score_j = \lambda_1 c_{j1} + \lambda_2 c_{j2} + \lambda_3 c_{j3} + \lambda_4 c_{j4}$$
wherein c_score_j represents the ranking score of the j-th comment c_j, j = 1, 2, ..., m, m represents the total number of comments on a commodity, c_j1, c_j2, c_j3 and c_j4 represent the four elements required for ranking comment c_j, and λ_1, λ_2, λ_3 and λ_4 represent the weights of the four elements.
With the above calculation, the ranking score of each comment on the commodity is obtained. The comments are arranged in descending order of ranking score, with the highest-scoring comment in first place and so on, and are recommended in that order, with the comments at the front recommended first.
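The weighted alternative can be sketched in a few lines. The weights and index values below are hypothetical, and the sketch assumes the four indexes have already been normalized so that a larger value is better (for example, the date index would need to be inverted so that recent comments score higher, if that is the intent).

```python
def c_score(indicators, weights):
    """Weighted sum of the four ranking elements of one comment."""
    return sum(w * c for w, c in zip(weights, indicators))


def rank_comments(comments, weights):
    """Return comment ids ordered from highest to lowest ranking score."""
    return sorted(comments, key=lambda cid: c_score(comments[cid], weights), reverse=True)


# Hypothetical weights (lambda_1..lambda_4) and normalized index values.
weights = (0.4, 0.3, 0.2, 0.1)
comments = {
    "a": (0.9, 0.4, 0.6, 0.2),
    "b": (0.5, 0.1, 0.9, 0.8),
    "c": (0.2, 0.2, 0.3, 0.1),
}
print(rank_comments(comments, weights))
```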
In order to make the technical solution of the present invention clearer and more complete, comment data on Meituan restaurants is taken below as a practical example to further explain the concept, specific structure and technical effect of the present invention.
As of May 2019, a Meituan reviewer had written 64 comments on 64 restaurants in total and uploaded 24 pictures. The method provided by the technical solution of the invention can be run automatically using computer software and database statements. With reference to the flowchart of specific steps in FIG. 2, the specific steps for this reviewer include:
extracting the attribute features related to reviewer credibility and substituting them into the calculation formula of each attribute feature to calculate the attribute feature values of the reviewer, as follows:
Figure BDA0002530367320000121
num_same_com=19
num_same_star=12
Figure BDA0002530367320000122
Figure BDA0002530367320000123
Figure BDA0002530367320000124
Figure BDA0002530367320000125
num_com=64
num_star=65
img_num=24
The calculated attribute feature values related to reviewer credibility are normalized and input into the trained regression model for predicting reviewer credibility scores, which yields the reviewer's credibility score:
$$r\_score_i = \beta_0 + \beta_1 \times 0.458 + \beta_2 \times 0.19 + \dots + \beta_9 \times 0.24 + e_i$$
Because every parameter of the trained reviewer credibility score model is known, the credibility score of the reviewer can be obtained directly. The credibility scores of other reviewers are calculated in the same way. A reviewer's credibility score reflects the reviewer's reliability: the higher the score, the more reliable the reviewer, and the greater the priority weight the reviewer receives in the recommendation process.
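The normalization and prediction step can be sketched as below. The source does not state which normalization is used, so min-max scaling is an assumption here; the function names and any coefficient values supplied by the caller are likewise hypothetical, and the deviation term is omitted because it is not known at prediction time.

```python
from typing import Sequence


def min_max_normalize(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1] given the observed range [lo, hi] (assumed scheme)."""
    if hi == lo:
        return 0.0  # degenerate range: every reviewer has the same value
    return (value - lo) / (hi - lo)


def predict_reviewer_score(
    features: Sequence[float], intercept: float, coefficients: Sequence[float]
) -> float:
    """Evaluate r_score = beta_0 + sum over k of beta_k * f_k for one reviewer."""
    if len(features) != len(coefficients):
        raise ValueError("one coefficient is required per feature")
    return intercept + sum(b * f for b, f in zip(coefficients, features))
```

With trained coefficients in hand, each reviewer's normalized feature vector is passed to `predict_reviewer_score`, and reviewers can then be compared by the returned value.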
The effective indicators related to comment ranking are extracted, and the effective indicator values of the comment are calculated according to the calculation formula of each indicator, as follows:
Figure BDA0002530367320000131
dis(c,p)=1-jac_sim(c,p)=1-0.0645
diff_date(c,t)=date_t-date_c=240
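The two indicator calculations above, dis(c,p) = 1 - jac_sim(c,p) and the date difference, can be sketched as follows. The sketch assumes that comments and commodity information have already been reduced to keyword collections and that the date difference is measured in days; neither detail is stated in the source, and the function names are illustrative.

```python
from datetime import date
from typing import Iterable


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two keyword collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty keyword sets: treat as no similarity
    return len(set_a & set_b) / len(union)


def dissimilarity(a: Iterable[str], b: Iterable[str]) -> float:
    """dis = 1 - jac_sim, as in the worked example."""
    return 1.0 - jaccard_similarity(a, b)


def diff_date(current: date, comment_date: date) -> int:
    """Days elapsed between the comment date and the reference date."""
    return (current - comment_date).days
```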
The calculated effective indicator values of the comment are substituted into the pre-trained LambdaMART comment ranking model to calculate the rank of the comment. Every comment of the commodity is evaluated in this way to obtain its rank, and reviewers and comments are recommended in rank order, with the comments and reviewers at the front recommended preferentially.
Optionally, in one embodiment, the calculated effective indicator values may instead be substituted into the weighted comment ranking model, depending on the actual effect, to calculate the comment ranking score:
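$$c\_score_j = \lambda_1 c_{j1} + \lambda_2 c_{j2} + \lambda_3 c_{j3} + \lambda_4 c_{j4}$$
$$= 0.31 \times 1.775 + 0.3 \times 0.0645 + 0.2 \times (1 - 0.0645) + 0.2 \times 0.24$$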
With this calculation, a ranking score is obtained for each comment of each commodity, and the comments are arranged in descending order of ranking score, that is, the comment with the highest ranking score is placed first, and so on. The comments and reviewers are recommended in that order, with those at the front recommended preferentially.
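Carrying out the arithmetic in the expression above, using only the weights and indicator values given in this example, gives the ranking score of this comment (a check of the stated numbers, rounded to four decimal places):

```latex
\begin{aligned}
c\_score_j &= 0.31 \times 1.775 + 0.3 \times 0.0645 + 0.2 \times 0.9355 + 0.2 \times 0.24 \\
           &= 0.55025 + 0.01935 + 0.18710 + 0.04800 \\
           &= 0.8047
\end{aligned}
```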
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A commodity comment recommendation method based on reviewer reliability regression prediction, characterized by comprising the following steps:
extracting relevant attribute characteristics of the credibility of the reviewers, and calculating attribute characteristic values of the reviewers according to a calculation formula of the attribute characteristics of each reviewer;
constructing a model for predicting reviewer credibility scores by using a regression algorithm according to the calculated attribute feature values of the reviewers, and substituting the calculated attribute feature values of a reviewer into the model to obtain the credibility score of the reviewer;
extracting effective indexes related to comment sequencing, and calculating effective index values of comments according to a calculation formula of each effective index of the comments;
according to the calculated effective index value of the comments and the reliability score of the reviewer, constructing a comment sorting model by using LambdaMART, and calculating and obtaining the final ranking of the comments according to the comment sorting model;
and arranging the comments in order of their final ranking, and preferentially recommending the comments arranged at the top.
2. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein the attribute features related to reviewer credibility include: the length difference dif_len between every two comments of a reviewer; the number num_same_com of identical comments by the same reviewer; the number num_same_star of comments with the same star rating; the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website; the star-rating score difference dif_star_score; the useful words useful_word and the different keywords dif_keyword in each comment i of the reviewer; the number num_com of comments by a reviewer on a commodity; the number num_star of ratings by the reviewer on the same commodity; and the number num_img of pictures uploaded by the reviewer.
3. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the length difference dif_len between every two comments of a reviewer is calculated as:
Figure FDA0002530367310000021
wherein (i, j) represents a comment pair, len(i) represents the length of comment i, len(j) represents the length of comment j,
Figure FDA0002530367310000022
represents the number of pairwise combinations of the reviewer's n comments, with n representing the total number of comments by the reviewer.
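The exact formula of claim 3 is contained in an image that is not reproduced in this text. As a purely illustrative sketch, one reading consistent with the variable definitions is the mean absolute length difference over all pairs of the reviewer's comments; the averaging and the use of character count as the length are assumptions, not the claimed formula.

```python
from itertools import combinations
from typing import Sequence


def dif_len(comments: Sequence[str]) -> float:
    """Mean |len(i) - len(j)| over all unordered comment pairs (assumed form)."""
    pairs = list(combinations(comments, 2))  # all C(n, 2) comment pairs
    if not pairs:
        return 0.0  # fewer than two comments: no pair to compare
    return sum(abs(len(a) - len(b)) for a, b in pairs) / len(pairs)
```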
4. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the difference dif_tag between each reviewer's comment keywords and the keyword tags provided by the website is calculated as:
Figure FDA0002530367310000023
wherein n represents the total number of comments of the reviewer, num_key_word_i - num_tag_p indicates the comparison of the keywords of each comment with the keyword tags provided on the website, and p indicates the commodity.
5. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the star-rating score difference dif_star_score is calculated as:
Figure FDA0002530367310000024
wherein n is the total number of comments by the reviewer, score_i represents the score of comment i, and score_p represents the score of commodity p provided by the website.
6. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein useful words in each comment i of a reviewer r are calculated in a manner that includes:
Figure FDA0002530367310000025
wherein num_useful_word_i represents the number of useful words in each comment i of the reviewer r, and n represents the total number of comments of the reviewer.
7. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 2, wherein the calculation modes of different keywords comprise:
Figure FDA0002530367310000031
wherein num_keyword(i, j) represents the number of keywords shared by comment i and comment j, max_num_keyword(i, j) is the larger of the keyword counts of comment i and comment j,
Figure FDA0002530367310000032
represents the number of pairwise combinations of the reviewer's n comments, with n representing the total number of comments by the reviewer.
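The formula of claim 7 is likewise an image not reproduced here. The sketch below shows one plausible reading built only from the quantities the claim defines: for each comment pair, the shared-keyword count is divided by the larger keyword count, the ratio is subtracted from 1 to express difference, and the result is averaged over all pairs. The "1 minus ratio" form and the averaging are assumptions.

```python
from itertools import combinations
from typing import Sequence, Set


def pair_keyword_difference(kw_i: Set[str], kw_j: Set[str]) -> float:
    """1 - num_keyword(i, j) / max_num_keyword(i, j) for one pair (assumed form)."""
    largest = max(len(kw_i), len(kw_j))
    if largest == 0:
        return 0.0  # neither comment has keywords
    return 1.0 - len(kw_i & kw_j) / largest


def dif_keyword(keyword_sets: Sequence[Set[str]]) -> float:
    """Average pairwise keyword difference over all C(n, 2) comment pairs."""
    pairs = list(combinations(keyword_sets, 2))
    if not pairs:
        return 0.0
    return sum(pair_keyword_difference(a, b) for a, b in pairs) / len(pairs)
```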
8. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein the effective indicators related to comment ranking comprise: credibility of the reviewer, similarity between keywords in the review and basic information of the commodity, different information in the review and other reviews, and review date.
9. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein the building of the model for predicting reviewer reliability scores using a regression algorithm comprises:
$$r\_score_i = \beta_0 + \beta_1 f_{i1} + \beta_2 f_{i2} + \dots + \beta_p f_{ip} + e_i, \quad i = 1, 2, \dots, k$$
wherein $\beta_0$ is a constant term, $\beta_1$, $\beta_2$, $\beta_p$ are the slope coefficients of the explanatory variables, $f_{i1}$, $f_{i2}$, $f_{ip}$ respectively represent the values of the feature attributes $f_1$, $f_2$, $f_p$ of reviewer $i$, $e_i$ indicates the deviation, and $k$ indicates the number of reviewers.
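The claim does not say how the coefficients are estimated. A minimal sketch, assuming ordinary least squares solved through the normal equations, is shown below in plain Python; the function names are illustrative, and a production system would more likely rely on an established regression library.

```python
from typing import List, Sequence


def fit_linear_regression(
    rows: Sequence[Sequence[float]], targets: Sequence[float]
) -> List[float]:
    """Return [beta_0, beta_1, ..., beta_p] minimizing the squared deviations."""
    # Design matrix with a leading 1 so that beta_0 acts as the constant term.
    design = [[1.0, *row] for row in rows]
    size = len(design[0])
    # Augmented normal equations [X^T X | X^T y].
    aug = [
        [sum(x[a] * x[b] for x in design) for b in range(size)]
        + [sum(x[a] * y for x, y in zip(design, targets))]
        for a in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def predict(betas: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate beta_0 + sum over k of beta_k * f_k for one reviewer."""
    return betas[0] + sum(b * f for b, f in zip(betas[1:], features))
```

Each row passed to `fit_linear_regression` holds one reviewer's attribute feature values, and the targets are that reviewer's known credibility scores from the training data.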
10. The commodity comment recommendation method based on reviewer reliability regression prediction according to claim 1, wherein the comment ranking model is constructed by using LambdaMART: if a commodity has n comments in total, any two comments c_i and c_j are drawn from all comments to form a comment pair, giving a total of
Figure FDA0002530367310000033
possible combinations; for each comment pair c_ij, the probability P_ij that comment c_i is ranked ahead of comment c_j is calculated as the comment ranking model, the outputs of which are the rank s_i of comment c_i and the rank s_j of comment c_j; the comment ranking model is expressed as follows:
Figure FDA0002530367310000034
wherein P_ij = P(c_i > c_j) represents the probability that comment c_i is ranked ahead of comment c_j, and the parameter σ determines the shape of the sigmoid function.
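The expression of claim 10 is an image not reproduced in this text. The variables it defines (the model outputs s_i and s_j, and a parameter σ shaping a sigmoid) match the standard RankNet pairwise probability on which LambdaMART builds, P_ij = 1 / (1 + exp(-σ(s_i - s_j))). The sketch below assumes that standard form; it is not confirmed as the exact claimed expression.

```python
import math


def pairwise_probability(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """Sigmoid of sigma * (s_i - s_j): probability that c_i ranks ahead of c_j."""
    z = sigma * (s_i - s_j)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp.
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Equal outputs give a probability of 0.5, and a larger σ makes the probability move toward 0 or 1 more sharply as the outputs diverge.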
CN202010516638.6A 2020-06-09 2020-06-09 Commodity comment recommendation method based on reviewer reliability regression prediction Active CN111666413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010516638.6A CN111666413B (en) 2020-06-09 2020-06-09 Commodity comment recommendation method based on reviewer reliability regression prediction

Publications (2)

Publication Number Publication Date
CN111666413A true CN111666413A (en) 2020-09-15
CN111666413B CN111666413B (en) 2023-04-07

Family

ID=72386072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010516638.6A Active CN111666413B (en) 2020-06-09 2020-06-09 Commodity comment recommendation method based on reviewer reliability regression prediction

Country Status (1)

Country Link
CN (1) CN111666413B (en)

Also Published As

Publication number Publication date
CN111666413B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant