CN110751180B - Spurious comment group division method based on spectral clustering - Google Patents

Info

Publication number
CN110751180B
Authority
CN
China
Prior art keywords
user
comment
similarity
scoring
spectral clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910887582.2A
Other languages
Chinese (zh)
Other versions
CN110751180A (en)
Inventor
王帮海
叶子成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910887582.2A
Publication of CN110751180A
Application granted
Publication of CN110751180B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a false comment group division method based on spectral clustering, which comprises the following steps. S1: collecting and cleaning comment data of an e-commerce platform. S2: calculating, based on the metadata in S1, 5 similarity indexes: common comment times, scoring similarity of the same commodity, user interaction times, user positive scoring proportion and user negative scoring proportion, wherein the similarity of the scoring proportions between two users is measured by Euclidean distance. S3: constructing a weighted reviewer graph. S4: dividing the adjacency matrix of the graph to obtain a plurality of groups. S5: further determining the category of each divided group manually by selecting reasonable analysis indexes and proper thresholds. The invention reflects the behavioral similarity between users more accurately and improves the accuracy of the subsequent division algorithm; the division of the weighted reviewer graph is more effective and more generally applicable.

Description

Spurious comment group division method based on spectral clustering
Technical Field
The invention relates to the technical field of data mining, in particular to a false comment group division method based on spectral clustering.
Background
With the rapid development of the Internet, e-commerce platforms have changed how people shop, travel, dine and more. In transactions on these platforms, comments on services or products play a key role in users' purchasing decisions: commodities with more genuine positive comments win users' favor, whereas users are less willing to buy commodities with more negative comments. In recent years, as e-commerce platforms have diversified and market competition has intensified, many unscrupulous merchants have used various means to obtain false positive scores or to give competitors false negative scores. In the traditional approaches, sellers obtain above-average scores through review-brushing services provided by brokers, or induce consumers to give unrealistic scores by way of "score return". With the rise of social networking platforms, a new review-brushing mode has become popular: a commodity is advertised through key opinion leaders (Key Opinion Leader, KOL) while their operation teams simultaneously publish false comments on it. Because key opinion leaders spread information efficiently, their fans post a large number of comments after normal consumption within a short time, so normal comments and false comments burst together in a short time, which greatly increases the difficulty of detecting false comment groups.
Many existing false comment group detection algorithms cannot meet these new requirements. In the new review-brushing mode in particular, many normal consumers buy the same commodity at around the same time, so the number of comments on the commodity rises sharply in a short time; as a result, algorithms that look for a small number of abnormal behaviors, or that rely on the burstiness of comments, perform poorly on this problem. A better method is therefore needed to meet the new requirements.
Disclosure of Invention
The invention provides a false comment group division method based on spectral clustering, which aims to overcome the poor detection of false comment groups in the prior art.
The method comprises the following steps:
S1: collecting and cleaning comment data of an e-commerce platform; the metadata used includes: user id, comment id, commodity id, score, and the number of interactions on a comment (e.g., being endorsed by other users, or being "considered useful" or "considered interesting" by other users);
S2: calculating, based on the metadata in S1, 5 similarity indexes: common comment times, scoring similarity of the same commodity, user interaction times, user positive scoring proportion and user negative scoring proportion, wherein the similarity of the scoring proportions between two users is measured by Euclidean distance;
S3: constructing a weighted reviewer graph: each user is a graph node, every two users who have commented on the same product are connected by an undirected edge, and the weight of the edge is calculated from the 5 indexes obtained in step S2;
S4: dividing the adjacency matrix of the graph constructed in step S3 by a spectral clustering algorithm to obtain a plurality of groups;
S5: further determining the category of each divided group manually by selecting reasonable analysis indexes and proper thresholds.
Preferably, the calculation formula of the common comment count (CRT) in S2 is:
CRT(n_1, n_2) = |P_1 ∩ P_2|
wherein n_1 and n_2 are two different reviewers, and P_1 and P_2 are the sets of items on which n_1 and n_2, respectively, have posted comments.
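As a minimal sketch of the CRT formula, the count can be computed as the size of a set intersection. The function name `common_review_count` and the commodity ids below are hypothetical and chosen only for illustration.

```python
def common_review_count(items_1, items_2):
    """CRT(n_1, n_2) = |P_1 ∩ P_2| for two reviewers' sets of commented items."""
    return len(set(items_1) & set(items_2))


# Hypothetical commodity ids, for illustration only.
p1 = {"item_a", "item_b", "item_c"}
p2 = {"item_b", "item_c", "item_d"}
print(common_review_count(p1, p2))  # 2
```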
Preferably, the calculation formula of the scoring similarity of the same commodity (Similarity of Rating on Same Product, SRSP) in S2 is:
Figure GDA0004143751140000021
wherein
Figure GDA0004143751140000022
are the scores of the i-th and j-th comments that n_1 and n_2, respectively, posted on commodity P, and N_1 and N_2 are the numbers of comments that n_1 and n_2, respectively, posted on commodity P.
Preferably, the calculation formula of the user interaction times (Interaction Times, IT) in S2 is:
Figure GDA0004143751140000023
wherein C_1m and C_2m represent the numbers of interactions of the m-th kind for n_1 and n_2, respectively.
Preferably, the calculation formula of the user positive scoring proportion (Positive Rating Ratio, PR) in S2 is:
Figure GDA0004143751140000024
wherein S_i and S represent the score of a comment of the user, and the two sums in the formula count, respectively, the comments the user posted with a score in {1, 2, 3, 4, 5} and the comments the user posted with a score in {4, 5}.
Preferably, the calculation formula of the user negative score ratio (Negative Rating Ratio, NR) in S2 is:
Figure GDA0004143751140000031
Preferably, in S2 the similarity of the positive scoring proportions, and of the negative scoring proportions, of two users is measured by the Euclidean distance:
Figure GDA0004143751140000032
wherein r_n1 and r_n2 are the user positive scoring proportions, or the user negative scoring proportions, of reviewers n_1 and n_2.
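The scoring proportions and their distance can be sketched as follows. The formula images are not reproduced in this text, so two points are assumptions rather than the patent's stated definitions: the negative scores are taken to be {1, 2}, and the Euclidean distance is applied to two scalar proportions, where it reduces to sqrt((r_n1 - r_n2)^2). Function names are hypothetical, and each user is assumed to have at least one comment.

```python
import math


def positive_ratio(scores):
    """Share of a user's comments whose score is 4 or 5 (PR in the text)."""
    return sum(1 for s in scores if s in (4, 5)) / len(scores)


def negative_ratio(scores, negative=(1, 2)):
    """Share of comments with a negative score; the set {1, 2} is an assumption."""
    return sum(1 for s in scores if s in negative) / len(scores)


def ratio_distance(r1, r2):
    """Euclidean distance between two scalar proportions: sqrt((r1 - r2)^2)."""
    return math.sqrt((r1 - r2) ** 2)


# Hypothetical score histories of two users.
user_1 = [5, 5, 4, 1]
user_2 = [5, 2, 1, 1]
print(ratio_distance(positive_ratio(user_1), positive_ratio(user_2)))  # 0.5
```

A smaller distance means the two users rate in a more similar way; how the distance is turned into a similarity inside the edge weight is not specified here.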
Preferably, S3 comprises the following steps:
S3.1: importing all users as graph nodes;
S3.2: taking every two nodes as a node combination and calculating the number of common comments between them;
S3.3: judging whether the number of common comments calculated in S3.2 is greater than 0; if yes, proceeding to S3.4; if not, returning to S3.2 for the next node combination, until all node combinations have been traversed;
S3.4: calculating the remaining similarity indexes of the node pair, calculating the weight, and establishing an edge between the two nodes;
S3.5: completing the construction of the weighted reviewer graph.
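Steps S3.1 to S3.5 can be sketched as a loop over user pairs. This is only an illustration: the edge-weight formula is not reproduced in this text, so the sketch takes it as a caller-supplied function `edge_weight` (a hypothetical placeholder), and the data layout `user_items` is likewise an assumption.

```python
from itertools import combinations


def build_reviewer_graph(user_items, edge_weight):
    """Return (nodes, edges), where edges maps a user pair (u, v) to its weight.

    user_items maps a user id to the set of commodity ids the user commented on.
    edge_weight(u, v, crt) is a placeholder for the weight formula, which is
    not reproduced in this text.
    """
    nodes = sorted(user_items)                      # S3.1: every user is a node
    edges = {}
    for u, v in combinations(nodes, 2):             # S3.2: visit each pair once
        crt = len(user_items[u] & user_items[v])    # number of common comments
        if crt > 0:                                 # S3.3: skip pairs without overlap
            edges[(u, v)] = edge_weight(u, v, crt)  # S3.4: weight and edge
    return nodes, edges                             # S3.5: graph complete


# Hypothetical data; the weight is simply the common comment count here.
user_items = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
nodes, edges = build_reviewer_graph(user_items, lambda u, v, crt: float(crt))
print(edges)  # {('u1', 'u2'): 1.0}
```

Visiting every pair costs time quadratic in the number of users; an inverted index from commodity to reviewers would avoid touching pairs with no overlap, but that is a design choice beyond what the text describes.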
Preferably, the weight calculation formula of the edge in S3 is as follows:
Figure GDA0004143751140000033
wherein ω_ij is the weight of the edge between node i and node j; k is the number of indexes, here 5; CRT_ij is the number of common comments of user i and user j; SRSP_ij is the similarity of the scores given by user i and user j to the same commodity; IT_ij is the number of interactions of user i and user j;
Figure GDA0004143751140000034
and
Figure GDA0004143751140000035
are the similarities of the positive scoring proportions and of the negative scoring proportions of user i and user j, respectively.
Preferably, S4 comprises the following steps:
S4.1: inputting the weighted reviewer graph G and the number n of clusters to divide it into;
S4.2: calculating the adjacency matrix A, the degree matrix D and the Laplacian matrix L = D - A from the weighted reviewer graph G;
S4.3: obtaining the normalized Laplacian matrix according to:
NL = D^(-1/2) (D - A) D^(-1/2) = D^(-1/2) L D^(-1/2)
S4.4: calculating the k smallest eigenvalues of NL and the corresponding eigenvectors f, where k equals the number n of clusters;
S4.5: assembling the corresponding eigenvectors f into a v × k feature matrix F and normalizing it by rows, where v is the number of samples, i.e., the number of nodes of graph G;
S4.6: clustering the rows of the normalized matrix F with the K-Means method to obtain n candidate groups C = (c_1, c_2, ..., c_n).
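Steps S4.2 and S4.3 can be illustrated in plain Python on a small symmetric weight matrix. The function name is hypothetical, and treating isolated nodes (degree 0) with a scaling factor of 0 is an assumption made here to avoid division by zero; the text does not say how such nodes are handled.

```python
import math


def normalized_laplacian(adj):
    """Return NL = D^(-1/2) (D - A) D^(-1/2) for a symmetric weight matrix adj."""
    n = len(adj)
    degree = [sum(row) for row in adj]                 # diagonal of D
    # Assumption: isolated nodes get a scaling factor of 0 instead of 1/sqrt(0).
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    nl = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lap = (degree[i] if i == j else 0.0) - adj[i][j]   # entry of L = D - A
            nl[i][j] = inv_sqrt[i] * lap * inv_sqrt[j]
    return nl


# Hypothetical 3-node path graph with edge weight 2.
adj = [[0.0, 2.0, 0.0],
       [2.0, 0.0, 2.0],
       [0.0, 2.0, 0.0]]
for row in normalized_laplacian(adj):
    print([round(x, 3) for x in row])
```

For a node with positive degree the diagonal entry is 1, and each off-diagonal entry equals -A_ij / sqrt(d_i d_j).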
Compared with the prior art, the technical scheme of the invention has the following beneficial effects. Building on the idea of the weighted reviewer graph, the invention proposes 5 similarity indexes that measure how similar the behaviors of different users are; they reflect the behavioral similarity between users more accurately than existing algorithms based on the weighted reviewer graph and improve the accuracy of the subsequent division algorithm.
The invention divides the weighted reviewer graph with a spectral clustering algorithm, which divides better and is more general than some existing division algorithms such as the K-Means algorithm, the hierarchical clustering algorithm and the Louvain community discovery algorithm.
Drawings
Fig. 1 is a flowchart of a method for classifying false comment groups based on spectral clustering according to embodiment 1.
Fig. 2 is a schematic flow chart of a spectral clustering algorithm.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a false comment group division method based on spectral clustering, as shown in fig. 1, comprising the following steps:
S1: collecting and cleaning comment data of an e-commerce platform; the metadata used includes: user id, comment id, commodity id, score, and the number of interactions on a comment (e.g., being endorsed by other users, or being "considered useful" or "considered interesting" by other users);
Here, metadata is data that describes other data. Comment data is data with many dimensions collected from an e-commerce platform, with fields such as user id, comment time, score, and so forth. Each such field, for example "user id" or "comment time", is metadata, and the information of a comment can be reconstructed from its metadata. In this example only some of a comment's data items (metadata) are selected; others, such as "comment time", are not used.
S2: calculating, based on the metadata in S1, 5 similarity indexes: common comment times, scoring similarity of the same commodity, user interaction times, user positive scoring proportion and user negative scoring proportion, wherein the similarity of the scoring proportions between two users is measured by Euclidean distance;
S3: constructing a weighted reviewer graph: each user is a graph node, every two users who have commented on the same product are connected by an undirected edge, and the weight of the edge is calculated from the 5 indexes obtained in step S2;
S4: dividing the adjacency matrix of the graph constructed in step S3 by a spectral clustering algorithm to obtain a plurality of groups;
S5: further determining the category of each divided group manually by selecting reasonable analysis indexes and proper thresholds. In this example, extreme score, duplicate score and score bias are selected as the analysis indexes.
The calculation formula of the common comment count (CRT) in S2 is:
CRT(n_1, n_2) = |P_1 ∩ P_2|
wherein n_1 and n_2 are two different reviewers, and P_1 and P_2 are the sets of items on which n_1 and n_2, respectively, have posted comments.
The calculation formula of the scoring similarity of the same commodity (Similarity of Rating on Same Product, SRSP) in S2 is:
Figure GDA0004143751140000051
wherein
Figure GDA0004143751140000052
are the scores of the i-th and j-th comments that n_1 and n_2, respectively, posted on commodity P, and N_1 and N_2 are the numbers of comments that n_1 and n_2, respectively, posted on commodity P.
The calculation formula of the user interaction times (Interaction Times, IT) in S2 is:
Figure GDA0004143751140000053
wherein C_1m and C_2m represent the numbers of interactions of the m-th kind for n_1 and n_2, respectively.
The calculation formula of the user positive scoring proportion (Positive Rating Ratio, PR) in S2 is:
Figure GDA0004143751140000054
wherein S_i and S represent the score of a comment of the user, and the two sums in the formula count, respectively, the comments the user posted with a score in {1, 2, 3, 4, 5} and the comments the user posted with a score in {4, 5}.
The calculation formula of the user negative scoring proportion (Negative Rating Ratio, NR) in S2 is:
Figure GDA0004143751140000061
In S2, the similarity of the positive scoring proportions, and of the negative scoring proportions, of two users is measured by the Euclidean distance:
Figure GDA0004143751140000062
wherein r_n1 and r_n2 are the user positive scoring proportions, or the user negative scoring proportions, of reviewers n_1 and n_2.
S3 comprises the following steps:
S3.1: importing all users as graph nodes;
S3.2: taking every two nodes as a node combination and calculating the number of common comments between them;
S3.3: judging whether the number of common comments calculated in S3.2 is greater than 0; if yes, proceeding to S3.4; if not, returning to S3.2 for the next node combination, until all node combinations have been traversed;
S3.4: calculating the remaining similarity indexes of the node pair, namely "scoring similarity of the same commodity", "user interaction times", "user positive scoring proportion" and "user negative scoring proportion"; then calculating the weight and establishing an edge between the two nodes;
S3.5: completing the construction of the weighted reviewer graph.
The weight calculation formula of the edge in S3 is as follows:
Figure GDA0004143751140000063
wherein ω_ij is the weight of the edge between node i and node j; k is the number of indexes, here 5; CRT_ij is the number of common comments of user i and user j; SRSP_ij is the similarity of the scores given by user i and user j to the same commodity; IT_ij is the number of interactions of user i and user j;
Figure GDA0004143751140000064
and
Figure GDA0004143751140000065
are the similarities of the positive scoring proportions and of the negative scoring proportions of user i and user j, respectively.
S4 comprises the following steps:
S4.1: inputting the weighted reviewer graph G and the number n of clusters to divide it into;
S4.2: calculating the adjacency matrix A, the degree matrix D and the Laplacian matrix L = D - A from the weighted reviewer graph G;
S4.3: obtaining the normalized Laplacian matrix according to:
NL = D^(-1/2) (D - A) D^(-1/2) = D^(-1/2) L D^(-1/2)
S4.4: calculating the k smallest eigenvalues of NL and the corresponding eigenvectors f, where k equals the number n of clusters;
S4.5: assembling the corresponding eigenvectors f into a v × k feature matrix F and normalizing it by rows, where v is the number of samples, i.e., the number of nodes of graph G;
S4.6: clustering the rows of the normalized matrix F with the K-Means method to obtain n candidate groups C = (c_1, c_2, ..., c_n).
As a specific embodiment, the similarity indexes in this embodiment may be omitted, replaced or supplemented with other indexes according to the actual situation, and the calculation methods of the indexes may be combined and modified.
Group division divides a large number of users into several groups such that the users within each group behave identically or similarly; whether a group is a false comment group must then be judged manually. Because the types of data generated by e-commerce platforms, and the data collected from them, differ widely, in practical applications the specific implementation of the group division algorithm should be adapted to the metadata types available in the data set. In theory, the idea of this embodiment can also be applied to fields such as public opinion monitoring and marketing.
An implementation of the false comment group division method of this embodiment can be roughly divided into five steps: data collection and cleaning, calculation of similarity indexes, construction of the weighted reviewer graph, spectral clustering into groups, and manual judgment of group categories, wherein:
Data collection and cleaning: the raw data is collected, analyzed and cleaned. Raw data contains many erroneous and incomplete records; data items with missing or abnormal values can be handled by deletion, mean filling and similar means. This embodiment uses a data set prepared for research, so it is only necessary to analyze the data, select the required data types and delete the useless data.
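A minimal sketch of the two cleaning strategies mentioned above, deletion and mean filling, might look like the following. The record layout (dictionaries with `user_id`, `comment_id`, `product_id`, `score`) and the choice of which fields trigger deletion are assumptions made for illustration, not the layout of any particular data set.

```python
def clean_reviews(reviews, required=("user_id", "comment_id", "product_id")):
    """Drop records missing an id field, then mean-fill missing scores."""
    # Deletion: a record without one of the id fields cannot be placed in the graph.
    kept = [dict(r) for r in reviews if all(r.get(k) is not None for k in required)]
    # Mean filling: replace a missing score by the mean of the known scores.
    known = [r["score"] for r in kept if r.get("score") is not None]
    mean_score = sum(known) / len(known) if known else None
    for r in kept:
        if r.get("score") is None:
            r["score"] = mean_score
    return kept


# Hypothetical raw records.
raw = [
    {"user_id": "u1", "comment_id": "c1", "product_id": "p1", "score": 5},
    {"user_id": "u2", "comment_id": "c2", "product_id": "p1", "score": None},
    {"user_id": None, "comment_id": "c3", "product_id": "p2", "score": 1},
    {"user_id": "u3", "comment_id": "c4", "product_id": "p2", "score": 3},
]
cleaned = clean_reviews(raw)
print(len(cleaned), cleaned[1]["score"])  # 3 4.0
```

The function copies each record before filling, so the raw input is left unchanged.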
Calculation of similarity indexes: most of the indexes used in this embodiment are computed for a pair of users, so they are calculated while the weighted reviewer graph is being constructed. This step counts and stores in advance the statistics that will be needed (such as each user's positive/negative scoring proportion), so that the data need not be counted repeatedly during later calculation, which would increase the time cost of the algorithm.
Constructing the weighted reviewer graph: the graph takes users as nodes; when two users have commented on the same commodity, an edge is established between them, and the weight of the edge reflects how similar the two nodes are. The graph is constructed and the similarity indexes are calculated at the same time. First, as many nodes as there are users in the data set are added to the graph, each named by its user id. Then the number of common comments is counted for every pair of nodes: if it is greater than or equal to 1, the other indexes are calculated, the weight of the edge is computed from all the indexes, and an edge connecting the two nodes is created; if it is 0, nothing is done and the "common comment times" of the next node combination is calculated, until all node combinations in the graph have been traversed.
Spectral clustering into groups: as shown in fig. 2, the number n of groups to divide into is selected first; the adjacency matrix and the degree matrix of the graph are then calculated from the weighted reviewer graph, and from them the Laplacian matrix and the normalized Laplacian matrix of the graph. Next, the n smallest eigenvalues of the normalized Laplacian matrix and the n corresponding eigenvectors are calculated, and the n eigenvectors are assembled into a matrix F, which is normalized by rows. Finally, the rows of F are clustered with the K-Means method, dividing the nodes into n candidate groups.
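The whole spectral clustering step can be sketched with NumPy (assumed to be installed). This is an illustrative sketch, not the patent's implementation: the function name is hypothetical, isolated nodes are given a zero scaling factor, and the K-Means part is a minimal version with a deterministic farthest-point initialization, a choice made here so that the example is reproducible.

```python
import numpy as np


def spectral_partition(adj, n_clusters, n_iter=50):
    """Partition a weighted graph following S4.2 to S4.6; returns one label per node."""
    a = np.asarray(adj, dtype=float)
    degree = a.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])  # guard isolated nodes
    lap = np.diag(degree) - a                                  # L = D - A
    nl = inv_sqrt[:, None] * lap * inv_sqrt[None, :]           # D^(-1/2) L D^(-1/2)
    _, vectors = np.linalg.eigh(nl)                            # eigenvalues ascending
    f = vectors[:, :n_clusters]                                # v x k matrix F
    norms = np.linalg.norm(f, axis=1, keepdims=True)
    f = f / np.where(norms > 0, norms, 1.0)                    # normalize by rows

    # Minimal K-Means; farthest-point initialization is an illustrative choice.
    centers = [f[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((f - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(f[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((f[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)                         # nearest center per row
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = f[labels == c].mean(axis=0)
    return labels.tolist()


# Hypothetical graph: two tightly connected triangles joined by one weak edge.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0.01, 0, 0],
       [0, 0, 0.01, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(spectral_partition(adj, 2))
```

On this toy graph the two triangles end up in different candidate groups. For real data sets a library K-Means with several random restarts would be the more usual choice.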
Manually judging the group category: in this embodiment the judgment indexes and the corresponding thresholds are selected according to existing research in the field; the basis for judging which category a group belongs to differs from field to field.
In this embodiment, a research data set of the US review website Yelp is divided, and other common false comment group division methods are selected for comparison experiments. Following current research in the field, three representative false comment group indexes, namely the extreme rating ratio (Extreme Rating Ratio, ERR), the repeated comment ratio (Repeated Comment Ratio, RCR) and the rating deviation (Rating Deviation, RD), are used to judge the division effect in the experiments. On these three indexes, the algorithm outperforms the K-Means clustering, hierarchical clustering and Louvain community discovery algorithms used as controls.
The terms describing positional relationships in the drawings are merely illustrative and are not to be construed as limiting the present patent;
it is to be understood that the above examples are provided only to illustrate the invention clearly and do not limit its embodiments. Other variations or modifications can be made by those of ordinary skill in the art on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims.

Claims (10)

1. A method for classifying false comment groups based on spectral clustering, the method comprising the steps of:
S1: collecting and cleaning comment data of an e-commerce platform; the metadata used includes: user id, comment id, commodity id, score and the number of interactions on a comment;
S2: calculating, based on the metadata in S1, 5 similarity indexes: common comment times, scoring similarity of the same commodity, user interaction times, user positive scoring proportion and user negative scoring proportion, wherein the similarity of the scoring proportions between two users is measured by Euclidean distance;
S3: constructing a weighted reviewer graph: each user is a graph node, every two users who have commented on the same product are connected by an undirected edge, and the weight of the edge is calculated from the 5 indexes obtained in step S2;
S4: dividing the adjacency matrix of the graph constructed in step S3 by a spectral clustering algorithm to obtain a plurality of groups;
S5: further determining the category of each divided group manually by selecting an analysis index and a threshold.
2. The method for classifying a false comment group based on spectral clustering according to claim 1, wherein the calculation formula of the number of common comments in S2 is:
CRT(n_1, n_2) = |P_1 ∩ P_2|
wherein n_1 and n_2 are two different reviewers, and P_1 and P_2 are the sets of items on which n_1 and n_2, respectively, have posted comments.
3. The method for classifying a false comment group based on spectral clustering according to claim 2, wherein the calculation formula of the scoring similarity of the same commodity in S2 is:
Figure FDA0004135860790000011
wherein
Figure FDA0004135860790000012
are the scores of the i-th and j-th comments that n_1 and n_2, respectively, posted on commodity P, and N_1 and N_2 are the numbers of comments that n_1 and n_2, respectively, posted on commodity P.
4. The method for classifying a false comment group based on spectral clustering according to claim 3, wherein the calculation formula of the number of user interactions in S2 is:
Figure FDA0004135860790000021
wherein C_1m and C_2m represent the numbers of interactions of the m-th kind for n_1 and n_2, respectively.
5. The method for classifying a false comment group based on spectral clustering according to claim 4, wherein the calculation formula of the positive scoring proportion of the user in S2 is:
Figure FDA0004135860790000022
wherein S_i and S represent the score of a comment of the user, and the two sums in the formula count, respectively, the comments the user posted with a score in {1, 2, 3, 4, 5} and the comments the user posted with a score in {4, 5}.
6. The method for classifying a false comment group based on spectral clustering according to claim 5, wherein the calculation formula of the user negative scoring proportion in S2 is:
Figure FDA0004135860790000023
7. The method for classifying a false comment group based on spectral clustering according to claim 6, wherein in S2 the similarity of the positive scoring proportions, and of the negative scoring proportions, of two users is measured by the Euclidean distance:
Figure FDA0004135860790000024
wherein r_n1 and r_n2 are the user positive scoring proportions, or the user negative scoring proportions, of reviewers n_1 and n_2.
8. The method for classifying a false comment group based on spectral clustering according to claim 7, wherein S3 includes the steps of:
S3.1: importing all users as graph nodes;
S3.2: taking every two nodes as a node combination and calculating the number of common comments between them;
S3.3: judging whether the number of common comments calculated in S3.2 is greater than 0; if yes, proceeding to S3.4; if not, returning to S3.2 for the next node combination, until all node combinations have been traversed;
S3.4: calculating the remaining similarity indexes of the node pair, calculating the weight, and establishing an edge between the two nodes;
S3.5: completing the construction of the weighted reviewer graph.
9. The method for classifying a false comment group based on spectral clustering according to claim 8, wherein the weight calculation formula of the edge in S3 is as follows:
Figure FDA0004135860790000025
wherein ω_ij is the weight of the edge between node i and node j; k is the number of indexes, here 5; CRT_ij is the number of common comments of user i and user j; SRSP_ij is the similarity of the scores given by user i and user j to the same commodity; IT_ij is the number of interactions of user i and user j;
Figure FDA0004135860790000031
and
Figure FDA0004135860790000032
are the similarities of the positive scoring proportions and of the negative scoring proportions of user i and user j, respectively.
10. The method for classifying a false comment group based on spectral clustering according to claim 1 or 9, wherein S4 includes the steps of:
S4.1: inputting the weighted reviewer graph G and the number n of clusters to divide it into;
S4.2: calculating the adjacency matrix A, the degree matrix D and the Laplacian matrix L = D - A from the weighted reviewer graph G;
S4.3: obtaining the normalized Laplacian matrix according to:
NL = D^(-1/2) (D - A) D^(-1/2) = D^(-1/2) L D^(-1/2)
S4.4: calculating the k smallest eigenvalues of NL and the corresponding eigenvectors f, where k equals the number n of clusters;
S4.5: assembling the corresponding eigenvectors f into a v × k feature matrix F and normalizing it by rows, where v is the number of samples, i.e., the number of nodes of graph G;
S4.6: clustering the rows of the normalized matrix F with the K-Means method to obtain n candidate groups C = (c_1, c_2, ..., c_n).
CN201910887582.2A 2019-09-19 2019-09-19 Spurious comment group division method based on spectral clustering Active CN110751180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910887582.2A CN110751180B (en) 2019-09-19 2019-09-19 Spurious comment group division method based on spectral clustering

Publications (2)

Publication Number Publication Date
CN110751180A (en) 2020-02-04
CN110751180B (en) 2023-06-20

Family

ID=69276657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910887582.2A Active CN110751180B (en) 2019-09-19 2019-09-19 Spurious comment group division method based on spectral clustering

Country Status (1)

Country Link
CN (1) CN110751180B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421492B (en) * 2023-12-19 2024-04-05 四川久远银海软件股份有限公司 Screening system and method for data element commodities

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345587A (en) * 2018-02-14 2018-07-31 广州大学 A kind of the authenticity detection method and system of comment
CN109829733A (en) * 2019-01-31 2019-05-31 重庆大学 A kind of false comment detection system and method based on Shopping Behaviors sequence data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马晓宁 ; 王婷 ; 董松月 ; .基于PSO-SVM的网络舆情垃圾观点识别.计算机与数字工程.2018,(02),第119-124页. *

Also Published As

Publication number Publication date
CN110751180A (en) 2020-02-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant