CN110751180B - Spurious comment group division method based on spectral clustering - Google Patents
- Publication number
- CN110751180B (application no. CN201910887582.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- comment
- similarity
- scoring
- spectral clustering
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Business, Economics & Management (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a false comment group division method based on spectral clustering, comprising the following steps. S1: collect and clean comment data from an e-commerce platform. S2: based on the metadata from S1, calculate five similarity indexes (the number of common comments, the scoring similarity on the same product, the number of user interactions, the user positive score proportion, and the user negative score proportion), where the similarity of the score proportions between two users is measured by Euclidean distance. S3: construct a weighted reviewer graph. S4: partition the adjacency matrix of the graph to obtain a number of groups. S5: manually determine the category of each divided group by selecting suitable analysis indexes and thresholds. The invention reflects the behavioral similarity between users more accurately, which improves the accuracy of the subsequent partitioning algorithm; the partitioning performs better on the weighted reviewer graph and is broadly applicable.
Description
Technical Field
The invention relates to the technical field of data mining, in particular to a false comment group division method based on spectral clustering.
Background
With the rapid development of the Internet, e-commerce platforms have changed how people shop, travel, dine, and so on. In e-commerce transactions, comments on services or products play a key role in users' purchasing decisions: products with more genuine positive comments win users' favor, whereas users are less willing to buy products with more negative comments. In recent years, as e-commerce platforms have diversified, market competition has intensified, and many unscrupulous merchants use various means to obtain fake positive ratings or to post fake negative ratings against competitors. Traditional methods include sellers obtaining above-average ratings through review-brushing services offered by brokers, or inducing consumers to give unrealistic ratings through "rating cashback". With the rise of social networking platforms, a new review-brushing pattern has become popular: a product is promoted by key opinion leaders (Key Opinion Leader, KOL) while their operations team simultaneously publishes fake comments on it. Because key opinion leaders spread information very efficiently, their fans post a large number of comments after genuine purchases within a short time, so normal and fake comments burst together, which greatly increases the difficulty of detecting false comment groups.
Many existing algorithms for false comment group detection cannot meet these new requirements. In the new review-brushing pattern in particular, many normal consumers buy the same product at about the same time, so the number of comments on that product rises sharply within a short period; algorithms that look for a small number of abnormal behaviors, or that rely on comment burstiness, therefore perform poorly on this problem. A better method is needed to meet these new demands.
Disclosure of Invention
The invention provides a false comment group division method based on spectral clustering, which aims to overcome the poor false comment group detection performance of the prior art.
The method comprises the following steps:
S1: Collect and clean comment data from an e-commerce platform. The metadata used includes: user id, comment id, product id, score, and the number of interactions on each comment (e.g., endorsed by other users, marked "useful" by other users, marked "interesting" by other users, etc.);
S2: Based on the metadata from S1, calculate five similarity indexes: the number of common comments, the scoring similarity on the same product, the number of user interactions, the user positive score proportion, and the user negative score proportion, where the similarity of the score proportions between two users is measured by Euclidean distance;
S3: Construct a weighted reviewer graph: each user is a graph node, any two users who have commented on the same product are connected by an undirected edge, and the edge weight is calculated from the five indexes obtained in step S2;
S4: Partition the adjacency matrix of the graph constructed in step S3 with a spectral clustering algorithm to obtain a number of groups;
S5: Manually determine the category of each divided group by selecting suitable analysis indexes and thresholds.
Preferably, the common comment count (CRT) in S2 is calculated as:
$CRT(n_1, n_2) = |P_1 \cap P_2|$
where $n_1$ and $n_2$ are two different reviewers, and $P_1$ and $P_2$ are the sets of products on which $n_1$ and $n_2$, respectively, posted comments.
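As a minimal sketch, assuming each user's commented products are already collected into a Python set (the `products_by_user` mapping and the `crt` function name are hypothetical), CRT is simply the size of a set intersection:

```python
from typing import Dict, Set


def crt(products_by_user: Dict[str, Set[str]], n1: str, n2: str) -> int:
    """Common comment count CRT(n1, n2) = |P1 ∩ P2|."""
    p1 = products_by_user.get(n1, set())  # P1: products n1 commented on
    p2 = products_by_user.get(n2, set())  # P2: products n2 commented on
    return len(p1 & p2)


reviews = {"u1": {"p1", "p2", "p3"}, "u2": {"p2", "p3", "p4"}}
print(crt(reviews, "u1", "u2"))  # prints 2
```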
Preferably, the scoring similarity on the same product (Similarity of Rating on Same Product, SRSP) in S2 is calculated as:
where the rating terms are the $i$-th and $j$-th ratings that $n_1$ and $n_2$, respectively, posted on product $P$, and $N_1$ and $N_2$ are the numbers of comments that $n_1$ and $n_2$ posted on product $P$.
Preferably, the number of user interactions (Interaction Times, IT) in S2 is calculated as:
where $C_{1i}$ and $C_{2i}$ denote the counts of the $i$-th type of interaction for $n_1$ and $n_2$, respectively.
Preferably, the user positive score proportion (Positive Rating Ratio, PR) in S2 is calculated as:
where $S_i$ and $S$ denote the score of one of the user's comments, $\sum S_0$ is the number of the user's comments with a score in {1, 2, 3, 4, 5}, and $\sum S_{i0}$ is the number of the user's comments with a score in {4, 5}.
Preferably, the user negative score proportion (Negative Rating Ratio, NR) in S2 is calculated as:
Preferably, in S2 the similarity of the positive score proportions and of the negative score proportions of two users is measured by Euclidean distance:
where $r_{n_1}$ and $r_{n_2}$ are the positive score proportions (or the negative score proportions) of reviewers $n_1$ and $n_2$.
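The PR definition above (share of a user's 1-5 scores that are 4 or 5) and the Euclidean comparison can be sketched as follows. This is an illustration, not the patent's exact formulas: the NR score set is not stated in the text, so `{1, 2}` is an assumption, and all function names are hypothetical. For two scalar proportions the Euclidean distance reduces to $|r_{n_1} - r_{n_2}|$.

```python
import math
from typing import Iterable, List, Set

POSITIVE_SCORES: Set[int] = {4, 5}  # from the PR definition above
NEGATIVE_SCORES: Set[int] = {1, 2}  # ASSUMPTION: the NR score set is not given in the text


def score_proportion(scores: Iterable[int], target: Set[int]) -> float:
    """Share of a user's valid scores (1..5) that fall in `target`."""
    valid: List[int] = [s for s in scores if 1 <= s <= 5]
    if not valid:
        return 0.0  # a user with no valid comments gets proportion 0
    return sum(1 for s in valid if s in target) / len(valid)


def positive_ratio(scores: Iterable[int]) -> float:
    return score_proportion(scores, POSITIVE_SCORES)


def negative_ratio(scores: Iterable[int]) -> float:
    return score_proportion(scores, NEGATIVE_SCORES)


def proportion_distance(r1: float, r2: float) -> float:
    """Euclidean distance between two scalar proportions, i.e. |r1 - r2|."""
    return math.dist([r1], [r2])


u1, u2 = [5, 4, 1, 3], [5, 5, 5, 2]
print(positive_ratio(u1), positive_ratio(u2))  # prints 0.5 0.75
print(proportion_distance(positive_ratio(u1), positive_ratio(u2)))  # prints 0.25
```

How a distance is turned into a similarity (for example, smaller distance meaning higher similarity) depends on the patent's lost formula and is left open here.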
Preferably, S3 comprises the following steps:
S3.1: Import all users as graph nodes;
S3.2: Take every two nodes as a node pair and calculate the number of common comments between them;
S3.3: Determine whether the number of common comments calculated in step S3.2 is greater than 0; if yes, go to S3.4; if not, return to S3.2 for the next node pair, until all node pairs have been traversed;
S3.4: Calculate the remaining similarity indexes for the node pair, calculate the weight, and create an edge between the two nodes;
S3.5: Complete the construction of the weighted reviewer graph.
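Steps S3.1 to S3.5 can be sketched as a pairwise loop over users. The sketch below is a minimal illustration, assuming the graph is stored as a plain adjacency dictionary. Because the patent's weight formula is not recoverable from this text, the weight is delegated to a hypothetical `edge_weight(u, v, common)` callback rather than implemented.

```python
from itertools import combinations
from typing import Callable, Dict, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def build_reviewer_graph(
    products_by_user: Dict[str, Set[str]],
    edge_weight: Callable[[str, str, int], float],  # hypothetical weight function
) -> Graph:
    # S3.1: import every user as a node (isolated users keep an empty neighbour map)
    graph: Graph = {u: {} for u in products_by_user}
    # S3.2: visit every unordered pair of nodes
    for u, v in combinations(sorted(products_by_user), 2):
        common = len(products_by_user[u] & products_by_user[v])
        # S3.3: pairs with no common comments get no edge
        if common == 0:
            continue
        # S3.4: compute the weight and add an undirected edge
        w = edge_weight(u, v, common)
        graph[u][v] = w
        graph[v][u] = w
    # S3.5: the weighted reviewer graph is complete
    return graph


reviews = {"u1": {"p1", "p2"}, "u2": {"p2"}, "u3": {"p3"}}
g = build_reviewer_graph(reviews, lambda u, v, c: float(c))  # toy weight: CRT only
print(g)  # prints {'u1': {'u2': 1.0}, 'u2': {'u1': 1.0}, 'u3': {}}
```

The all-pairs loop mirrors the traversal described in S3.2 and S3.3 and costs O(v²) pair checks for v users.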
Preferably, the edge weight in S3 is calculated as:
where $\omega_{ij}$ is the weight of the edge between node $i$ and node $j$; $k$ is the number of indexes, here 5; $CRT_{ij}$ is the number of common comments of user $i$ and user $j$; $SRSP_{ij}$ is the scoring similarity of user $i$ and user $j$ on the same products; $IT_{ij}$ is the interaction index of user $i$ and user $j$; and the two remaining terms are the similarities of the positive score proportions and of the negative score proportions of user $i$ and user $j$, respectively.
Preferably, S4 comprises the following steps:
S4.1: Input the weighted reviewer graph $G$ and the number of clusters $n$;
S4.2: Calculate the adjacency matrix $A$, the degree matrix $D$, and the Laplacian matrix $L = D - A$ of the weighted reviewer graph $G$;
S4.3: Obtain the normalized Laplacian matrix as:
$NL = D^{-1/2}(D - A)D^{-1/2} = D^{-1/2} L D^{-1/2}$
S4.4: Calculate the $k$ smallest eigenvalues of $NL$ and the corresponding eigenvectors $f$, where $k$ equals the number of clusters $n$;
S4.5: Stack the corresponding eigenvectors $f$ into a $v \times k$ feature matrix $F$ and normalize it row by row, where $v$ is the number of samples, i.e., the number of nodes of graph $G$;
S4.6: Cluster the rows of the normalized feature matrix with K-Means to obtain $n$ candidate groups $C = (c_1, c_2, \ldots, c_n)$.
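Steps S4.2 to S4.6 can be sketched with NumPy as follows. This is a minimal illustration, not the patent's implementation. It assumes the graph is given as a symmetric weighted adjacency matrix. K-Means is written out by hand (Lloyd's algorithm with a deterministic farthest-point initialization, a choice made here for reproducibility) so that the sketch needs only NumPy. Function names are hypothetical.

```python
import numpy as np


def _kmeans(X: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Lloyd's K-Means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d2.argmax())])  # next centre: point farthest from existing ones
    C = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def spectral_partition(A: np.ndarray, n: int) -> np.ndarray:
    """Normalized-Laplacian spectral clustering of a symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                              # S4.2: degrees (diagonal of D)
    L = np.diag(d) - A                             # S4.2: L = D - A
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])      # isolated nodes: 0 instead of 1/0
    NL = inv_sqrt[:, None] * L * inv_sqrt[None, :]  # S4.3: D^-1/2 L D^-1/2
    _, vecs = np.linalg.eigh(NL)                   # eigenvalues in ascending order
    F = vecs[:, :n]                                # S4.4/S4.5: v x n feature matrix
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    F = F / np.where(norms == 0, 1.0, norms)       # S4.5: row normalisation
    return _kmeans(F, n)                           # S4.6: n candidate groups


# Two triangles joined by one weak edge should split into two groups.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.01
print(spectral_partition(A, 2))
```

`numpy.linalg.eigh` is used because $NL$ is symmetric, which guarantees real eigenvalues returned in ascending order, so the first $n$ columns correspond to the $n$ smallest eigenvalues. In practice a library K-Means (for example scikit-learn's) would usually replace `_kmeans`.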
Compared with the prior art, the technical solution of the invention has the following beneficial effects: building on the idea of the weighted reviewer graph, the invention proposes five similarity indexes that measure how similar the behaviors of different users are. These indexes reflect inter-user behavioral similarity more accurately than existing algorithms based on the weighted reviewer graph and improve the accuracy of the subsequent partitioning algorithm.
The invention partitions the weighted reviewer graph with a spectral clustering algorithm, which achieves better partitioning and broader applicability than some existing partitioning algorithms, such as K-Means, hierarchical clustering, and the Louvain community detection algorithm.
Drawings
Fig. 1 is a flowchart of the false comment group division method based on spectral clustering according to Example 1.
Fig. 2 is a flowchart of the spectral clustering algorithm.
Detailed Description
The drawings are for illustration only and shall not be construed as limiting this patent;
to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the actual product dimensions;
those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
This example provides a false comment group division method based on spectral clustering which, as shown in Fig. 1, comprises the following steps:
S1: Collect and clean comment data from an e-commerce platform. The metadata used includes: user id, comment id, product id, score, and the number of interactions on each comment (e.g., endorsed by other users, marked "useful" by other users, marked "interesting" by other users, etc.);
Here, metadata is data that describes other data. Comment data collected from an e-commerce platform has many dimensions, including fields such as user id, comment time, and score. Each field, such as "user id" or "comment time", is a metadata item, and a comment's information can be reconstructed from the description its metadata provides. In this example, only some of a comment's data items (metadata) are used; others, such as "comment time", are not.
S2: Based on the metadata from S1, calculate five similarity indexes: the number of common comments, the scoring similarity on the same product, the number of user interactions, the user positive score proportion, and the user negative score proportion, where the similarity of the score proportions between two users is measured by Euclidean distance;
S3: Construct a weighted reviewer graph: each user is a graph node, any two users who have commented on the same product are connected by an undirected edge, and the edge weight is calculated from the five indexes obtained in step S2;
S4: Partition the adjacency matrix of the graph constructed in step S3 with a spectral clustering algorithm to obtain a number of groups;
S5: Manually determine the category of each divided group by selecting suitable analysis indexes and thresholds. In this example, the extreme rating ratio, the repeated comment ratio, and the rating deviation are selected as analysis indexes.
The common comment count (CRT) in S2 is calculated as:
$CRT(n_1, n_2) = |P_1 \cap P_2|$
where $n_1$ and $n_2$ are two different reviewers, and $P_1$ and $P_2$ are the sets of products on which $n_1$ and $n_2$, respectively, posted comments.
The scoring similarity on the same product (Similarity of Rating on Same Product, SRSP) in S2 is calculated as:
where the rating terms are the $i$-th and $j$-th ratings that $n_1$ and $n_2$, respectively, posted on product $P$, and $N_1$ and $N_2$ are the numbers of comments that $n_1$ and $n_2$ posted on product $P$.
The number of user interactions (Interaction Times, IT) in S2 is calculated as:
where $C_{1i}$ and $C_{2i}$ denote the counts of the $i$-th type of interaction for $n_1$ and $n_2$, respectively.
The user positive score proportion (Positive Rating Ratio, PR) in S2 is calculated as:
where $S_i$ and $S$ denote the score of one of the user's comments, $\sum S_0$ is the number of the user's comments with a score in {1, 2, 3, 4, 5}, and $\sum S_{i0}$ is the number of the user's comments with a score in {4, 5}.
The user negative score proportion (Negative Rating Ratio, NR) in S2 is calculated as:
In S2, the similarity of the positive score proportions and of the negative score proportions of two users is measured by Euclidean distance:
where $r_{n_1}$ and $r_{n_2}$ are the positive score proportions (or the negative score proportions) of reviewers $n_1$ and $n_2$.
S3 comprises the following steps:
S3.1: Import all users as graph nodes;
S3.2: Take every two nodes as a node pair and calculate the number of common comments between them;
S3.3: Determine whether the number of common comments calculated in step S3.2 is greater than 0; if yes, go to S3.4; if not, return to S3.2 for the next node pair, until all node pairs have been traversed;
S3.4: Calculate the remaining similarity indexes for the node pair, namely the scoring similarity on the same product, the number of user interactions, the user positive score proportion, and the user negative score proportion; calculate the weight and create an edge between the two nodes;
S3.5: Complete the construction of the weighted reviewer graph.
The edge weight in S3 is calculated as:
where $\omega_{ij}$ is the weight of the edge between node $i$ and node $j$; $k$ is the number of indexes, here 5; $CRT_{ij}$ is the number of common comments of user $i$ and user $j$; $SRSP_{ij}$ is the scoring similarity of user $i$ and user $j$ on the same products; $IT_{ij}$ is the interaction index of user $i$ and user $j$; and the two remaining terms are the similarities of the positive score proportions and of the negative score proportions of user $i$ and user $j$, respectively.
S4 comprises the following steps:
S4.1: Input the weighted reviewer graph $G$ and the number of clusters $n$;
S4.2: Calculate the adjacency matrix $A$, the degree matrix $D$, and the Laplacian matrix $L = D - A$ of the weighted reviewer graph $G$;
S4.3: Obtain the normalized Laplacian matrix as:
$NL = D^{-1/2}(D - A)D^{-1/2} = D^{-1/2} L D^{-1/2}$
S4.4: Calculate the $k$ smallest eigenvalues of $NL$ and the corresponding eigenvectors $f$, where $k$ equals the number of clusters $n$;
S4.5: Stack the corresponding eigenvectors $f$ into a $v \times k$ feature matrix $F$ and normalize it row by row, where $v$ is the number of samples, i.e., the number of nodes of graph $G$;
S4.6: Cluster the rows of the normalized feature matrix with K-Means to obtain $n$ candidate groups $C = (c_1, c_2, \ldots, c_n)$.
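Expanding the product in S4.3 shows the normalized Laplacian entry by entry. This assumes every node has a nonzero degree $d_i$; an isolated user would make $D^{-1/2}$ undefined.

```latex
NL = D^{-1/2}(D - A)D^{-1/2}
   = D^{-1/2} D D^{-1/2} - D^{-1/2} A D^{-1/2}
   = I - D^{-1/2} A D^{-1/2},
\qquad
NL_{ij} = \delta_{ij} - \frac{a_{ij}}{\sqrt{d_i d_j}}
```

Because $NL = I - D^{-1/2}AD^{-1/2}$, each eigenvalue $\lambda$ of $NL$ corresponds to the eigenvalue $1 - \lambda$ of $D^{-1/2}AD^{-1/2}$ with the same eigenvector. The $k$ smallest eigenvalues in S4.4 are therefore the $k$ largest of the normalized adjacency matrix, and either matrix yields the same $F$.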
In a specific implementation, the similarity indexes of this example may be omitted, replaced, or supplemented with other indexes as circumstances require, and the calculation methods of the indexes may be combined and modified.
Group division splits a large number of users into several groups whose members behave identically or similarly; whether a given group is a false comment group must then be judged manually. Different e-commerce platforms generate different types of data, and the collected data differ greatly, so in practice the group division algorithm should be adapted to the metadata types available in the data set. In principle, the idea of this example can also be applied to fields such as public opinion monitoring and marketing.
An implementation of the false comment group division method of this example can be roughly divided into five steps: data collection and cleaning, calculation of similarity indexes, construction of the weighted reviewer graph, group partitioning by spectral clustering, and manual judgment of group categories, where:
Data collection and cleaning: the raw data is collected, analyzed, and cleaned. Raw data contains many errors and incomplete records; records with partially missing data or abnormal values can be handled by deletion, mean filling, and similar methods. This example uses a research data set, so it is only necessary to analyze the data, select the required data types, and delete useless data.
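The two cleaning strategies named above, deletion and mean filling, can be sketched on plain dictionaries. This is a minimal illustration, not the patent's procedure. The field names are hypothetical. Treating a score outside 1 to 5 as an abnormal value is also an assumption made here.

```python
from statistics import mean
from typing import Any, Dict, List

# Hypothetical field names; real data sets name these columns differently.
REQUIRED_FIELDS = ("user_id", "review_id", "product_id", "score")


def clean_reviews(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Deletion: drop records with a missing required field or an out-of-range score.
    kept = [
        r for r in rows
        if all(r.get(f) is not None for f in REQUIRED_FIELDS) and 1 <= r["score"] <= 5
    ]
    # Mean filling: replace missing interaction counts with the mean observed count.
    observed = [r["interactions"] for r in kept if r.get("interactions") is not None]
    fill = mean(observed) if observed else 0
    return [
        {**r, "interactions": r["interactions"] if r.get("interactions") is not None else fill}
        for r in kept
    ]


rows = [
    {"user_id": "u1", "review_id": "r1", "product_id": "p1", "score": 5, "interactions": 2},
    {"user_id": "u2", "review_id": "r2", "product_id": "p1", "score": 4, "interactions": None},
    {"user_id": None, "review_id": "r3", "product_id": "p2", "score": 3, "interactions": 1},
    {"user_id": "u3", "review_id": "r4", "product_id": "p2", "score": 9, "interactions": 0},
]
print(len(clean_reviews(rows)))  # prints 2
```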
Calculation of similarity indexes: most of the indexes used in this example are computed for pairs of users, so they are calculated during construction of the weighted reviewer graph. This step counts and stores in advance some statistics that will be needed later, such as each user's positive/negative score proportion. Doing so avoids recounting the data during subsequent calculations, which would increase the algorithm's time complexity.
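The precomputation described above can be sketched as a single pass over the cleaned comments that caches, per user, the set of commented products (needed for CRT) and the score proportions. This is a minimal sketch, assuming scores are already restricted to 1 to 5 and using hypothetical field names. The negative score set `{1, 2}` is an assumption, since the text does not state it.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def precompute_user_stats(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    products = defaultdict(set)       # user -> products commented on (for CRT)
    histogram = defaultdict(Counter)  # user -> score -> count
    for r in rows:                    # one pass over the data
        products[r["user_id"]].add(r["product_id"])
        histogram[r["user_id"]][r["score"]] += 1
    stats = {}
    for user, h in histogram.items():
        total = sum(h.values())       # >= 1 for every user seen
        stats[user] = {
            "products": products[user],
            "pr": (h[4] + h[5]) / total,
            "nr": (h[1] + h[2]) / total,  # ASSUMPTION: negative scores are {1, 2}
        }
    return stats


rows = [
    {"user_id": "u1", "product_id": "p1", "score": 5},
    {"user_id": "u1", "product_id": "p2", "score": 1},
    {"user_id": "u2", "product_id": "p1", "score": 4},
]
stats = precompute_user_stats(rows)
print(stats["u1"]["pr"], stats["u1"]["nr"])  # prints 0.5 0.5
```

Pairwise steps can then read `stats[u]` in constant time instead of rescanning every comment for each node pair.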
Construction of the weighted reviewer graph: the graph uses users as nodes. When two users have both commented on the same product, an edge is created between them, and its weight reflects how similar the two nodes are. Graph construction and similarity-index calculation proceed together. First, as many nodes as there are users in the data set are added to the graph, each named by its user id. Then the number of common comments is counted for every pair of nodes. If it is at least 1, the other indexes are calculated, all indexes are used to compute the weight of the edge between the two nodes, and the edge is created. If it is 0, nothing is done, and the number of common comments of the next node pair is calculated, until all node pairs in the graph have been traversed.
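Traversing every node pair examines v(v-1)/2 pairs even though most of them have no common comments. One possible alternative, not described in the patent, is to build an inverted index from each product to its commenters and count only the pairs that actually co-occur. This yields the same CRT values for every pair with CRT ≥ 1. The sketch below is a hypothetical helper illustrating that idea.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def co_comment_counts(products_by_user: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """CRT for every user pair with at least one common product, keyed (u, v) with u < v."""
    users_by_product: Dict[str, Set[str]] = defaultdict(set)  # inverted index
    for user, products in products_by_user.items():
        for p in products:
            users_by_product[p].add(user)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for users in users_by_product.values():
        for u, v in combinations(sorted(users), 2):  # only pairs that share this product
            counts[(u, v)] += 1
    return dict(counts)


reviews = {"u1": {"p1", "p2"}, "u2": {"p1", "p2"}, "u3": {"p2"}, "u4": {"p9"}}
print(sorted(co_comment_counts(reviews).items()))
# prints [(('u1', 'u2'), 2), (('u1', 'u3'), 1), (('u2', 'u3'), 1)]
```

The trade-off is memory for the index. The gain is largest when users comment on few products, but a very popular product still produces many pairs.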
Group partitioning by spectral clustering: as shown in Fig. 2, first select the number n of groups to divide into. Then compute the adjacency matrix and the degree matrix of the weighted reviewer graph, and from them the Laplacian matrix and the normalized Laplacian matrix of the graph. Next, compute the n smallest eigenvalues of the normalized Laplacian matrix and the corresponding n eigenvectors, combine the n eigenvectors into a matrix F, and normalize it by rows. Finally, cluster the rows of F with K-Means into n candidate groups.
Manual judgment of group categories: in this example, the judgment indexes and their thresholds are selected based on existing research in the field; the criteria for deciding which category a group belongs to differ from field to field.
In this example, a research data set from the US review website Yelp is partitioned, and other common false comment group division methods are selected for comparison experiments. Following current research in the field, three representative false comment group indexes, the Extreme Rating Ratio (ERR), the Repeated Comment Ratio (RCR), and the Rating Deviation (RD), are used in the experiments as criteria for evaluating the partitioning. On all three indexes, the proposed algorithm outperforms the control methods: K-Means clustering, hierarchical clustering, and the Louvain community detection algorithm.
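The text names ERR and RD but does not define them. The sketch below therefore uses assumed definitions, for illustration only. ERR is taken as the share of a group's scores that are 1 or 5. RD is taken as the mean absolute difference between a group member's score and that product's average score over all reviewers. RCR is omitted because it would require comparing comment texts, which are not among the metadata used here. Function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Set, Tuple

Review = Tuple[str, str, int]  # (user_id, product_id, score)


def extreme_rating_ratio(group: Set[str], reviews: List[Review]) -> float:
    """ASSUMED definition: share of the group's scores that are 1 or 5."""
    scores = [s for u, _, s in reviews if u in group]
    return sum(1 for s in scores if s in (1, 5)) / len(scores) if scores else 0.0


def rating_deviation(group: Set[str], reviews: List[Review]) -> float:
    """ASSUMED definition: mean |score - product's average score over all reviewers|."""
    by_product = defaultdict(list)
    for _, p, s in reviews:
        by_product[p].append(s)
    avg = {p: mean(v) for p, v in by_product.items()}
    devs = [abs(s - avg[p]) for u, p, s in reviews if u in group]
    return mean(devs) if devs else 0.0


reviews = [("u1", "p1", 5), ("u2", "p1", 5), ("u3", "p1", 2), ("u1", "p2", 1)]
group = {"u1", "u2"}
print(extreme_rating_ratio(group, reviews))  # prints 1.0
```

A group would then be flagged by comparing such values against thresholds chosen from prior research, as described above for S5.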
Terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;
obviously, the above examples of the invention are given only to illustrate the invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description; it is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.
Claims (10)
1. A false comment group division method based on spectral clustering, the method comprising the following steps:
S1: Collect and clean comment data from an e-commerce platform; the metadata used includes: user id, comment id, product id, score, and the number of interactions on each comment;
S2: Based on the metadata from S1, calculate five similarity indexes: the number of common comments, the scoring similarity on the same product, the number of user interactions, the user positive score proportion, and the user negative score proportion, where the similarity of the score proportions between two users is measured by Euclidean distance;
S3: Construct a weighted reviewer graph: each user is a graph node, any two users who have commented on the same product are connected by an undirected edge, and the edge weight is calculated from the five indexes obtained in step S2;
S4: Partition the adjacency matrix of the graph constructed in step S3 with a spectral clustering algorithm to obtain a number of groups;
S5: Manually determine the category of each divided group by selecting analysis indexes and thresholds.
2. The false comment group division method based on spectral clustering according to claim 1, wherein the number of common comments in S2 is calculated as:
$CRT(n_1, n_2) = |P_1 \cap P_2|$
where $n_1$ and $n_2$ are two different reviewers, and $P_1$ and $P_2$ are the sets of products on which $n_1$ and $n_2$, respectively, posted comments.
3. The false comment group division method based on spectral clustering according to claim 2, wherein the scoring similarity on the same product in S2 is calculated as:
5. The false comment group division method based on spectral clustering according to claim 4, wherein the user positive score proportion in S2 is calculated as:
where $S_i$ and $S$ denote the score of one of the user's comments, $\sum S_0$ is the number of the user's comments with a score in {1, 2, 3, 4, 5}, and $\sum S_{i0}$ is the number of the user's comments with a score in {4, 5}.
7. The false comment group division method based on spectral clustering according to claim 6, wherein in S2 the similarity of the positive score proportions and of the negative score proportions of two users is measured by Euclidean distance:
where $r_{n_1}$ and $r_{n_2}$ are the positive score proportions (or the negative score proportions) of reviewers $n_1$ and $n_2$.
8. The false comment group division method based on spectral clustering according to claim 7, wherein S3 comprises the following steps:
S3.1: Import all users as graph nodes;
S3.2: Take every two nodes as a node pair and calculate the number of common comments between them;
S3.3: Determine whether the number of common comments calculated in step S3.2 is greater than 0; if yes, go to S3.4; if not, return to S3.2 for the next node pair, until all node pairs have been traversed;
S3.4: Calculate the remaining similarity indexes for the node pair, calculate the weight, and create an edge between the two nodes;
S3.5: Complete the construction of the weighted reviewer graph.
9. The false comment group division method based on spectral clustering according to claim 8, wherein the edge weight in S3 is calculated as:
where $\omega_{ij}$ is the weight of the edge between node $i$ and node $j$; $k$ is the number of indexes, here 5; $CRT_{ij}$ is the number of common comments of user $i$ and user $j$; $SRSP_{ij}$ is the scoring similarity of user $i$ and user $j$ on the same products; $IT_{ij}$ is the interaction index of user $i$ and user $j$; and the two remaining terms are the similarities of the positive score proportions and of the negative score proportions of user $i$ and user $j$, respectively.
10. The false comment group division method based on spectral clustering according to claim 1 or 9, wherein S4 comprises the following steps:
S4.1: Input the weighted reviewer graph $G$ and the number of clusters $n$;
S4.2: Calculate the adjacency matrix $A$, the degree matrix $D$, and the Laplacian matrix $L = D - A$ of the weighted reviewer graph $G$;
S4.3: Obtain the normalized Laplacian matrix as:
$NL = D^{-1/2}(D - A)D^{-1/2} = D^{-1/2} L D^{-1/2}$
S4.4: Calculate the $k$ smallest eigenvalues of $NL$ and the corresponding eigenvectors $f$, where $k$ equals the number of clusters $n$;
S4.5: Stack the corresponding eigenvectors $f$ into a $v \times k$ feature matrix $F$ and normalize it row by row, where $v$ is the number of samples, i.e., the number of nodes of graph $G$;
S4.6: Cluster the rows of the normalized feature matrix with K-Means to obtain $n$ candidate groups $C = (c_1, c_2, \ldots, c_n)$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910887582.2A CN110751180B (en) | 2019-09-19 | 2019-09-19 | Spurious comment group division method based on spectral clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910887582.2A CN110751180B (en) | 2019-09-19 | 2019-09-19 | Spurious comment group division method based on spectral clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110751180A CN110751180A (en) | 2020-02-04 |
CN110751180B true CN110751180B (en) | 2023-06-20 |
Family
ID=69276657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910887582.2A Active CN110751180B (en) | 2019-09-19 | 2019-09-19 | Spurious comment group division method based on spectral clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110751180B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117421492B (en) * | 2023-12-19 | 2024-04-05 | Sichuan Jiuyuan Yinhai Software Co., Ltd. | Screening system and method for data element commodities |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345587A (en) * | 2018-02-14 | 2018-07-31 | Guangzhou University | Method and system for detecting the authenticity of comments |
CN109829733A (en) * | 2019-01-31 | 2019-05-31 | Chongqing University | False comment detection system and method based on shopping behavior sequence data |
Non-Patent Citations (1)
Title |
---|
Ma Xiaoning; Wang Ting; Dong Songyue. Spam opinion identification in online public opinion based on PSO-SVM. Computer & Digital Engineering, 2018, (02), pp. 119-124. * |
Also Published As
Publication number | Publication date |
---|---|
CN110751180A (en) | 2020-02-04 |
Similar Documents
Publication | Title |
---|---|
CN105608600A (en) | Method for evaluating and optimizing B2B seller performances | |
TW201501059A (en) | Method and system for recommending information | |
CN109584006B (en) | Cross-platform commodity matching method based on deep matching model | |
WO2017028735A1 (en) | Method and device for selecting and recommending display object | |
CN113763095B (en) | Information recommendation method and device and model training method and device | |
CN108921602B (en) | User purchasing behavior prediction method based on integrated neural network | |
US20150142580A1 (en) | Heuristic customer clustering | |
CN113379494B (en) | Commodity recommendation method and device based on heterogeneous social relationship and electronic equipment | |
CN112231583B (en) | E-commerce recommendation method based on dynamic interest group identification and a generative adversarial network | |
CN108053050A (en) | Clicking rate predictor method, device, computing device and storage medium | |
CN111340566B (en) | Commodity classification method and device, electronic equipment and storage medium | |
CN113239264A (en) | Personalized recommendation method and system based on meta-path network representation learning | |
CN115860880B (en) | Personalized commodity recommendation method and system based on multi-layer heterogeneous graph convolution model | |
CN113821827B (en) | Combined modeling method and device for protecting multiparty data privacy | |
CN110751180B (en) | Spurious comment group division method based on spectral clustering | |
CN104572623B (en) | Efficient data analysis and summarization method for online LDA models | |
Yin et al. | A network-enhanced prediction method for automobile purchase classification using deep learning | |
CN106886934A (en) | Method, system and apparatus for determining merchant categories | |
CN111507804B (en) | Emotion perception commodity recommendation method based on mixed information fusion | |
Liu et al. | Features for link prediction in social networks: A comprehensive study | |
CN110020918B (en) | Recommendation information generation method and system | |
CN110968670A (en) | Method, device, equipment and storage medium for acquiring attributes of popular commodities | |
CN115713390A (en) | Shoe popularity trend prediction recommendation method and system based on user transaction data | |
Ridzky et al. | Public perception for the use of digital wallet in indonesia using social network analysis | |
CN115293815A (en) | Cross-platform e-commerce user alignment method based on user commodity interest |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |