CN113392334A - False comment detection method in cold start environment - Google Patents
- Publication number
- CN113392334A
- Authority
- CN
- China
- Prior art keywords
- comment
- user
- behavior
- text
- product
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
A false comment detection method in a cold start environment comprises the following steps: step (1), feature extraction; step (2), constructing a heterogeneous graph; step (3), shared feature learning based on graph convolution; and step (4), feature fusion and classification. The method can accurately identify false comments in a cold start environment.
Description
Technical Field
The invention relates to the field of computer information processing, in particular to a false comment detection method in a cold start environment.
Background
The richer the behavior information a user leaves on a social network site, the more effective traditional behavior feature analysis becomes. In a cold start environment, however, a new user has published only one comment, from which it is difficult to extract effective behavior features, and text features have been shown to perform poorly when detecting false comments on commercial websites. The main difficulty of false comment detection in a cold start environment is therefore the lack of an activity track for the new user, which leaves the prior art without an effective means of detection.
Therefore, the invention provides a false comment detection method in a cold start environment.
Disclosure of Invention
To achieve the purpose of the invention, the following technical scheme is adopted:
A false comment detection method in a cold start environment comprises the following steps:
step (1), feature extraction;
step (2), constructing a heterogeneous graph;
step (3), shared feature learning based on graph convolution;
step (4), feature fusion and classification.
In the false comment detection method in a cold start environment:
the feature extraction in step (1) comprises: extracting the behavior features of the user entity, the product entity and the comment entity, extracting the text features of each comment based on a CNN, and representing the user, the product and the comment by feature vectors;
the construction of the heterogeneous graph in step (2) comprises: constructing a heterogeneous graph with user entities and product entities as nodes and with published comments and received comments as edges;
the shared feature learning based on graph convolution in step (3) comprises: for each comment published by a cold-start user, learning user-based shared behavior features and product-based shared behavior features using a graph convolutional neural network;
the feature fusion and classification in step (4) comprises: fusing the original behavior features and text features of the comment with the learned shared behavior features to generate a new feature vector for the cold-start comment, and using the new feature vectors to build a classifier that identifies false comments.
In the false comment detection method in a cold start environment, the feature extraction in step (1) comprises:
for all user comments, extracting the behavior features of all users and of all products, and using them as the feature values of the user nodes and the product nodes respectively.
For every user entity u and product entity p, the behavior feature values are:
BF_u = {uMNR, uPR, uNR, uERD, uavgRD, uBST} (1)
BF_p = {pMNR, pPR, pNR, pavgRD, pERD} (2)
where BF_u is the behavior features of the user entity and BF_p is the behavior features of the product entity.
In addition, for each comment, behavior features based on the comment entity are extracted:
BF_r = {Rank, RD, EXT, DEV, ISR} (3)
These are combined with the behavior features of the user who wrote the comment and of the product it concerns to form the complete behavior feature vector q(r) of the comment r:
q(r) = [o_1, o_2, …, o_j, …, o_16] (4)
A text feature extraction model is pre-trained to obtain the text features of each comment; its classification layer uses a softmax activation function:
class_Te = softmax(W_Te · Te(r) + b_Te) (5)
where Te(r) is the text feature vector obtained by convolving the text of comment r, W_Te is a learnable weight matrix, b_Te is the bias, and class_Te indicates whether the comment is classified as true or false.
After the text feature extraction model is trained, the Te(r) produced by the CNN-based model is used as the text feature vector of each comment r.
In the false comment detection method in a cold start environment, the construction of the heterogeneous graph in step (2) comprises: each relationship of the heterogeneous graph is represented by a triplet (source node type, edge type, target node type); the heterogeneous graph constructed in step (2) includes two sets of relationships: (user, comment, product) and (product, commented, user); a node of the user type is represented by the behavior features BF_u of the corresponding user, and a node of the product type is represented by the behavior features BF_p of the corresponding product; the relationship is denoted by s and is of two types, comment and commented.
In the false comment detection method in a cold start environment, the shared feature learning based on graph convolution in step (3) comprises:
after the heterogeneous graph is constructed, for each edge in the graph, a two-layer graph convolutional neural network extracts the behavior features that old users share with new users; the convolution process is shown in formula (6):
h_dst^(l+1) = AGG_s ( f_s(h_src^(l), h_dst^(l)) ) (6)
where f_s is the convolution module for each relationship s, AGG is the aggregation function, h_src^(l) denotes the features of the source node in relationship s, and h_dst^(l) denotes the features of the target node in relationship s. At initialization, if the node type is user, the initial feature value h is the user feature BF_u of that node; if the node type is product, it is the product feature BF_p of that node; l+1 denotes the current iteration, l the previous iteration, and the initial value of l is 0;
the convolution module f_s is given by:
h_i^(l+1) = σ( b^(l) + Σ_{j ∈ N(i)} (1 / c_ji) · h_j^(l) · W^(l) ) (7)
where N(i) is the neighbor set of node i, j is an element of N(i), c_ji is the product of the square roots of the node degrees, i.e. c_ji = sqrt(|N(j)|) · sqrt(|N(i)|), h_j^(l) is the feature value of node j after l iterations, W^(l) is a learnable weight, b^(l) is the bias, and σ is the activation function;
through the convolution operation on the heterogeneous graph, the hidden feature value h_src of the source node of each edge and the hidden feature value h_dst of the target node of each edge are obtained.
In the false comment detection method in a cold start environment, the feature fusion and classification in step (4) comprises: in the feature fusion and classification stage, the original text features, the behavior features, the source-node shared features and the target-node shared features of each edge in the heterogeneous graph are concatenated:
F(r) = [Te(r), q(r), h_src, h_dst] (8)
Finally, F(r) is processed by a fully connected layer with a softmax activation function to obtain the final classification result y:
y = softmax(W_F · F(r) + b_F) (9)
where W_F is a learnable parameter matrix, b_F is the bias, and y has dimension 2, its components being the probabilities that the current edge is a false comment and a true comment respectively.
Drawings
FIG. 1 is a block diagram of a false comment detection method in a cold start environment;
FIG. 2 is a schematic diagram of a graph convolution network-based shared feature learning process;
FIG. 3 is a schematic diagram of the text feature extraction model.
Detailed Description
The embodiments of the present invention are described in detail below with reference to FIGS. 1-3.
As shown in FIG. 1, the method for detecting false comments in a cold start environment includes the following steps:
Step (1): feature extraction. For the user comments, the behavior features of the user entity, the product entity and the comment entity are extracted, and the text features of each comment are extracted with a CNN (convolutional neural network), so that users, products and comments are each represented by a feature vector.
Step (2): constructing a heterogeneous graph. A heterogeneous graph is constructed with user entities and product entities as nodes and with published comments and received comments as edges. Because the feature vectors obtained by feature extraction are independent from entity to entity, constructing a heterogeneous graph preserves the association information among the entities.
Step (3): shared feature learning based on graph convolution. For the comment published by each cold-start user, a graph convolutional neural network learns user-based and product-based shared behavior features, which supplement the behavior information the cold-start user lacks and thereby improve false comment detection under cold start.
Step (4): feature fusion and classification. The original behavior features and text features of the comment published by the cold-start user are fused with the two types of learned shared features to generate a new feature vector for the cold-start comment. The new feature vectors are used to build a classifier that identifies false comments.
Specifically:
Step 1. Feature extraction
The behavior features of the user entity, the product entity and the comment entity in step 1 are described in Tables 1 to 3:
TABLE 1 Behavior features of the user entity
TABLE 2 Behavior features of the product entity
TABLE 3 Behavior features of the comment entity
For all user comments, the behavior features of all users and of all products are extracted and used as the feature values of the user nodes and the product nodes respectively.
For every user entity u and product entity p, the behavior feature values are:
BF_u = {uMNR, uPR, uNR, uERD, uavgRD, uBST} (10)
BF_p = {pMNR, pPR, pNR, pavgRD, pERD} (11)
where BF_u is the behavior features of the user entity and BF_p is the behavior features of the product entity; the meaning of each feature value is given in Tables 1 and 2.
Furthermore, for each comment, 5 behavior features based on the comment entity are extracted according to Table 3:
BF_r = {Rank, RD, EXT, DEV, ISR} (12)
These are combined with the 6 user-based behavior features of the user who wrote the comment and the 5 product-based behavior features of the corresponding product to form the complete 16-dimensional behavior feature vector q(r) of the comment r:
q(r) = [o_1, o_2, …, o_j, …, o_16] (13)
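The assembly of q(r) from the three feature groups can be sketched as follows. This is a minimal illustration, not the patented implementation: the order of the groups inside q(r), the function name `behavior_vector`, and every numeric value are assumptions, since the text only fixes the feature names and the total length of 16.

```python
# Feature names follow formulas (10)-(12); the ordering inside q(r) and all
# numeric values below are assumptions made for illustration.
USER_KEYS = ["uMNR", "uPR", "uNR", "uERD", "uavgRD", "uBST"]
PRODUCT_KEYS = ["pMNR", "pPR", "pNR", "pavgRD", "pERD"]
REVIEW_KEYS = ["Rank", "RD", "EXT", "DEV", "ISR"]


def behavior_vector(bf_r, bf_u, bf_p):
    """Concatenate comment, user and product behavior features into q(r)."""
    q = [bf_r[k] for k in REVIEW_KEYS]
    q += [bf_u[k] for k in USER_KEYS]
    q += [bf_p[k] for k in PRODUCT_KEYS]
    return q


# Made-up feature values for one comment, its author and its product.
bf_r = {"Rank": 1.0, "RD": 0.4, "EXT": 1.0, "DEV": 0.0, "ISR": 1.0}
bf_u = {"uMNR": 1.0, "uPR": 1.0, "uNR": 0.0, "uERD": 0.0, "uavgRD": 0.4, "uBST": 0.0}
bf_p = {"pMNR": 3.0, "pPR": 0.7, "pNR": 0.1, "pavgRD": 0.2, "pERD": 0.5}

q_r = behavior_vector(bf_r, bf_u, bf_p)
print(len(q_r))  # 5 + 6 + 5 = 16 components
```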
Then, a CNN-based text feature extraction model is pre-trained on false comment texts and normal comment texts and is used to obtain the text features of each comment. The model structure is shown in FIG. 3.
In FIG. 3, feature maps 1, 2 and 3 are the hidden layers produced by convolution kernels with window heights of 3, 4 and 5 respectively; the classification layer uses the softmax activation function:
class_Te = softmax(W_Te · Te(r) + b_Te) (14)
where Te(r) is the text feature vector obtained by convolving the text of comment r, W_Te is a learnable weight matrix, b_Te is the bias, and the value of class_Te indicates whether the comment is classified as true or false.
The convolution operation represents the comment text as a feature vector Te. Training drives Te to capture, as far as possible, whether the comment text is genuine, so Te is extracted as the text feature vector of the comment.
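The convolution over windows of 3, 4 and 5 words followed by max pooling can be illustrated with a small pure-Python sketch. The embedding size, the number of kernels, the random weights and the ReLU are all assumptions; the sketch stops at the pooled values and does not reproduce whatever layer maps them to the final text feature vector Te(r).

```python
import random


def conv_max_pool(embeddings, kernel):
    """Slide one kernel of shape (height, dim) over the words, then max-pool."""
    height = len(kernel)
    scores = []
    for start in range(len(embeddings) - height + 1):
        window = embeddings[start:start + height]
        total = sum(w * x
                    for k_row, e_row in zip(kernel, window)
                    for w, x in zip(k_row, e_row))
        scores.append(max(0.0, total))  # ReLU here is an assumption
    return max(scores)


def pooled_features(embeddings, kernels_by_height):
    """One max-pooled value per kernel, for window heights 3, 4 and 5."""
    return [conv_max_pool(embeddings, kernel)
            for height in (3, 4, 5)
            for kernel in kernels_by_height[height]]


rng = random.Random(0)
DIM = 4  # hypothetical word-embedding size
words = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(7)]
kernels = {h: [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(h)]
               for _ in range(2)]          # 2 kernels per height, for brevity
           for h in (3, 4, 5)}

features = pooled_features(words, kernels)
print(len(features))  # 3 heights x 2 kernels = 6 pooled values
```

The sketch assumes the comment has at least 5 words; shorter texts would need padding before the height-5 kernels can be applied.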
For the pre-trained CNN-based text feature extraction model, the number of convolution kernels is set to 60, the text feature length to 10, max pooling is used, the learning rate is 0.00001, and the number of epochs is 100. The model uses a cross-entropy loss in which the weights of normal comments and false comments are set to 1:10 to counter the imbalance between the two classes, and the model with the highest F1 value during training is saved as the final feature extraction model.
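A class-weighted cross-entropy with a 1:10 ratio, together with the F1 score used for model selection, might look like the sketch below. The label coding (0 for normal, 1 for false), the normalization by total weight, and both function names are assumptions; the text only specifies the weight ratio and the use of F1.

```python
import math

# Label coding (0 = normal, 1 = false) is an assumption; the 1:10 ratio is
# the one stated in the text.
CLASS_WEIGHTS = {0: 1.0, 1: 10.0}


def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS):
    """Weighted mean of -w[y] * log p[y], normalized by the total weight."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def f1_score(preds, labels, positive=1):
    """F1 of the positive (false comment) class."""
    tp = sum(1 for p, y in zip(preds, labels) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(preds, labels) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(preds, labels) if p != positive and y == positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


probs = [[0.9, 0.1], [0.9, 0.1]]   # both comments predicted "normal"
labels = [0, 1]                    # the second one is actually false
weighted = weighted_cross_entropy(probs, labels)
unweighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
print(weighted > unweighted)  # the missed false comment dominates the loss
```

With the 1:10 weights, a missed false comment contributes ten times as much to the loss as a misjudged normal comment, which is how the weighting offsets the class imbalance.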
After the CNN-based text feature extraction model is trained, the Te(r) it produces for each comment r is used as the text feature vector of that comment; its length is the text feature length given in the parameter settings.
Step 2. Constructing the heterogeneous graph
To extract shared features from the old users associated with a new user, and so compensate for the new user's missing behavior information, a heterogeneous graph is constructed with users and products as nodes once the behavior features of every user and product have been extracted.
Each relationship of the heterogeneous graph can be represented by a triplet (source node type, edge type, target node type). The heterogeneous graph constructed in step 2 includes two sets of relationships: (user, comment, product) and (product, commented, user). A node of the user type is represented by the behavior features BF_u of the corresponding user, a node of the product type by the behavior features BF_p of the corresponding product, and an edge by the behavior features of the corresponding comment (published or received). The relationship is denoted by s and is of two types, comment and commented.
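The two relationship sets can be sketched as a dictionary keyed by the triplet, with one edge per comment in each direction. The container layout, the function name `build_hetero_graph` and the sample identifiers are hypothetical; only the triplet structure comes from the text.

```python
from collections import defaultdict

# Hypothetical sample data: (user id, product id, comment id).
comments = [
    ("u_old", "p1", "r1"),
    ("u_old", "p2", "r2"),
    ("u_new", "p1", "r3"),   # the only comment of a cold-start user
]


def build_hetero_graph(comments):
    """Store every comment as one edge in each of the two relationships."""
    graph = defaultdict(list)
    for user, product, comment_id in comments:
        graph[("user", "comment", "product")].append((user, product, comment_id))
        graph[("product", "commented", "user")].append((product, user, comment_id))
    return dict(graph)


graph = build_hetero_graph(comments)
for relation, edges in graph.items():
    print(relation, len(edges))
```

Because product p1 is linked to both u_old and u_new, the behavior of the old user can reach the new user in two hops through that shared product.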
Step 3. Shared feature learning based on GCN (graph convolutional network)
After the heterogeneous graph is constructed, for each edge in the graph, a two-layer graph convolutional neural network extracts the behavior features that old users share with new users. The convolution process is shown in formula (15), where the feature matrix is the matrix formed by the feature values of the nodes in the heterogeneous graph. The mathematical definition of graph convolution on a heterogeneous graph is:
h_dst^(l+1) = AGG_s ( f_s(h_src^(l), h_dst^(l)) ) (15)
where f_s is the convolution module for each relationship s, AGG is the aggregation function, h_src^(l) denotes the features of the source node in relationship s, and h_dst^(l) denotes the features of the target node in relationship s. At initialization, if the node type is user, the initial feature value h is the user feature BF_u of that node; if the node type is product, it is the product feature BF_p of that node. l+1 denotes the current iteration, l the previous iteration, and the initial value of l is 0.
The aggregation function AGG used in the present invention is sum.
The convolution module f_s is given by:
h_i^(l+1) = σ( b^(l) + Σ_{j ∈ N(i)} (1 / c_ji) · h_j^(l) · W^(l) ) (16)
where N(i) is the neighbor set of node i, j is an element of N(i), c_ji is the product of the square roots of the node degrees, i.e. c_ji = sqrt(|N(j)|) · sqrt(|N(i)|), h_j^(l) is the feature value of node j after l iterations, W^(l) is a learnable weight, b^(l) is the bias, and σ is the activation function; ReLU is used in the present invention.
When the graph is constructed, the feature vector of formula (10) or formula (11), depending on the node type, is assigned as the initial feature value h_i^(0) of each node i. Node i performs graph convolution over all its neighbor nodes through the process described by formula (16), and the resulting feature vectors are then aggregated using formula (15). Iterating this process lets each node learn its hidden feature value h.
Through the convolution operation on the heterogeneous graph, the hidden feature value of the source node of each edge, h_src, and that of the target node, h_dst, are obtained. These two hidden features are treated as the shared features of the source node and the target node, and the two sets of feature vectors enrich the behavior information missing from each edge. Depending on the relationship the edge represents, h_src and h_dst are the user shared behavior features or the product shared behavior features: when an edge represents a (user, comment, product) relationship, h_src is the user shared behavior features and h_dst is the product shared behavior features; when an edge represents a (product, commented, user) relationship, h_src is the product shared behavior features and h_dst is the user shared behavior features.
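A two-layer pass of this kind can be sketched in pure Python on a toy graph. This is an illustration under stated assumptions rather than the patented model: the weights are fixed constants instead of learned values, the hidden size of 2 is arbitrary, the degree of a node is taken as its number of comments, and the helper names (`graph_conv`, `hetero_layer`, `const`) are hypothetical.

```python
import math


def matvec(h, W):
    """Row vector h (length d_in) times matrix W (d_in x d_out)."""
    return [sum(h[k] * W[k][c] for k in range(len(h))) for c in range(len(W[0]))]


def graph_conv(edges, h_src, deg_src, deg_dst, W, b):
    """f_s for one relation: degree-normalized neighbor sum, bias, then ReLU."""
    acc = {}
    for src, dst in edges:
        c = math.sqrt(deg_src[src]) * math.sqrt(deg_dst[dst])
        row = acc.setdefault(dst, [0.0] * len(b))
        for k, m in enumerate(matvec(h_src[src], W)):
            row[k] += m / c
    return {n: [max(0.0, x + bk) for x, bk in zip(row, b)] for n, row in acc.items()}


def hetero_layer(pairs, h_user, h_prod, params):
    """One layer over both relations. Each node type is the target of a single
    relation here, so AGG = sum reduces to that relation's output."""
    deg_u, deg_p = {}, {}
    for u, p in pairs:
        deg_u[u] = deg_u.get(u, 0) + 1
        deg_p[p] = deg_p.get(p, 0) + 1
    new_prod = graph_conv(pairs, h_user, deg_u, deg_p, *params["comment"])
    new_user = graph_conv([(p, u) for u, p in pairs], h_prod, deg_p, deg_u,
                          *params["commented"])
    return new_user, new_prod


def const(rows, cols, value):
    """Constant weight matrix standing in for learned parameters."""
    return [[value] * cols for _ in range(rows)]


pairs = [("u_old", "p1"), ("u_old", "p2"), ("u_new", "p1")]  # (user, product)
h_user = {"u_old": [1.0] * 6, "u_new": [0.0] * 6}   # BF_u has 6 components
h_prod = {"p1": [1.0] * 5, "p2": [2.0] * 5}         # BF_p has 5 components
layer1 = {"comment": (const(6, 2, 0.1), [0.0, 0.0]),
          "commented": (const(5, 2, 0.1), [0.0, 0.0])}
layer2 = {"comment": (const(2, 2, 0.5), [0.0, 0.0]),
          "commented": (const(2, 2, 0.5), [0.0, 0.0])}

h_user1, h_prod1 = hetero_layer(pairs, h_user, h_prod, layer1)
h_user2, h_prod2 = hetero_layer(pairs, h_user1, h_prod1, layer2)

# Shared features for the cold-start comment, i.e. the edge (u_new, p1).
h_src, h_dst = h_user2["u_new"], h_prod2["p1"]
print(h_src, h_dst)
```

In this toy graph the cold-start user u_new starts with an all-zero behavior vector, yet after two layers its hidden features are non-zero because information from u_old flows to it through the shared product p1.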
Step 4. Feature fusion and classification
In the feature fusion and classification stage, the original text features, the behavior features, the source-node shared features and the target-node shared features of each edge (that is, each comment) in the heterogeneous graph are concatenated:
F(r) = [Te(r), q(r), h_src, h_dst] (17)
Finally, F(r) is processed by a fully connected layer with a softmax activation function to obtain the final classification result y:
y = softmax(W_F · F(r) + b_F) (18)
where W_F is a learnable parameter matrix, b_F is the bias, and y has dimension 2, its components being the probabilities that the current edge (that is, the comment to be detected) is a false comment and a true comment respectively.
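The fusion and classification step can be sketched as a concatenation followed by one dense layer and a softmax. The weights below are random and untrained, the hidden size of the shared features is hypothetical, and the function name `classify` is an assumption; only the lengths 10 for Te(r) and 16 for q(r) and the two-class softmax come from the text.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(te, q, h_src, h_dst, W_F, b_F):
    """Concatenate the four feature groups into F(r), then apply formula (18)."""
    f_r = te + q + h_src + h_dst
    logits = [sum(w * x for w, x in zip(row, f_r)) + b for row, b in zip(W_F, b_F)]
    return softmax(logits)


rng = random.Random(0)
te = [rng.random() for _ in range(10)]      # text feature length 10 (from the text)
q = [rng.random() for _ in range(16)]       # behavior feature vector q(r)
h_src = [rng.random() for _ in range(2)]    # hidden size 2 is hypothetical
h_dst = [rng.random() for _ in range(2)]
W_F = [[rng.uniform(-1, 1) for _ in range(30)] for _ in range(2)]  # untrained
b_F = [0.0, 0.0]

y = classify(te, q, h_src, h_dst, W_F, b_F)
print(y)  # two probabilities that sum to 1
```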
Experimental results and analysis
To demonstrate the effectiveness of the proposed method, the proposed model was compared with seven baseline methods, briefly described as follows:
(1) LF: traditional bigram features are used as the comment text features.
(2) Supervised-CNN: a convolutional neural network is trained on labeled comments only, extracting the semantic information of each comment as its text features and identifying false comments from that information alone.
(3) LF + BF: comments are represented by the text features and the behavior features of the comment entity, and false comment detection uses the concatenated features. The text features are bigram features; the behavior features comprise the comment text length, the rating, the absolute deviation rate of the rating, and the maximum cosine similarity between the comment and the other comments on the same product.
(4) BF_EditSim + LF: a new user is associated with old users by a representation learning-based method, the behavior features of the most similar old user are used as the behavior features of the new user, and these are concatenated with the bigram features as the feature representation of the cold-start comment to detect whether it is genuine.
(5) BF_W2Vsim + W2V: the word vector of each word in a comment is first obtained from the word2vec model, and the text features of the comment are obtained by averaging them; the existing comment most similar to the cold-start comment is then found by the cosine similarity of their text features; finally, the behavior features of that most similar comment and the text features of the cold-start comment form the feature representation of the cold-start comment, and detection is performed on the combined feature vector.
(6) RE: the behavior features of the user are built with a TransE model, the text features use a CNN, and a constraint preserves the sentiment tendency of the text.
(7) RE + RRE + PRE: this model extends the RE model by concatenating the comment representation obtained from the RE model with the comment rating and the product's comment rating as the final comment representation.
To verify the effectiveness of the method, hotel comment data in the Yelp dataset is selected for the experiment. The Yelp dataset is a publicly available commercial website dataset that offers a good balance between commercial authenticity and ground truth, and has therefore been widely used in prior work. The labeled first comments published by new users after January 1, 2012 form the test set, and the first comments published by users before January 1, 2012 form the training set used to learn the GCN-based shared feature extraction model. In addition, to train the global text feature representation model, all labeled comment data before January 1, 2012 are extracted separately to train the CNN-based text feature extraction model.
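The time-based split of first comments described above can be sketched as follows. The tuple layout, the function name `split_first_comments`, the sample records, and the treatment of a comment dated exactly on the cutoff (assigned to the test set here) are assumptions, and the cutoff is taken to be January 1, 2012.

```python
from datetime import date

CUTOFF = date(2012, 1, 1)  # boundary handling (>= goes to test) is an assumption


def split_first_comments(comments):
    """comments: iterable of (user_id, comment_date, label) tuples."""
    first = {}
    for user, day, label in comments:
        if user not in first or day < first[user][0]:
            first[user] = (day, label)
    train = {u: c for u, c in first.items() if c[0] < CUTOFF}
    test = {u: c for u, c in first.items() if c[0] >= CUTOFF}
    return train, test


# Hypothetical records, not taken from the Yelp data.
comments = [
    ("u1", date(2011, 5, 2), "true"),
    ("u1", date(2012, 3, 9), "false"),   # not u1's first comment, ignored
    ("u2", date(2012, 2, 1), "false"),   # new user, goes to the test set
]
train, test = split_first_comments(comments)
print(sorted(train), sorted(test))
```

Keeping only each user's earliest comment on both sides of the cutoff mirrors the cold-start setting, where the detector sees a user's first comment and nothing else from that user.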
TABLE 4 Comparative experimental results of different methods in a cold start environment
The experimental results are shown in Table 4. The proposed method outperforms the comparison methods on all evaluation metrics. In particular, its recall is about 10% higher than that of the other methods, which shows that the proposed method identifies false comments more accurately. Furthermore, analyzing Table 4 leads to the following conclusions:
1) in the cold start environment, the text features still perform poorly. The LF recognition accuracy of the method based on the binary grammatical feature is the lowest in all comparison methods, while the Supervised-CNN method based on the text feature of the CNN has the lowest value compared with the other methods F1. This indicates that relying on the comment text alone does not effectively identify false comments.
2) The detection effect under the cold start environment is improved to a certain extent by combining the behavior characteristics. As can be seen from the results of the LF + BF model, combining the behavior features and the text features can improve the detection accuracy of false comments under cold start, but from the fact that model 3 recall rate and F1 are rather reduced, it can be concluded that: relying only on the behavioral characteristics of the comment itself at cold start will result in more spurious comments being identified as normal comments.
3) The method for directly replacing the behavior characteristics of the comment to be detected with the similar comment behavior characteristics under cold start has poor effect. The model 4 and the model 5 are subjected to false comment detection in a mode of replacing features from the perspective of similarity between users and texts, and experimental results show that the accuracy of the model is not obviously improved from the perspective of similarity between users or the perspective of similarity between texts, and partial indexes (such as F1 value of the model 4 and recall rate) are even lower than that of a method only using text features.
4) By extracting the association from the existing comments, the behavior characteristics of the cold start comment are constructed and combined with the original behavior characteristics of the cold start comment, and a better effect can be achieved. The model 8 extracts the behavior characteristics of the associated user through the abnormal picture and combines the behavior characteristics with the original behavior characteristics of the model, so that the obtained experimental effect is best, and compared with other methods, all parameters are greatly improved.
5) The shared features learned by graph convolution effectively address the loss of behavior feature information for cold-start users and improve the accuracy of false comment detection in a cold start environment. The model presented here outperforms the other comparison methods on all evaluation metrics.
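The observations above compare accuracy, recall, and F1. As a reminder of how these metrics relate, the following is a minimal sketch of their standard definitions, assuming that label 1 marks a false comment (the positive class) and label 0 a normal comment. The function name `detection_metrics` and the toy labels are hypothetical and are not taken from the experiments.

```python
def detection_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1, with label 1 (false comment) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # false comments caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal comments flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false comments missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal comments passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (illustrative only): four false comments, four normal comments.
# The detector flags just one false comment and passes everything else.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

In this toy case most predictions are correct, yet three of the four false comments are labeled normal, so recall and F1 are low. This is the same pattern described in observation 2, where accuracy improves while recall and F1 fall.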
The method represents the associations among users, products, and comments as a graph, and learns shared behavior features through graph convolution to supplement the missing behavior features of cold-start users. It fuses the text features and behavior features of a comment with the shared behavior features of the entities associated with that comment to detect false comments. This effectively solves the problem of poor false comment detection caused by the lack of user behavior information in a cold start environment.
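The idea summarized above can be illustrated with a small sketch. It is a simplified stand-in rather than the claimed implementation: the propagation step is a plain mean over each node and its neighbors (a row-normalized adjacency with self-loops, without the trained weight matrices or activation function a real graph convolution layer would have), and the node names, feature values, and the functions `mean_aggregate` and `fuse` are all hypothetical.

```python
def mean_aggregate(features, edges):
    """One propagation step: each node's new vector is the mean of its own
    vector and its neighbors' vectors (no learned weights, no activation)."""
    neighbors = {node: {node} for node in features}  # self-loop for every node
    for a, b in edges:                               # edges are undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][k] for n in group) / len(group)
               for k in range(dim)]
        for node, group in neighbors.items()
    }


def fuse(text_vec, behavior_vec, shared_vecs):
    """Concatenate text features, the comment's own behavior features,
    and the shared features of its associated entities."""
    fused = list(text_vec) + list(behavior_vec)
    for vec in shared_vecs:
        fused.extend(vec)
    return fused


# Toy heterogeneous graph: users, a product, and comments as nodes.
# "user:new" is a cold-start user, so its behavior vector is all zeros.
behavior = {
    "user:new": [0.0, 0.0],
    "user:old": [0.9, 0.3],
    "product:p": [0.6, 0.6],
    "comment:c_new": [0.0, 0.6],
    "comment:c_old": [0.9, 0.3],
}
edges = [
    ("user:new", "comment:c_new"), ("comment:c_new", "product:p"),
    ("user:old", "comment:c_old"), ("comment:c_old", "product:p"),
]

shared = mean_aggregate(behavior, edges)

# Hypothetical text feature vector, standing in for the CNN output.
text_vec = [0.1, 0.7, 0.2]
fused = fuse(text_vec, behavior["comment:c_new"],
             [shared["user:new"], shared["product:p"]])
print(shared["user:new"], len(fused))
```

After one step the cold-start user no longer has an all-zero vector, because it has absorbed information from the comment it wrote. Applying the step again would also bring in information from the product and, through it, from the established user. The fused vector would then be passed to a classifier, which is omitted here.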
Claims (2)
1. A false comment detection method in a cold start environment, characterized by comprising the following steps:
step (1) feature extraction;
step (2) constructing a heterogeneous graph;
step (3) shared feature learning based on graph convolution;
step (4) feature fusion and classification.
2. The method of claim 1, wherein:
the feature extraction in step (1) comprises: extracting the behavior features of the user entity, the product entity, and the comment entity; extracting the text features of the comment based on a CNN; and representing the user, the product, and the comment by feature vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110733235.1A CN113392334B (en) | 2021-06-29 | 2021-06-29 | False comment detection method in cold start environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113392334A (en) | 2021-09-14 |
CN113392334B (en) | 2024-03-08 |
Family ID: 77624525
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110733235.1A Active CN113392334B (en) | 2021-06-29 | 2021-06-29 | False comment detection method in cold start environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113392334B (en) |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140172989A1 (en) * | 2012-12-14 | 2014-06-19 | Yigal Dan Rubinstein | Spam detection and prevention in a social networking system |
WO2019183191A1 (en) * | 2018-03-22 | 2019-09-26 | Michael Bronstein | Method of news evaluation in social media networks |
CN110321436A (en) * | 2019-07-04 | 2019-10-11 | 中国人民解放军国防科技大学 | Cold-start fraud comment detection method based on social attention mechanism representation learning |
CN110580341A (en) * | 2019-09-19 | 2019-12-17 | 山东科技大学 | False comment detection method and system based on semi-supervised learning model |
CN111259140A (en) * | 2020-01-13 | 2020-06-09 | 长沙理工大学 | False comment detection method based on LSTM multi-entity feature fusion |
CN111639252A (en) * | 2020-05-18 | 2020-09-08 | 华中科技大学 | False news identification method based on news-comment relevance analysis |
CN111753884A (en) * | 2020-06-08 | 2020-10-09 | 浙江工业大学 | Depth map convolution model defense method and device based on network feature reinforcement |
CN112417099A (en) * | 2020-11-20 | 2021-02-26 | 南京邮电大学 | Method for constructing fraud user detection model based on graph attention network |
CN112765313A (en) * | 2020-12-31 | 2021-05-07 | 太原理工大学 | False information detection method based on original text and comment information analysis algorithm |
CN112732921A (en) * | 2021-01-19 | 2021-04-30 | 福州大学 | False user comment detection method and system |
CN112990972A (en) * | 2021-03-19 | 2021-06-18 | 华南理工大学 | Recommendation method based on heterogeneous graph neural network |
CN113032525A (en) * | 2021-03-23 | 2021-06-25 | 深圳大学 | False news detection method and device, electronic equipment and storage medium |
Non-Patent Citations (4)
Title |
---|
AO LI 等: "Spam Review Detection with Graph Convolutional Networks", 《CIKM ’19》, pages 2703 - 2711 * |
XIAOQING SUN 等: "Deepdom: Malicious domain detection with scalable and heterogeneous graph convolutional networks", 《COMPUTERS & SECURITY》, vol. 99, 31 December 2020 (2020-12-31), pages 1 - 16 * |
焦易于: "基于融合特征的虚假评论检测算法", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 06, 15 June 2017 (2017-06-15), pages 138 - 1559 * |
郭国庆: "基于特征融合的虚假评论检测研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 01, 15 January 2023 (2023-01-15), pages 138 - 3407 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114692007A (en) * | 2022-06-01 | 2022-07-01 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for determining representation information |
WO2023231542A1 (en) * | 2022-06-01 | 2023-12-07 | 腾讯科技(深圳)有限公司 | Representation information determination method and apparatus, and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113392334B (en) | 2024-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11687728B2 (en) | Text sentiment analysis method based on multi-level graph pooling | |
CN107608956B (en) | Reader emotion distribution prediction algorithm based on CNN-GRNN | |
EP3180742B1 (en) | Generating and using a knowledge-enhanced model | |
CN109376222B (en) | Question-answer matching degree calculation method, question-answer automatic matching method and device | |
CN112329474B (en) | Attention-fused aspect-level user comment text emotion analysis method and system | |
CN109766557B (en) | Emotion analysis method and device, storage medium and terminal equipment | |
CN113095415B (en) | Cross-modal hashing method and system based on multi-modal attention mechanism | |
CN110929034A (en) | Commodity comment fine-grained emotion classification method based on improved LSTM | |
CN107818084B (en) | Emotion analysis method fused with comment matching diagram | |
CN104346440A (en) | Neural-network-based cross-media Hash indexing method | |
CN112749274B (en) | Chinese text classification method based on attention mechanism and interference word deletion | |
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding | |
CN111339260A (en) | BERT and QA thought-based fine-grained emotion analysis method | |
CN114942991B (en) | Emotion classification model construction method based on metaphor recognition | |
CN112215629B (en) | Multi-target advertisement generating system and method based on construction countermeasure sample | |
CN110969005B (en) | Method and device for determining similarity between entity corpora | |
CN113076425B (en) | Event related viewpoint sentence classification method for microblog comments | |
CN112836007B (en) | Relational element learning method based on contextualized attention network | |
CN113392334A (en) | False comment detection method in cold start environment | |
CN113435192A (en) | Chinese text emotion analysis method based on changing neural network channel cardinality | |
KR102448044B1 (en) | Aspect based sentiment analysis method using aspect map and electronic device | |
CN111859925A (en) | Emotion analysis system and method based on probability emotion dictionary | |
CN111666410B (en) | Emotion classification method and system for commodity user comment text | |
CN115659990A (en) | Tobacco emotion analysis method, device and medium | |
CN112364258B (en) | Recommendation method and system based on map, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||