CN113392334A - False comment detection method in cold start environment - Google Patents

False comment detection method in cold start environment

Info

Publication number
CN113392334A
Authority
CN
China
Prior art keywords
comment
user
behavior
text
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110733235.1A
Other languages
Chinese (zh)
Other versions
CN113392334B (en)
Inventor
向凌云
郭国庆
游卉擎
刘宇航
夏卓群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology
Priority to CN202110733235.1A
Publication of CN113392334A
Application granted
Publication of CN113392334B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A false comment detection method in a cold start environment comprises the following steps: step (1), feature extraction; step (2), constructing a heterogeneous graph; step (3), shared feature learning based on graph convolution; and step (4), feature fusion and classification. The method can accurately identify false comments in a cold start environment.

Description

False comment detection method in cold start environment
Technical Field
The invention relates to the field of computer information processing, in particular to a false comment detection method in a cold start environment.
Background
The richer the behavior information a user leaves on a social networking site, the more effective traditional behavior feature analysis becomes. In a cold start environment, however, a new user has published only one comment, from which it is difficult to extract effective behavior features, and text features have been shown to perform poorly when detecting false comments on commercial websites. The main difficulty of false comment detection in a cold start environment is therefore the lack of an activity trail for the new user, which leaves the prior art without an effective means of detection.
Therefore, the invention provides a false comment detection method in a cold start environment.
Disclosure of Invention
To achieve the object of the invention, the following technical solution is adopted:
A false comment detection method in a cold start environment comprises the following steps:
step (1) feature extraction;
step (2) constructing a heterogeneous graph;
step (3) shared feature learning based on graph convolution;
step (4) feature fusion and classification.
In the false comment detection method in a cold start environment:
the feature extraction of step (1) comprises: extracting the behavior features of the user entity, the product entity and the comment entity, extracting the text features of each comment with a CNN, and representing the user, the product and the comment by feature vectors;
the heterogeneous graph construction of step (2) comprises: constructing a heterogeneous graph with user entities and product entities as nodes and with published comments and received comments as edges;
the shared feature learning based on graph convolution of step (3) comprises: for each comment published by a cold-start user, learning user-based shared behavior features and product-based shared behavior features with a graph convolutional neural network;
the feature fusion and classification of step (4) comprises: fusing the original behavior features and text features of the comment with the learned shared behavior features to generate a new feature vector for the cold-start comment, and using the new feature vectors to build a classifier that identifies false comments.
In the false comment detection method in a cold start environment, the feature extraction of step (1) comprises the following:
for all user comments, the behavior features of all users and of all products are extracted and used as the feature values of the user nodes and the product nodes, respectively.
For every user entity u and product entity p, the behavior feature values are:
$BF_u = \{uMNR, uPR, uNR, uERD, uavgRD, uBST\}$ (1)
$BF_p = \{pMNR, pPR, pNR, pavgRD, pERD\}$ (2)
where $BF_u$ is the behavior feature set of the user entity and $BF_p$ is the behavior feature set of the product entity;
in addition, for each comment, behavior features based on the comment entity are extracted:
$BF_r = \{Rank, RD, EXT, DEV, ISR\}$ (3)
These are combined with the behavior features of the user who wrote the comment and of the product it concerns to form the complete behavior feature vector q(r) of comment r, whose 16 components are the 5 comment-based, 6 user-based and 5 product-based features:
$q(r) = [o_1, o_2, \ldots, o_j, \ldots, o_{16}]$ (4)
A text feature extraction model is pre-trained to obtain the text features of each comment; its classification layer uses a softmax activation function:
$class_{Te} = \mathrm{softmax}(W_{Te} \cdot Te(r) + b_{Te})$ (5)
where Te(r) is the text feature vector obtained by convolving the text of comment r, $W_{Te}$ is a learnable weight matrix, $b_{Te}$ is the bias, and $class_{Te}$ indicates whether the comment is classified as true or false;
after the text feature extraction model has been trained, the Te(r) produced by the CNN-based text feature extraction model is used as the text feature vector of each comment r.
In the false comment detection method in a cold start environment, the heterogeneous graph construction of step (2) comprises the following: a relation in the heterogeneous graph is represented by a triplet (source node type, edge type, target node type); the heterogeneous graph constructed in step (2) contains two sets of relations: (user, comment, product) and (product, commented, user); a user-type node is represented by the behavior features $BF_u$ of the corresponding user, and a product-type node is represented by the behavior features $BF_p$ of the corresponding product; a relation is denoted by s and is of one of two types, comment or commented.
In the false comment detection method in a cold start environment, the shared feature learning based on graph convolution of step (3) comprises the following:
after the heterogeneous graph is constructed, a two-layer graph convolutional neural network is applied to every edge in the graph to extract the behavior features that old users share with new users; the convolution process is shown in formula (6):
$h_{dst}^{l+1} = \underset{s:\ s_{dst} = dst}{\mathrm{AGG}} \left( f_s\left( h_{s_{src}}^{l},\ h_{s_{dst}}^{l} \right) \right)$ (6)
where $f_s$ is the convolution module for each relation s, AGG is the aggregation function, $h_{s_{src}}^{l}$ denotes the features of the source nodes of relation s, and $h_{s_{dst}}^{l}$ denotes the features of the target nodes of relation s. At initialization, the initial feature value h of a node is set according to its type: if the node is a user, h is the corresponding user feature $BF_u$; if the node is a product, h is the corresponding product feature $BF_p$. Here l+1 denotes the current iteration, l denotes the previous iteration, and the initial value of l is 0;
the convolution module $f_s$ is given by:
$h_i^{l+1} = \sigma\left( b^{l} + \sum_{j \in N(i)} \frac{1}{c_{ji}}\, h_j^{l}\, W^{l} \right)$ (7)
where N(i) is the neighbor set of node i, j is an element of N(i), and $c_{ji}$ is the product of the square roots of the node degrees, i.e. $c_{ji} = \sqrt{|N(j)|}\,\sqrt{|N(i)|}$; $h_j^{l}$ denotes the feature value of node j after l iterations, $W^{l}$ denotes the learnable weights, $b^{l}$ denotes the bias, and σ is the activation function;
the convolution operation on the heterogeneous graph yields the hidden feature value $h_{src}$ of the source node of each edge and the hidden feature value $h_{dst}$ of the target node of each edge.
In the false comment detection method in a cold start environment, the feature fusion and classification of step (4) comprises the following: in the feature fusion and classification stage, the original text features, the behavior features, the source node shared features and the target node shared features of each edge in the heterogeneous graph are concatenated into a single vector F(r):
$F(r) = Te(r) \oplus q(r) \oplus h_{src} \oplus h_{dst}$ (8)
Finally, F(r) is processed by a fully connected layer with a softmax activation function to obtain the final classification result y:
$y = \mathrm{softmax}(W_F \cdot F(r) + b_F)$ (9)
where $W_F$ is a learnable parameter matrix, $b_F$ is the bias, and y has dimension 2, its two components being the probabilities that the current edge is a false comment and a true comment, respectively.
Drawings
FIG. 1 is a block diagram of a false comment detection method in a cold start environment;
FIG. 2 is a schematic diagram of a graph convolution network-based shared feature learning process;
FIG. 3 is a schematic diagram of a text feature extraction model.
Detailed Description
Embodiments of the present invention are described in detail below with reference to FIGS. 1-3.
As shown in FIG. 1, the method for detecting false comments in a cold start environment includes the following steps:
Step (1), feature extraction. For each user comment, the behavior features of the user entity, the product entity and the comment entity are extracted, and the text features of the comment are extracted with a CNN (convolutional neural network), so that the user, the product and the comment are each represented by a feature vector.
Step (2), constructing a heterogeneous graph. The heterogeneous graph is constructed with user entities and product entities as nodes and with published comments and received comments as edges. Because the feature vectors obtained by feature extraction are independent across entities, constructing a heterogeneous graph preserves the association information among the entities.
Step (3), shared feature learning based on graph convolution. For each comment published by a cold-start user, a graph convolutional neural network learns user-based shared behavior features and product-based shared behavior features. These supplement the behavior information that the cold-start user lacks and thereby improve the detection of false comments under cold start.
Step (4), feature fusion and classification. The original behavior features and text features of the comment published by the cold-start user are fused with the two types of learned shared features to generate a new feature vector for the cold-start comment. The new feature vectors are used to build a classifier that identifies false comments.
Specifically, the steps are as follows.
Step 1. Feature extraction
The behavior features of the user entity, the product entity and the comment entity used in step 1 are described in Tables 1 to 3:
Table 1. Behavior features of the user entity (table image not reproduced in this text)
Table 2. Behavior features of the product entity (table image not reproduced in this text)
Table 3. Behavior features of the comment entity (table image not reproduced in this text)
For all user comments, the behavior features of all users and of all products are extracted and used as the feature values of the user nodes and the product nodes, respectively.
For every user entity u and product entity p, the behavior feature values are:
$BF_u = \{uMNR, uPR, uNR, uERD, uavgRD, uBST\}$ (10)
$BF_p = \{pMNR, pPR, pNR, pavgRD, pERD\}$ (11)
where $BF_u$ is the behavior feature set of the user entity and $BF_p$ is the behavior feature set of the product entity. The meaning of each feature value is given in Table 1 and Table 2.
Furthermore, for each comment, 5 behavior features based on the comment entity are extracted according to Table 3:
$BF_r = \{Rank, RD, EXT, DEV, ISR\}$ (12)
These 5 features are combined with the 6 user-based behavior features of the user who wrote the comment and the 5 product-based behavior features of the corresponding product to form the complete 16-dimensional behavior feature vector q(r) of comment r:
$q(r) = [o_1, o_2, \ldots, o_j, \ldots, o_{16}]$ (13)
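The assembly of q(r) can be sketched in a few lines of Python. This is only an illustration, assuming each entity's features are stored in a dictionary keyed by the feature names of formulas (10) to (12). The function name `behavior_vector`, the dictionary layout and the order of the three groups inside q(r) are assumptions; the text fixes only the total of 5 + 6 + 5 = 16 values.

```python
from typing import Dict, List

# Feature names follow formulas (10)-(12); their definitions are in Tables 1-3.
USER_KEYS = ["uMNR", "uPR", "uNR", "uERD", "uavgRD", "uBST"]
PRODUCT_KEYS = ["pMNR", "pPR", "pNR", "pavgRD", "pERD"]
REVIEW_KEYS = ["Rank", "RD", "EXT", "DEV", "ISR"]


def behavior_vector(review: Dict[str, float],
                    user: Dict[str, float],
                    product: Dict[str, float]) -> List[float]:
    """Concatenate comment, user and product behavior features into q(r).

    The group order (comment, user, product) is an assumption of this sketch.
    """
    q = [float(review[k]) for k in REVIEW_KEYS]
    q += [float(user[k]) for k in USER_KEYS]
    q += [float(product[k]) for k in PRODUCT_KEYS]
    return q


# Placeholder values only; they are not taken from any dataset.
example_review = {k: 0.1 for k in REVIEW_KEYS}
example_user = {k: 0.2 for k in USER_KEYS}
example_product = {k: 0.3 for k in PRODUCT_KEYS}
q_r = behavior_vector(example_review, example_user, example_product)
```

Keeping the key lists in one place makes the layout of q(r) explicit, which matters later when q(r) is concatenated with the other feature groups.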
Then, a text feature extraction model based on a CNN (convolutional neural network) is pre-trained on false comment texts and normal comment texts and is used to obtain the text features of each comment. The model structure is shown in FIG. 3.
In FIG. 3, feature maps 1, 2 and 3 are the hidden layers produced by convolution kernels with window heights of 3, 4 and 5, respectively. The classification layer uses the softmax activation function:
$class_{Te} = \mathrm{softmax}(W_{Te} \cdot Te(r) + b_{Te})$ (14)
where Te(r) is the text feature vector obtained by convolving the text of comment r, $W_{Te}$ is a learnable weight matrix, $b_{Te}$ is the bias, and the value of $class_{Te}$ indicates whether the comment is classified as true or false.
The convolution operation represents the comment text as a feature vector Te. Training drives Te to capture, as far as possible, whether the comment text is genuine, so this vector is extracted and used as the text feature vector of the comment.
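The convolution and pooling stage described above can be sketched in plain Python. This is a toy illustration, not the trained model: the names `conv_max_pool` and `pooled_features`, the ReLU before pooling, the zero padding of short texts and the two-dimensional token embeddings are all assumptions. How the pooled values are mapped to the final Te(r) is not specified in the text, so the sketch stops at the pooled values.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def conv_max_pool(embeddings: Sequence[Vector], kernel: Sequence[Vector]) -> float:
    """Slide one kernel over token windows, then max-pool over all positions."""
    height = len(kernel)
    dim = len(kernel[0])
    # Pad short texts with zero vectors so at least one window exists (assumption).
    padded = list(embeddings) + [[0.0] * dim] * max(0, height - len(embeddings))
    scores = []
    for start in range(len(padded) - height + 1):
        window = padded[start:start + height]
        score = sum(w * x
                    for k_row, e_row in zip(kernel, window)
                    for w, x in zip(k_row, e_row))
        scores.append(max(0.0, score))  # ReLU before pooling (assumption)
    return max(scores)


def pooled_features(embeddings: Sequence[Vector],
                    kernels_by_height: Dict[int, List[List[Vector]]]) -> Vector:
    """One max-pooled value per kernel, ordered by window height."""
    return [conv_max_pool(embeddings, kernel)
            for height in sorted(kernels_by_height)
            for kernel in kernels_by_height[height]]


# Six toy tokens with 2-dimensional embeddings; one all-ones kernel per height.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
kernels = {h: [[[1.0, 1.0]] * h] for h in (3, 4, 5)}
features = pooled_features(tokens, kernels)  # [4.0, 5.0, 6.0]
```

With all-ones kernels each pooled value is simply the largest window sum, which makes the result easy to check by hand.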
For the pre-trained CNN-based text feature extraction model, the parameters are set as follows: the number of convolution kernels is 60, the text feature length is 10, max pooling is used, the learning rate is 0.00001, and the number of epochs (training iterations) is 100. The model uses a cross-entropy loss in which the weights of normal comments and false comments are set to 1:10 to compensate for the imbalance between the two classes, and the model with the highest F1 value during training is saved as the final feature extraction model.
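The effect of the 1:10 class weighting can be illustrated with a small weighted cross-entropy function. This is a sketch under assumptions: the label encoding (0 for normal, 1 for false), the function name and the normalization by the sum of the sample weights (one common convention) are not taken from the text.

```python
import math
from typing import List, Sequence

NORMAL, FAKE = 0, 1
CLASS_WEIGHTS = [1.0, 10.0]  # normal : false = 1 : 10, as stated in the text


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Sequence[float] = CLASS_WEIGHTS) -> float:
    """Class-weighted cross-entropy, averaged over the sum of sample weights."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
        weight_sum += weights[y]
    return total / weight_sum


labels = [NORMAL, FAKE]
# Batch A: the normal comment is classified well, the false comment is missed.
loss_fake_missed = weighted_cross_entropy([[0.9, 0.1], [0.9, 0.1]], labels)
# Batch B: the normal comment is missed, the false comment is classified well.
loss_normal_missed = weighted_cross_entropy([[0.1, 0.9], [0.1, 0.9]], labels)
```

Missing the false comment is penalized far more heavily than missing the normal one, which is the intended counterweight to the class imbalance.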
After the CNN-based text feature extraction model has been trained, the Te(r) it produces for each comment r is used as the text feature vector of that comment. Its length is the text feature length chosen when the parameters were set.
Step 2. Constructing the heterogeneous graph
To extract shared features from the old users associated with a new user, and so compensate for the new user's missing behavior information, a heterogeneous graph is constructed with users and products as nodes once the behavior features of every user and every product have been extracted.
A relation in the heterogeneous graph can be represented by a triplet (source node type, edge type, target node type). The heterogeneous graph constructed in step 2 contains two sets of relations: (user, comment, product) and (product, commented, user). A user-type node is represented by the behavior features $BF_u$ of the corresponding user, a product-type node is represented by the behavior features $BF_p$ of the corresponding product, and an edge is represented by the comment or by the behavior features of the comment. A relation is denoted by s and is of one of two types, comment or commented.
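The two sets of relations can be held in a simple dictionary keyed by the triplet. The sketch below assumes the comments are available as (user_id, product_id) pairs; the function name `build_hetero_graph` and the string identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Relation = Tuple[str, str, str]  # (source node type, edge type, target node type)
Edge = Tuple[str, str]           # (source node id, target node id)

COMMENT: Relation = ("user", "comment", "product")
COMMENTED: Relation = ("product", "commented", "user")


def build_hetero_graph(reviews: Iterable[Tuple[str, str]]) -> Dict[Relation, List[Edge]]:
    """Store every comment once per direction, keyed by its relation triplet."""
    graph: Dict[Relation, List[Edge]] = defaultdict(list)
    for user_id, product_id in reviews:
        graph[COMMENT].append((user_id, product_id))
        graph[COMMENTED].append((product_id, user_id))
    return dict(graph)


# Toy data: u1 and u2 are old users, u3 is a cold-start user with one comment.
toy_reviews = [("u1", "p1"), ("u2", "p1"), ("u1", "p2"), ("u3", "p1")]
graph = build_hetero_graph(toy_reviews)
```

Storing each comment in both directions lets information flow from users to products and back, which is how a cold-start user can later receive features from older users of the same product.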
Step 3. Shared feature learning based on GCN (graph convolutional network)
After the heterogeneous graph is constructed, a two-layer graph convolutional neural network is applied to every edge in the graph to extract the behavior features that old users share with new users. The convolution process is shown in formula (15); the feature matrix is the matrix formed by the feature values of all nodes in the heterogeneous graph. Graph convolution on a heterogeneous graph is defined as:
$h_{dst}^{l+1} = \underset{s:\ s_{dst} = dst}{\mathrm{AGG}} \left( f_s\left( h_{s_{src}}^{l},\ h_{s_{dst}}^{l} \right) \right)$ (15)
where $f_s$ is the convolution module for each relation s, AGG is the aggregation function, $h_{s_{src}}^{l}$ denotes the features of the source nodes of relation s, and $h_{s_{dst}}^{l}$ denotes the features of the target nodes of relation s. At initialization, the initial feature value h of a node is set according to its type: if the node is a user, h is the corresponding user feature $BF_u$; if the node is a product, h is the corresponding product feature $BF_p$. Here l+1 denotes the current iteration, l denotes the previous iteration, and the initial value of l is 0.
The aggregation function AGG used in the present invention is sum.
The convolution module $f_s$ is given by:
$h_i^{l+1} = \sigma\left( b^{l} + \sum_{j \in N(i)} \frac{1}{c_{ji}}\, h_j^{l}\, W^{l} \right)$ (16)
where N(i) is the neighbor set of node i, j is an element of N(i), and $c_{ji}$ is the product of the square roots of the node degrees, i.e. $c_{ji} = \sqrt{|N(j)|}\,\sqrt{|N(i)|}$; $h_j^{l}$ denotes the feature value of node j after l iterations, $W^{l}$ denotes the learnable weights, $b^{l}$ denotes the bias, and σ is the activation function, for which ReLU is used in the present invention.
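A single convolution step over one relation can be sketched as follows. The sketch is an illustration under stated assumptions: the node degrees are counted within the relation (out-degree for the source j, in-degree for the target i), the function name `graph_conv` is hypothetical, and the toy features have 2 dimensions instead of the 6 user or 5 product features used in the method.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def graph_conv(edges: Sequence[Tuple[str, str]],
               h_src: Dict[str, List[float]],
               weight: Sequence[Sequence[float]],
               bias: Sequence[float]) -> Dict[str, List[float]]:
    """One layer for one relation: normalized neighbor sum, linear map, ReLU."""
    out_deg = Counter(j for j, _ in edges)  # degree of each source node j
    in_deg = Counter(i for _, i in edges)   # degree of each target node i
    dim_in, dim_out = len(weight), len(weight[0])
    summed: Dict[str, List[float]] = {}
    for j, i in edges:
        c_ji = math.sqrt(out_deg[j]) * math.sqrt(in_deg[i])
        acc = summed.setdefault(i, [0.0] * dim_in)
        for k, value in enumerate(h_src[j]):
            acc[k] += value / c_ji
    return {i: [relu(sum(acc[k] * weight[k][o] for k in range(dim_in)) + bias[o])
                for o in range(dim_out)]
            for i, acc in summed.items()}


# Two users comment on the same product; identity weights keep the sum visible.
edges = [("u1", "p1"), ("u2", "p1")]
h_users = {"u1": [1.0, 0.0], "u2": [0.0, 2.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h_products = graph_conv(edges, h_users, identity, [0.0, 0.0])
```

In the two-relation graph described here, each node type is the target of exactly one relation, so the sum aggregation AGG reduces to the output of that relation's module. Because user and product features have different lengths, each relation needs its own weight matrix in the first layer.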
When the graph is constructed, the feature vector described by formula (10) or formula (11) is assigned, according to the node type, as the initial feature value $h_i^{0}$ of each node i. Node i applies graph convolution to all of its neighbor nodes through the process described by formula (16), and the feature vectors of all its neighbors are then aggregated using formula (15). Iterating this process lets each node learn its hidden feature value h.
The convolution operation on the heterogeneous graph yields the hidden feature value $h_{src}$ of the source node of each edge and the hidden feature value $h_{dst}$ of the target node of each edge. These two hidden features are treated as the shared features of the source node and the target node, and the two feature vectors are used to enrich the behavior information missing from each edge. Depending on the relation the edge represents, $h_{src}$ and $h_{dst}$ stand for either user shared behavior features or product shared behavior features: when an edge represents a (user, comment, product) relation, $h_{src}$ is the user shared behavior feature and $h_{dst}$ is the product shared behavior feature; when an edge represents a (product, commented, user) relation, $h_{src}$ is the product shared behavior feature and $h_{dst}$ is the user shared behavior feature.
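The role swap between the two relations can be made concrete with a small lookup helper. It assumes the hidden features are stored per node type in nested dictionaries; the function name `edge_shared_features` and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple

Relation = Tuple[str, str, str]  # (source node type, edge type, target node type)
Hidden = Dict[str, Dict[str, List[float]]]  # node type -> node id -> hidden feature


def edge_shared_features(relation: Relation,
                         edge: Tuple[str, str],
                         hidden: Hidden) -> Tuple[List[float], List[float]]:
    """Return (h_src, h_dst) for one edge, using the node types of its relation."""
    src_type, _, dst_type = relation
    src_id, dst_id = edge
    return hidden[src_type][src_id], hidden[dst_type][dst_id]


# Toy hidden features after graph convolution (values are placeholders).
hidden = {"user": {"u3": [0.1, 0.2]}, "product": {"p1": [0.3, 0.4]}}

h_src, h_dst = edge_shared_features(("user", "comment", "product"), ("u3", "p1"), hidden)
r_src, r_dst = edge_shared_features(("product", "commented", "user"), ("p1", "u3"), hidden)
```

For the (user, comment, product) edge the first returned vector belongs to the user, while for the reversed relation it belongs to the product, matching the description above.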
Step 4. Feature fusion and classification
In the feature fusion and classification stage, the original text features, the behavior features, the source node shared features and the target node shared features of each edge (that is, each comment) in the heterogeneous graph are concatenated into a single vector F(r):
$F(r) = Te(r) \oplus q(r) \oplus h_{src} \oplus h_{dst}$ (17)
Finally, F(r) is processed by a fully connected layer with a softmax activation function to obtain the final classification result y:
$y = \mathrm{softmax}(W_F \cdot F(r) + b_F)$ (18)
where $W_F$ is a learnable parameter matrix, $b_F$ is the bias, and y has dimension 2, its two components being the probabilities that the current edge (that is, the comment to be detected) is a false comment and a true comment, respectively.
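The fusion and classification step amounts to a concatenation followed by a linear layer and a softmax. The sketch below assumes a text feature length of 10 and a 16-dimensional q(r) as stated in the text; the hidden size of 4 for the shared features, the function names and the all-zero weights are illustrative assumptions, not trained parameters.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(te: Sequence[float], q: Sequence[float],
         h_src: Sequence[float], h_dst: Sequence[float]) -> List[float]:
    """F(r): concatenate text, behavior and the two shared feature vectors."""
    return list(te) + list(q) + list(h_src) + list(h_dst)


def classify(features: Sequence[float],
             weight: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """y = softmax(W_F . F(r) + b_F); y[0] is P(false), y[1] is P(true)."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)


te_r = [0.0] * 10       # text feature length 10, as in the parameter settings
q_r = [0.0] * 16        # 16-dimensional behavior vector
h_src = [0.0] * 4       # hidden size 4 is an assumption of this sketch
h_dst = [0.0] * 4
f_r = fuse(te_r, q_r, h_src, h_dst)
w_f = [[0.0] * len(f_r), [0.0] * len(f_r)]  # untrained placeholder weights
y = classify(f_r, w_f, [0.0, 0.0])
```

With untrained zero weights the classifier is undecided and returns equal probabilities for both classes; training moves the weights so that the first component rises for false comments.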
Experimental results and analysis
To demonstrate the effectiveness of the proposed method, the proposed model was compared with seven baseline methods, briefly described as follows:
(1) LF: traditional bigram features are used as the text features of the comment.
(2) Supervised-CNN: a convolutional neural network is trained on labeled comments only; the semantic information it extracts serves as the text features of the comment, and false comments are identified from this semantic information alone.
(3) LF + BF: comments are represented by combining text features with the behavior features of the comment entity, and false comment detection is performed on the concatenated features. The text features are bigram features, and the behavior features comprise the comment text length, the rating, the absolute rating deviation, and the maximum cosine similarity between the comment and the other comments on the same product.
(4) BF_EditSim + LF: a representation-learning-based method associates the new user with old users, the behavior features of the most similar old user are used as the behavior features of the new user, and these are concatenated with the bigram features to form the feature representation of the cold-start comment, which is then used to decide whether the comment is genuine.
(5) BF_W2Vsim + W2V: a word vector is first obtained for each word in the comment with the word2vec model, and the text features of the comment are obtained by averaging the word vectors. The existing comment most similar to the cold-start comment is then found by the cosine similarity between their text features. Finally, the behavior features of the most similar comment and the text features of the cold-start comment together form the cold-start feature representation, and the comment is classified from this combined feature vector.
(6) RE: the behavior features of the user are constructed with a TransE model, the text features are obtained with a CNN, and a constraint is used to preserve the sentiment polarity of the text.
(7) RE + RRE + PRE: this model extends the RE model by concatenating the comment representation obtained by the RE model with the comment rating and the product rating to form the final comment representation.
To verify the effectiveness of the method, hotel comment data from the Yelp dataset was selected for the experiments. The Yelp dataset is a publicly available commercial website dataset that offers a good balance between commercial realism and ground-truth labels and is therefore widely used in prior work. The labeled first comments published by new users after January 1, 2012 form the test set, and the first comments published by users before January 1, 2012 form the training set for learning the GCN-based shared feature extraction model. In addition, to train the global text feature representation model, all labeled comment data from before January 1, 2012 is extracted separately and used to train the CNN-based text feature extraction model.
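The date-based split can be sketched as follows, assuming each labeled comment is available as a (user_id, date, comment_id) record. The function name `split_first_reviews`, the record layout and the choice to place comments dated exactly on the cutoff in the test set are assumptions of this sketch.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

CUTOFF = date(2012, 1, 1)
Record = Tuple[str, date, str]  # (user id, publication date, comment id)


def split_first_reviews(reviews: Iterable[Record],
                        cutoff: date = CUTOFF) -> Tuple[List[str], List[str]]:
    """Keep each user's first comment; assign it to train or test by its date."""
    first: Dict[str, Tuple[date, str]] = {}
    for user_id, day, review_id in reviews:
        if user_id not in first or day < first[user_id][0]:
            first[user_id] = (day, review_id)
    train = sorted(rid for day, rid in first.values() if day < cutoff)
    test = sorted(rid for day, rid in first.values() if day >= cutoff)
    return train, test


# Toy records; identifiers and dates are made up for illustration.
records = [
    ("a", date(2011, 5, 1), "r1"),
    ("a", date(2012, 3, 1), "r2"),
    ("b", date(2012, 2, 10), "r3"),
    ("b", date(2012, 6, 1), "r4"),
    ("c", date(2010, 12, 31), "r5"),
]
train_ids, test_ids = split_first_reviews(records)
```

User "b" first appears after the cutoff, so only that user's first comment enters the test set; later comments by any user are ignored by this split.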
Table 4. Comparative experimental results of different methods in a cold start environment (table image not reproduced in this text)
The experimental results are shown in Table 4. The proposed method outperforms the comparison methods on all evaluation metrics. In particular, its recall is about 10% higher than that of the other methods, which shows that the proposed method identifies a larger share of the false comments. Furthermore, analysis of Table 4 leads to the following conclusions:
1) in the cold start environment, the text features still perform poorly. The LF recognition accuracy of the method based on the binary grammatical feature is the lowest in all comparison methods, while the Supervised-CNN method based on the text feature of the CNN has the lowest value compared with the other methods F1. This indicates that relying on the comment text alone does not effectively identify false comments.
2) The detection effect under the cold start environment is improved to a certain extent by combining the behavior characteristics. As can be seen from the results of the LF + BF model, combining the behavior features and the text features can improve the detection accuracy of false comments under cold start, but from the fact that model 3 recall rate and F1 are rather reduced, it can be concluded that: relying only on the behavioral characteristics of the comment itself at cold start will result in more spurious comments being identified as normal comments.
3) The method for directly replacing the behavior characteristics of the comment to be detected with the similar comment behavior characteristics under cold start has poor effect. The model 4 and the model 5 are subjected to false comment detection in a mode of replacing features from the perspective of similarity between users and texts, and experimental results show that the accuracy of the model is not obviously improved from the perspective of similarity between users or the perspective of similarity between texts, and partial indexes (such as F1 value of the model 4 and recall rate) are even lower than that of a method only using text features.
4) By extracting the association from the existing comments, the behavior characteristics of the cold start comment are constructed and combined with the original behavior characteristics of the cold start comment, and a better effect can be achieved. The model 8 extracts the behavior characteristics of the associated user through the abnormal picture and combines the behavior characteristics with the original behavior characteristics of the model, so that the obtained experimental effect is best, and compared with other methods, all parameters are greatly improved.
5) The shared features learned by graph convolution effectively address the lack of behavior feature information for cold-start users and improve the accuracy of false comment detection in a cold start environment. The model presented here outperforms all compared methods on every evaluation metric.
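The comparison above is stated in terms of accuracy, recall and F1. As a minimal sketch of how these metrics relate, the following Python computes them for binary labels, assuming 1 marks a false comment and 0 a normal comment. The function name and the toy labels are made up for illustration and are not taken from the experiments.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumption for this sketch: label 1 = false comment, 0 = normal comment.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (illustrative only): two false comments are missed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case accuracy stays at 0.75 while recall is only 0.5, which is the pattern described in point 2): missed false comments lower recall and F1 even when accuracy looks acceptable.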
The method represents the associations among users, products and comments as a graph and learns shared behavior features through graph convolution to supplement the missing behavior features of cold-start users. It fuses the text features and behavior features of a comment with the shared behavior features of the entities associated with that comment to detect false comments. This effectively addresses the poor detection of false comments caused by the lack of user behavior information in a cold start environment.
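The idea summarized above can be illustrated with a small pure-Python sketch. It is not the patented implementation. It assumes mean aggregation over a row-normalized adjacency matrix with self-loops, an identity weight matrix in place of learned weights, fusion by simple concatenation, and made-up feature values; all function and variable names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def normalized_adjacency(n, edges):
    """Undirected adjacency with self-loops, each row scaled to sum to 1."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: ReLU(A_norm * H * W)."""
    return relu(matmul(matmul(adj_norm, features), weight))


# Heterogeneous graph (assumed toy example):
# 0 = cold-start user, 1 = established user, 2 = product,
# 3 = comment written by user 0, 4 = comment written by user 1.
edges = [(0, 3), (3, 2), (1, 4), (4, 2)]
behavior = [
    [0.0, 0.0],  # cold-start user: no behavior history yet
    [0.8, 0.2],
    [0.5, 0.5],
    [0.1, 0.9],
    [0.7, 0.3],
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for learned weights

adj_norm = normalized_adjacency(5, edges)
hidden = gcn_layer(adj_norm, behavior, identity)
shared = gcn_layer(adj_norm, hidden, identity)

# Fuse the comment's text feature, its own behavior feature, and the
# shared features of its associated user (node 0) and product (node 2).
text_feature = [0.3, 0.6, 0.1]  # assumed output of a text encoder
fused = text_feature + behavior[3] + shared[0] + shared[2]
```

After the first layer the cold-start user already has a non-zero feature vector borrowed from its neighbors in the graph. The fused vector would then be passed to a classifier, which is omitted here.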

Claims (2)

1. A false comment detection method in a cold start environment, characterized by comprising the following steps:
step (1) feature extraction;
step (2) constructing a heterogeneous graph;
step (3) shared feature learning based on graph convolution;
step (4) feature fusion and classification.
2. The method of claim 1, wherein:
the feature extraction in step (1) comprises: extracting the behavior features of the user entity, the product entity and the comment entity; extracting the text features of the comment based on a CNN; and representing the user, the product and the comment as feature vectors.
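As an illustration of CNN-based text feature extraction as named in step (1), the sketch below slides convolution filters over a sequence of word embeddings and applies ReLU followed by max-over-time pooling. The claim does not fix the architecture, so the filter shapes, the embedding values and the function name are assumptions made for this example only.

```python
def conv1d_max_pool(embeddings, filters):
    """Return one feature per filter: max over time of ReLU(conv).

    embeddings: list of token vectors (sequence length x embedding dim)
    filters:    list of kernels, each a list of `width` weight vectors
    """
    features = []
    for kernel in filters:
        width = len(kernel)
        best = 0.0  # ReLU floor: negative activations never win
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            activation = sum(
                w * x
                for k_row, e_row in zip(kernel, window)
                for w, x in zip(k_row, e_row)
            )
            best = max(best, activation)
        features.append(best)
    return features


# Toy input: 4 tokens with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Two width-2 filters (made-up weights).
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
text_features = conv1d_max_pool(embeddings, filters)
```

The resulting vector has one entry per filter and plays the role of the comment's text feature vector; a trained model would learn the filter weights instead of fixing them by hand.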

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110733235.1A CN113392334B (en) 2021-06-29 2021-06-29 False comment detection method in cold start environment

Publications (2)

Publication Number Publication Date
CN113392334A (en) 2021-09-14
CN113392334B (en) 2024-03-08

Family

ID=77624525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110733235.1A Active CN113392334B (en) 2021-06-29 2021-06-29 False comment detection method in cold start environment

Country Status (1)

Country Link
CN (1) CN113392334B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172989A1 (en) * 2012-12-14 2014-06-19 Yigal Dan Rubinstein Spam detection and prevention in a social networking system
WO2019183191A1 (en) * 2018-03-22 2019-09-26 Michael Bronstein Method of news evaluation in social media networks
CN110321436A (en) * 2019-07-04 2019-10-11 中国人民解放军国防科技大学 Cold-start fraud comment detection method based on social attention mechanism representation learning
CN110580341A (en) * 2019-09-19 2019-12-17 山东科技大学 False comment detection method and system based on semi-supervised learning model
CN111259140A (en) * 2020-01-13 2020-06-09 长沙理工大学 False comment detection method based on LSTM multi-entity feature fusion
CN111639252A (en) * 2020-05-18 2020-09-08 华中科技大学 False news identification method based on news-comment relevance analysis
CN111753884A (en) * 2020-06-08 2020-10-09 浙江工业大学 Depth map convolution model defense method and device based on network feature reinforcement
CN112417099A (en) * 2020-11-20 2021-02-26 南京邮电大学 Method for constructing fraud user detection model based on graph attention network
CN112765313A (en) * 2020-12-31 2021-05-07 太原理工大学 False information detection method based on original text and comment information analysis algorithm
CN112732921A (en) * 2021-01-19 2021-04-30 福州大学 False user comment detection method and system
CN112990972A (en) * 2021-03-19 2021-06-18 华南理工大学 Recommendation method based on heterogeneous graph neural network
CN113032525A (en) * 2021-03-23 2021-06-25 深圳大学 False news detection method and device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AO LI et al.: "Spam Review Detection with Graph Convolutional Networks", CIKM '19, pages 2703 - 2711 *
XIAOQING SUN et al.: "Deepdom: Malicious domain detection with scalable and heterogeneous graph convolutional networks", COMPUTERS & SECURITY, vol. 99, 31 December 2020 (2020-12-31), pages 1 - 16 *
焦易于: "False comment detection algorithm based on fused features", China Master's Theses Full-text Database, Information Science and Technology Series, no. 06, 15 June 2017 (2017-06-15), pages 138 - 1559 *
郭国庆: "Research on false comment detection based on feature fusion", China Master's Theses Full-text Database, Information Science and Technology Series, no. 01, 15 January 2023 (2023-01-15), pages 138 - 3407 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114692007A (en) * 2022-06-01 2022-07-01 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for determining representation information
WO2023231542A1 (en) * 2022-06-01 2023-12-07 腾讯科技(深圳)有限公司 Representation information determination method and apparatus, and device and storage medium

Also Published As

Publication number Publication date
CN113392334B (en) 2024-03-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant