CN114742564A - A false reviewer group detection method incorporating complex relationships - Google Patents
A false reviewer group detection method incorporating complex relationships
- Publication number
- CN114742564A (application number CN202210449853.8A)
- Authority
- CN
- China
- Prior art keywords
- node
- model
- graph
- false
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a method for detecting false reviewer groups that incorporates complex relationships.
Background Art
The rapid rise of online review systems has made reviews an important basis for purchase decisions. More and more people check reviews on a platform before buying a product and post their own evaluations afterwards. These reviews give customers useful information and first-hand product experience, so the quality of online reviews is particularly important: fake reviews that do not reflect the facts of a product damage its reputation and mislead buyers.
Most existing fake review detection techniques rely on big data and artificial intelligence. Traditional techniques classify reviewers with hand-crafted features, drawing on behavioural features, linguistic features of the reviews, and graphs built to capture the relationships between users. Researchers have mainly focused on detecting individual fake reviewers, yet fake reviewer groups tend to do greater harm to online review systems, and they are harder to find: a fake review posted by a group member may look like a normal individual review, so earlier techniques for detecting individual fake reviews are of little use. In addition, it is difficult to establish relationships between fake reviewers, even though such complex relationships would let a model capture the connections among reviewers inside a group and thereby assist the detection of false reviewer groups.
Current fake review group detection methods can be grouped into the following categories:
Detection methods based on clustering algorithms. These methods typically learn node embedding representations with algorithms such as graph neural networks, cluster the nodes with a clustering algorithm, and finally detect fake review groups from the clusters. Common clustering algorithms include the partition-based algorithm KMeans and the density-based algorithm DBSCAN.
(1) The KMeans clustering algorithm divides all points in the sample space into K groups, with similarity usually measured by Euclidean distance. Its main steps are: randomly place K centroids, one per cluster; compute the distance from each point to every centroid and assign each data point to its nearest centroid, forming a cluster; then, in an iterative process, recompute the positions of the K centroids.
(2) The DBSCAN clustering algorithm first determines the type of each point: every data point in the data set is either a core point or a border point. A data point is a core point if its neighbourhood within the specified radius R contains at least M points; if its neighbourhood contains fewer than M points but the point is reachable from some core point, i.e. it lies within distance R of a core point, it is a border point. Core points that are neighbours of one another are connected and placed in the same cluster, and each border point is assigned to one of these clusters.
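For orientation, the two clustering primitives described above can be exercised with off-the-shelf implementations. The sketch below is only illustrative: it assumes scikit-learn and synthetic two-dimensional points, and the parameter values are not prescribed by the patent.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Synthetic 2-D points standing in for learned node embeddings (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),    # a dense "normal" cluster
               rng.normal(3.0, 0.3, size=(50, 2)),    # a second dense cluster
               rng.uniform(-2.0, 5.0, size=(10, 2))]) # scattered outliers

# KMeans: partition all points into K groups by Euclidean distance to K centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("KMeans cluster sizes:", np.bincount(kmeans.labels_))

# DBSCAN: a point with at least min_samples neighbours within radius eps is a core point;
# reachable points become border points, everything else is labelled -1 (outlier).
dbscan = DBSCAN(eps=0.5, min_samples=5).fit(X)
print("DBSCAN core points:", len(dbscan.core_sample_indices_))
print("DBSCAN outliers:", int(np.sum(dbscan.labels_ == -1)))
```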
Graph-based detection methods. Starting from subgraphs, these methods judge the suspiciousness of a group from node or subgraph attributes, which constitutes the whole detection process. Some methods aggregate relations from differences in graph topology, timing and ratings and use joint probabilities to detect groups of fake reviewers; such methods ignore the structural characteristics of the nodes themselves and do not consider the complex relationships between nodes. Other methods propose several main characteristics of a group, such as synchronicity, moderation and dispersion, and detect anomalous groups by computing certain indicators. These methods lack generality in practice: for different networks or data sets, specific indicators have to be designed to perform the group detection task well, and detection accuracy drops sharply when they are generalised. Moreover, they consider only the characteristics inside a group and still neglect the complex relationships among reviewers.
Summary of the Invention
In existing fake review group detection methods, the embedding process is separated from the subsequent clustering and detection process, so the training process lacks guidance from the final objective; if the learned representations are not suited to detection, the resulting group detection performance will be poor. In addition, the complex relationships in the review network are ignored, and the valuable relational information among reviewers within a group cannot be exploited.
To address these problems of the prior art, the present invention proposes an objective-oriented method for detecting false reviewer groups that incorporates complex relationships. Based on the complex-relationship features of nodes, the method uses these features to learn complex-relationship representations of the nodes and uses an autoencoder to reconstruct the topology of the graph; it is intended for detecting fake review groups on online sales platforms. To integrate the embedding process with the clustering and detection process, the method adopts a self-supervised training model in which the clustering and detection results guide the optimisation of the model.
To achieve the above purpose, the technical solution adopted by the present invention is a false reviewer group detection method incorporating complex relationships: an attention-based graph neural network updates the representations of the review nodes in the review network; a graph reconstruction loss and a self-supervised distribution loss are designed for model training; and, once the optimal model is obtained, it is applied to false reviewer group detection to identify the false reviewer groups in the review network. The specific steps are as follows:
Step 1: update the node representations and obtain the reconstructed graph. The model extracts the adjacency matrix and the attribute matrix of the review network and derives the complex relation matrix from the adjacency matrix. With the complex relation matrix in hand, the graph attention encoder fuses the complex relations into the message-passing process, effectively encoding the high-order structural information and node attribute information of the network, and then updates the node representations. An attention-based graph neural network serves as the encoder; the initial node features are used as the initial node embeddings, and the complex relationships between nodes are fused into the attention-based graph neural network so that the node representations express both high-order structural features and attribute features.
1.1) Compute node similarity. To simplify the computation and reduce the number of model parameters, the candidate nodes are restricted to the first-order neighbours of the central node. The formula is:
c_{ij} = a(W h_i, W h_j)    (1)
where c_{ij} denotes the importance of node j to node i, W is the weight matrix, h_i and h_j are the feature vectors of node i and node j, respectively, and a is the function that computes node similarity;
1.2) Compute the complex relation matrix. The review network has a complex structure, and the complex relationships between its nodes contain valuable information. By taking the high-order neighbours of a node into account, the complex relation matrix is obtained:
M = (B + B^2 + \cdots + B^t)/t    (2)
where B is the transition matrix: if there is an edge between node i and node j, B_{ij} = 1/d_i, where d_i is the degree of node i; if there is no edge between node i and node j, B_{ij} = 0. The matrix M is the complex relation matrix, and M_{ij} is the complex relationship between node i and node j at order t;
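A minimal sketch of formula (2), computing the complex relation matrix M from an adjacency matrix, is given below. NumPy is assumed, and the names `complex_relation_matrix`, `adj` and `order` are illustrative rather than taken from the patent.

```python
import numpy as np

def complex_relation_matrix(adj: np.ndarray, order: int) -> np.ndarray:
    """M = (B + B^2 + ... + B^t) / t, where B_ij = 1/d_i if (i, j) is an edge, else 0."""
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0                      # guard against isolated nodes
    B = adj / deg[:, None]                   # row-normalised transition matrix B
    M = np.zeros_like(B)
    B_power = np.eye(adj.shape[0])
    for _ in range(order):                   # accumulate B, B^2, ..., B^t
        B_power = B_power @ B
        M += B_power
    return M / order

# Toy usage on a 4-node path graph with t = 2.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(complex_relation_matrix(A, order=2))
```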
1.3) Fuse the complex relationships. Using a single-layer feed-forward neural network as the computation, the complex relation matrix M is fused with the attention-based graph neural network; concretely, the complex relation matrix is multiplied with the node similarity, meaning that when computing the similarity between nodes, not only the similarity between node representations but also the influence of the complex relationships between nodes is taken into account. LeakyReLU is chosen as the activation function to add non-linearity and strengthen the feature expression capacity of the model. After fusing the complex relationships, the importance of node j to node i is rewritten as formula (3).
1.4) Update node representations. The softmax function of formula (4) normalises the importance of the neighbour nodes so that the importance of the first-order neighbours to the central node is distributed in [0, 1], and formula (5) aggregates the neighbours' features to update the node representation.

In formula (4), \alpha_{ij} is the normalised attention coefficient and N_i is the set of first-order neighbours of node i.

In formula (5), h_j^{(l)} is the representation of node i's neighbour j at layer l and h_i^{(l+1)} is the representation of node i at layer l+1; the final node representation is obtained through multi-layer aggregation.
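One possible realisation of steps 1.1) to 1.4) in PyTorch is sketched below. The exact form of formula (3) is not reproduced in this text, so the way M enters the attention score here (weighting the raw similarity before LeakyReLU) and the output activation are assumptions made for illustration; the class and variable names are likewise illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComplexRelationAttentionLayer(nn.Module):
    """One attention-based aggregation layer: pairwise similarity a(Wh_i, Wh_j) is
    weighted by the complex relation matrix M, normalised with softmax over the
    first-order neighbours, and used to aggregate neighbour features."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared weight matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # single-layer feed-forward scorer a(.)

    def forward(self, h, adj, M):
        Wh = self.W(h)                                    # (N, out_dim)
        N = Wh.size(0)
        # c_ij = a(Wh_i, Wh_j) for every ordered pair (formula (1)).
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        c = self.a(pairs).squeeze(-1)                     # (N, N)
        # Assumed fusion of formula (3): weight the similarity by M, then LeakyReLU.
        e = F.leaky_relu(M * c)
        # Keep only first-order neighbours and normalise with softmax (formula (4)).
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)
        # Aggregate neighbour features to update the representation (formula (5)).
        return F.elu(alpha @ Wh)

# Toy usage: 4 nodes with 8-dimensional features on a path graph.
h = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
M = adj.clone()        # stand-in for the real complex relation matrix of formula (2)
layer = ComplexRelationAttentionLayer(8, 16)
print(layer(h, adj, M).shape)   # torch.Size([4, 16])
```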
Step 2: model training. The model first reconstructs the topology of the graph with the autoencoder and computes a loss from the difference between the adjacency matrices of the original graph and the reconstructed graph; this is the first part of the loss. The second part of the loss comes from the self-supervised training scheme: the model uses the DBSCAN clustering algorithm to determine the core points of the review network, computes the distances between all nodes and the core points, and uses the KL divergence as this part of the loss. The final loss function is the combination of these two losses and is used to train the model jointly. After the loss is computed, the model parameters are updated with the gradient descent algorithm and training is complete.
The graph reconstruction loss function and the self-supervised distribution loss function are designed, and the parameters of the attention-based graph neural network model are updated to complete training. The specific steps are:
2.1) Compute the graph reconstruction loss. The topology of the graph is reconstructed from the encoder output, and the reconstruction loss between the reconstructed graph and the original graph is obtained from the difference between their adjacency matrices. The formula is:

\hat{A} = \sigma(H H^T)    (6)

where \hat{A} is the reconstructed adjacency matrix, H is the updated node representation matrix, and \sigma is the activation function.

During training, cross-entropy is used as the loss function:

\ell(y, \hat{y}) = -\big[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big]    (7)

where y is the value of an element of the adjacency matrix and \hat{y} is the corresponding element of the reconstructed adjacency matrix. This part of the training minimises the reconstruction loss L_r, which is obtained by accumulating the cross-entropy of formula (7) over the entries of the adjacency matrix (formula (8)).
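A compact sketch of the inner-product reconstruction of formula (6) and the cross-entropy reconstruction loss of formula (7), using PyTorch; averaging the per-entry cross-entropy over all entries of the adjacency matrix is an assumption, since the text above only states that the reconstruction loss must be minimised.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """L_r: cross-entropy between the original adjacency A and A_hat = sigma(H H^T)."""
    logits = H @ H.t()          # inner-product decoder of formula (6), before the sigmoid
    # binary_cross_entropy_with_logits applies the sigmoid internally and averages the
    # per-entry cross-entropy of formula (7) over all entries of the adjacency matrix.
    return F.binary_cross_entropy_with_logits(logits, A)

# Toy usage with random embeddings and a 4-node path graph.
H = torch.randn(4, 16, requires_grad=True)
A = torch.tensor([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=torch.float)
loss = reconstruction_loss(H, A)
loss.backward()
print("reconstruction loss:", float(loss))
```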
2.2) Compute the self-supervised distribution loss. One of the challenges of fake review detection is that there are no labels to guide the training of the model. This model therefore adopts a self-supervised training scheme and uses pseudo-labels to optimise the node embedding representations. A clustering algorithm groups the nodes; this model uses the K-Means algorithm:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2    (9)

where \mu_i is the mean of all nodes in S_i and k is the number of clusters.
After all the candidate groups have been obtained, the DBSCAN clustering algorithm is used to determine the core points of the review network, and the distribution of distances between every node and the core points is computed.
During training, the model must keep learning the data distribution in order to distinguish normal nodes from abnormal ones. p_{iu} denotes the pseudo-label computed by the model, and q_{iu} denotes the distribution of distances between the features of all nodes and the core points detected by DBSCAN; q_{iu} is defined by formula (10).

In formula (10), u_u denotes the representation of a core point detected by DBSCAN, Z_i the representation of the node currently being processed, and u_k the representation of the core point of the k-th category. The formula computes the distance between a node's representation and the core-point representations: if a node is close enough to a core point, it is taken to belong to that group and is regarded as a normal node; if a node is far from the core points, it is regarded as an outlier, i.e. as belonging to the corresponding false review group. The node labels are obtained by the following formula:
S_i = \arg\max_u q_{iu}    (11)
The KL divergence is used as the loss function to measure the difference between the distribution of node-to-core-point distances and the pseudo-labels.
The KL divergence measures how the probability distribution Q differs from the reference probability distribution P. Unlike the labels obtained from formula (11), the target distribution p_{iu} is treated as the true label: it is computed from Q during training, depends on the distribution P, and is updated stage by stage, serving as the self-supervised label within that stage. The main role of the target distribution is to supervise the learning of the model and to guide the update of the distribution Q. P is computed by formula (12), in which q_{ik} denotes the distribution of distances between the features of all nodes and the core point of the k-th category. The loss function for the self-supervised optimisation of the embedding is:

L_c = \mathrm{KL}(P \parallel Q) = \sum_i \sum_u p_{iu} \log \frac{p_{iu}}{q_{iu}}    (13)
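The sketch below illustrates the self-supervised distribution loss in PyTorch. Because formulas (10) and (12) are not reproduced in this text, the Student's t soft assignment and the squared-and-renormalised target distribution commonly used in deep embedded clustering are assumed here as stand-ins; the function names are illustrative.

```python
import torch

def soft_assignment(Z: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """q_iu: closeness of node representation z_i to core-point representation u_u.
    The Student's t kernel below is an assumption standing in for formula (10)."""
    dist2 = torch.cdist(Z, centers) ** 2      # squared distances to the core points
    q = 1.0 / (1.0 + dist2)
    return q / q.sum(dim=1, keepdim=True)     # normalise over the k core points

def target_distribution(q: torch.Tensor) -> torch.Tensor:
    """p_iu: sharpened pseudo-labels derived from q (assumption standing in for formula (12))."""
    weight = q ** 2 / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

def distribution_loss(q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """L_c = KL(P || Q), formula (13)."""
    return (p * (p / q).log()).sum()

# Toy usage: 6 node representations, 2 DBSCAN core points.
Z = torch.randn(6, 16, requires_grad=True)
centers = torch.randn(2, 16)
q = soft_assignment(Z, centers)
p = target_distribution(q).detach()           # P is held fixed within a training stage
loss = distribution_loss(q, p)
loss.backward()
print("pseudo-labels per formula (11):", q.argmax(dim=1).tolist())
```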
2.3) Compute the joint loss function. The joint loss function is:
L = L_r + \beta L_c    (14)
where L_r is the graph reconstruction loss function, L_c is the self-supervised distribution loss function, and \beta is the weight between the two loss functions;
2.4) Model training: set the initial parameters of the attention-based graph neural network model and, based on the joint loss function, iterate the training process to obtain the optimal parameters of the attention-based graph neural network model;
Step 3: fake review group detection. The attention-based graph neural network model obtained in step 2 is applied to the real review network, and the detection results are saved.
The graph reconstruction loss function adopts the cross-entropy loss function; the clustering algorithm used to cluster the nodes is the KMeans clustering algorithm.
The specific procedure of the model training in 2.4) is as follows:
Set the initial parameters of the attention-based graph neural network model, including the number of aggregation layers of the model, the node embedding dimension, the number of clusters of the KMeans clustering algorithm, and the number of training iterations;
During model training, the parameters are adjusted continuously, and the optimal parameters are determined from the decrease of the joint loss function during training or from the final detection results of the model;
Specifically: feed the review network and its adjacency matrix into the model, run and train the model, record the detection performance of the model after this training run, repeat the training several times under the same set of hyper-parameters, and take the average detection accuracy as the final result; after completing model training under one set of parameters, adjust the parameters following the control-variable method, changing one parameter of the model in the direction that increases the average accuracy while keeping the other parameters unchanged; repeat the parameter adjustment and retain the set of parameters that yields the highest average discrimination accuracy, at which point model training is complete.
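The tuning procedure described above (repeat several runs per configuration, average the detection score, then vary one hyper-parameter at a time by the control-variable method) could be organised roughly as sketched below; `train_and_evaluate` is a placeholder for the full training-plus-detection pipeline and is not part of the patent, and the candidate values are illustrative.

```python
import random
from statistics import mean

def train_and_evaluate(layers: int, embed_dim: int, k: int, epochs: int) -> float:
    """Placeholder: train the model with the given hyper-parameters and return the recall."""
    random.seed(hash((layers, embed_dim, k, epochs)))
    return random.uniform(0.6, 0.9)          # stand-in score, for illustration only

def average_recall(params: dict, repeats: int = 5) -> float:
    """Repeat training under one configuration and average the detection recall."""
    return mean(train_and_evaluate(**params) for _ in range(repeats))

# Control-variable search: adjust one hyper-parameter at a time, keeping the rest fixed.
best = {"layers": 2, "embed_dim": 16, "k": 8, "epochs": 200}
best_score = average_recall(best)
search_space = {"layers": [1, 2, 3], "embed_dim": [16, 32, 64], "k": [4, 8, 16]}
for name, candidates in search_space.items():
    for value in candidates:
        trial = dict(best, **{name: value})
        score = average_recall(trial)
        if score > best_score:
            best, best_score = trial, score
print("selected hyper-parameters:", best, "average recall:", round(best_score, 3))
```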
Beneficial effects of the present invention: the method can not only identify individual false reviewers but also distinguish groups of false reviewers from normal reviewers well. Based on the complex-relationship features of nodes, the method makes full use of the valuable relational information among reviewers and integrates the embedding process with the clustering and detection process, yielding an objective-oriented false reviewer group detection model, while overcoming problems of existing group detection methods such as poor generality and low detection performance.
Brief Description of the Drawings
Fig. 1 is the basic framework diagram of the present invention;
Fig. 2 is the workflow diagram of the present invention;
Fig. 3 shows the change of the recall rate during training in an embodiment of the present invention;
Fig. 4 shows the change of the loss function during training in an embodiment of the present invention;
Fig. 5 is a visualisation of the group detection results in an embodiment of the present invention.
Detailed Description of the Embodiments
The specific embodiments of the present invention are further described below with reference to the accompanying drawings and the technical solution.
A false reviewer group detection method incorporating complex relationships comprises three stages: node representation update, model training, and fake review group detection.
Step 1: node representation update. This stage uses an attention-based graph neural network as the encoder, takes the initial node features as the initial node embeddings, and fuses the complex relationships between nodes into the attention-based graph neural network so that the node representations are able to express both high-order structural features and attribute features.
1.1) Compute node similarity. To simplify the computation and reduce the number of model parameters, the candidate nodes are restricted to the one-hop neighbours of the central node. The formula is:
c_{ij} = a(W h_i, W h_j)    (1)
where c_{ij} denotes the importance of node j to node i, W is the weight matrix, h_i and h_j are the feature vectors of node i and node j, respectively, and a is the function that computes node similarity;
1.2) Compute the complex relation matrix. The review network has a complex structure, and the complex relationships between its nodes contain valuable information. By considering the high-order neighbours of a node, its complex relation matrix can be obtained:
M = (B + B^2 + \cdots + B^t)/t    (2)
where B is the transition matrix: if there is an edge between node i and node j, B_{ij} = 1/d_i, where d_i is the degree of node i; if there is no edge between node i and node j, B_{ij} = 0. The matrix M is the complex relation matrix, and M_{ij} is the complex relationship between node i and node j at order t.
1.3) Fuse the complex relationships. A single-layer feed-forward neural network is chosen as the computation so that the complex relation matrix M is fused with the graph attention network; concretely, the complex relation matrix is multiplied with the inter-node similarity, meaning that when computing the similarity between nodes, not only the similarity between node representations but also the influence of the complex relationships between nodes is considered. Finally, LeakyReLU is chosen as the activation function to add non-linearity and strengthen the feature expression capacity of the model. After fusing the complex relationships, the importance of node j to node i is rewritten as formula (3).
1.4) Update node representations. To distribute the importance of the neighbour nodes to the central node in [0, 1], the softmax function of formula (4) normalises the neighbours' importance, and formula (5) aggregates the neighbours' features to update the node representation.

In formula (4), \alpha_{ij} is the normalised attention coefficient and N_i is the set of first-order neighbours of node i; in formula (5), h_j^{(l)} is the representation of node i's neighbour j at layer l and h_i^{(l+1)} is the representation of node i at layer l+1. The final node representation is obtained through multi-layer aggregation.
Step 2: model training. The loss function is designed first; after the loss is computed with the designed loss function, the model parameters are updated to complete training. The model first reconstructs the original network with the decoder and computes the loss from the difference between the adjacency matrices of the original and reconstructed networks. Because the nodes in the false reviewer group detection task have no labels, a self-supervised training scheme is used to optimise the embeddings: the DBSCAN clustering algorithm first generates the core points of the review network, the KL divergence measures the distance between the core points and the other nodes, and the difference between the pseudo-labels and the learned embedding distribution is then computed. After the loss computation, the gradient descent algorithm is used to update the model parameters and complete training.
2.1) Compute the graph reconstruction loss. The original graph is reconstructed by an inner product; the reconstruction formula is:

\hat{A} = \sigma(H H^T)    (6)

where H is the matrix of learned node embeddings and \hat{A} is the adjacency matrix of the reconstructed graph; the aim is to make the reconstructed adjacency matrix \hat{A} as similar as possible to the input adjacency matrix. During training, cross-entropy is used as the loss function:

\ell(y, \hat{y}) = -\big[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big]    (7)

where y is the value of an element of the adjacency matrix and \hat{y} is the corresponding element of the reconstructed adjacency matrix. This part of the training minimises the reconstruction loss L_r, which is obtained by accumulating the cross-entropy of formula (7) over the entries of the adjacency matrix (formula (8)).
2.2) Compute the distribution loss. One of the challenges of fake review detection is that there are no labels to guide the training of the model. This model adopts a self-supervised scheme and uses pseudo-labels to optimise the node embedding representations. Since the nodes in the graph are independent, all nodes are first clustered during training; the model uses the K-Means algorithm:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2    (9)
where \mu_i is the mean of all nodes in S_i and k is the number of clusters. After all the groups have been obtained, the DBSCAN algorithm is used to detect the abnormal ones. DBSCAN first distinguishes the core points from the border points of the graph, takes the detected core points as the core points of the training model, and computes the distance between the representations of the other nodes and the core-point representations. During training, the model must keep learning the data distribution in order to distinguish normal nodes from abnormal ones; p_{iu} denotes the pseudo-label computed by the model, and q_{iu} denotes the distribution of distances between the features of all nodes and the core points detected by DBSCAN. q_{iu} is defined by formula (10).
In formula (10), u_u denotes the representation of a core point detected by DBSCAN. The formula computes the distance between a node's representation and the core-point representations: if a node is close enough to a core point, it is taken to belong to that group and is regarded as a normal node; if a node is far from the core points, it is regarded as an outlier, i.e. as belonging to the corresponding false review group. The node labels are obtained by the following formula:
S_i = \arg\max_u q_{iu}    (11)
This model uses the KL divergence to measure the difference between the pseudo-labels and the learned distribution; the KL divergence measures how the probability distribution Q differs from the reference probability distribution P. Unlike the labels obtained from formula (11), the target distribution p_{iu} is treated as the true label: it is computed from Q during training and is updated stage by stage, serving as the self-supervised label within that stage. The main role of the target distribution is to supervise the learning of the model and to guide the update of the distribution Q. P is computed by formula (12).
The loss function for the self-supervised optimisation of the embedding is:

L_c = \mathrm{KL}(P \parallel Q) = \sum_i \sum_u p_{iu} \log \frac{p_{iu}}{q_{iu}}    (13)
2.3) Compute the joint loss function. The overall loss function of the model consists of the graph reconstruction loss function and the self-supervised distribution loss function; the final loss function is:
L = L_r + \beta L_c    (14)
where L_r is the reconstruction loss, L_c is the distribution loss, and \beta controls the weight between the two losses.
2.4) Model training. The model is trained as follows: set the initial hyper-parameters, including the number of aggregation layers of the graph attention network, the node embedding dimension, the number of clusters of the KMeans clustering algorithm, and the number of training iterations.
During model training, the hyper-parameters are adjusted manually so that the detection performance of the model is optimal. In general, the hyper-parameters are determined from the decrease of the loss function during training or from the final detection results of the model. Once the hyper-parameters are set, the review network, its adjacency matrix and related information are fed into the model, the model is run until training finishes, and the detection performance after this run is recorded; the process is repeated several times under the same set of hyper-parameters, and the average detection accuracy is taken as the final result. After training under one set of hyper-parameters, the hyper-parameters are adjusted following the control-variable method: one hyper-parameter of the model is changed in the direction that increases the average accuracy while the others are kept unchanged. The adjustment process is repeated, the set of hyper-parameters that yields the highest average discrimination accuracy is retained, and model training is complete.
Step 3: fake review group detection. The model and hyper-parameters trained in the previous step are used to detect on the real review network, and the detection results of the model on the review network are saved.
Table 1 Running process of the algorithm
The experimental analysis of the solution of the present invention is as follows:
The present invention verifies the fake review group detection performance on the Amazon data set preprocessed by researchers; the basic statistics of the data set are shown in Table 2. The relation type U-P-U indicates that two users have reviewed at least one common product. U-S-U indicates that two reviewers gave the same rating within one week. U-V-U indicates that two reviewers wrote similar reviews. The experiments are carried out on four data sets, corresponding to the three relations above plus a data set combining all three relations: Amazon_p, Amazon_s, Amazon_v and Amazon.
Table 2 Basic statistics of the fake review data sets used in the experiments
The experimental analysis of the proposed false reviewer group detection method incorporating complex relationships consists of two parts: the method is compared with existing fake review group detection methods, using the recall rate as the evaluation metric, to verify the superiority of the present invention; and the training process and detection results are visualised, so that the rationality of the model design and the effectiveness of the detection can be analysed more intuitively.
(1) Comparison of detection results
Several previously proposed fake review group detection methods are chosen for comparison. Graph-Strainer uses a graph-based approach to discover target items and, on that basis, detects groups of fake reviewers, solving the group detection problem with 2-hop subgraphs. ColluEage uses Markov random fields to detect colluding fake reviewers and fake review campaigns. DeFrauder exploits the product review graph combined with behavioural signals to detect candidate fraud groups, maps the candidate groups into an embedding space, assigns each group a score, and finally identifies the fake reviewer groups from the scores. In addition to these baselines, to verify the effectiveness of the modules of the proposed method, two decoupled detection pipelines are added to the experiments: GCN+KMeans+DBSCAN and GAT+KMeans+DBSCAN. The first embeds the initial data set with a GCN and the second with a GAT; after obtaining the embeddings, both pipelines apply the KMeans clustering method and the DBSCAN method to the embedding results for detection.
The experimental results of the proposed method and the baselines are shown in Table 3. A vertical comparison of the results shows that the proposed method clearly outperforms the other methods, with a substantial improvement in detection performance. GAT+KMeans+DBSCAN outperforms GCN+KMeans+DBSCAN, confirming the effectiveness of using GAT as the graph encoder: compared with GCN, GAT aggregates neighbour features according to the similarity between the central node and its neighbours, so the representations of normal nodes do not aggregate information from large numbers of fake nodes. A horizontal comparison shows that the proposed method achieves the best results both when each of the three relations is considered separately and when all relations are considered, which indicates that integrating the KMeans clustering algorithm into the deep learning model and iteratively updating the core points during training yields more accurate detection results.
Table 3 Detection results
(2) Visualisation of the training process and the detection results
The purpose of the visualisation experiments is to show the rationality of the method design by analysing the loss and the change of the recall rate during training, and to show the effectiveness of the detection results intuitively by visualising them.
Fig. 3 shows the change of the recall rate during training. The overall trend shows that the recall of the detection results keeps improving as training proceeds, confirming the rationality of the model design.
The change of the loss function during training is shown in Fig. 4. Analysing Fig. 3 and Fig. 4 together, the recall rate keeps rising as the loss function decreases, which shows that while the method keeps updating the representation learning results, the learned representations remain suitable for false reviewer group detection. Conversely, the designed loss function feeds the loss back to the model and supervises its learning, solving the problem that representation learning results may not suit the detection method.
Fig. 5 shows the clustering results of the model on the Amazon data set; it can be seen that the present invention performs well on the false reviewer group detection problem. The black entities represent fake review groups and are mainly concentrated at the lower left, while the grey entities represent normal review nodes and are mainly concentrated at the upper right.
The embodiments described above only express embodiments of the present invention and should not be understood as limiting the scope of the patent of the present invention. It should be pointed out that those skilled in the art may make several modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210449853.8A CN114742564B (en) | 2022-04-27 | 2022-04-27 | False reviewer group detection method integrating complex relations |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210449853.8A CN114742564B (en) | 2022-04-27 | 2022-04-27 | False reviewer group detection method integrating complex relations |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114742564A true CN114742564A (en) | 2022-07-12 |
CN114742564B CN114742564B (en) | 2024-09-17 |
Family
ID=82282704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210449853.8A Active CN114742564B (en) | 2022-04-27 | 2022-04-27 | False reviewer group detection method integrating complex relations |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114742564B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116737934A (en) * | 2023-06-20 | 2023-09-12 | 合肥工业大学 | Naval false comment detection algorithm based on semi-supervised graph neural network |
CN116993433A (en) * | 2023-07-14 | 2023-11-03 | 重庆邮电大学 | Internet E-commerce abnormal user detection method based on big data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110580341A (en) * | 2019-09-19 | 2019-12-17 | 山东科技大学 | A method and system for detecting false comments based on a semi-supervised learning model |
US20210089579A1 (en) * | 2019-09-23 | 2021-03-25 | Arizona Board Of Regents On Behalf Of Arizona State University | Method and apparatus for collecting, detecting and visualizing fake news |
CN112597302A (en) * | 2020-12-18 | 2021-04-02 | 东北林业大学 | False comment detection method based on multi-dimensional comment representation |
CN112732921A (en) * | 2021-01-19 | 2021-04-30 | 福州大学 | False user comment detection method and system |
-
2022
- 2022-04-27 CN CN202210449853.8A patent/CN114742564B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110580341A (en) * | 2019-09-19 | 2019-12-17 | 山东科技大学 | A method and system for detecting false comments based on a semi-supervised learning model |
US20210089579A1 (en) * | 2019-09-23 | 2021-03-25 | Arizona Board Of Regents On Behalf Of Arizona State University | Method and apparatus for collecting, detecting and visualizing fake news |
CN112597302A (en) * | 2020-12-18 | 2021-04-02 | 东北林业大学 | False comment detection method based on multi-dimensional comment representation |
CN112732921A (en) * | 2021-01-19 | 2021-04-30 | 福州大学 | False user comment detection method and system |
Non-Patent Citations (2)
Title |
---|
- Yin Chunyong; Zhu Yuhang: "Fake review detection model based on vertically integrated Tri-training", Journal of Computer Applications, no. 08, 10 August 2020 (2020-08-10) *
- Zeng Zhiyuan; Lu Xiaoyong; Xu Shengjian; Chen Musheng: "Fake review detection based on a deep learning model with a multi-layer attention mechanism", Computer Applications and Software, no. 05, 12 May 2020 (2020-05-12) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116737934A (en) * | 2023-06-20 | 2023-09-12 | 合肥工业大学 | Naval false comment detection algorithm based on semi-supervised graph neural network |
CN116737934B (en) * | 2023-06-20 | 2024-03-22 | 合肥工业大学 | Naval false comment detection algorithm based on semi-supervised graph neural network |
CN116993433A (en) * | 2023-07-14 | 2023-11-03 | 重庆邮电大学 | Internet E-commerce abnormal user detection method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN114742564B (en) | 2024-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | LGM-GNN: A local and global aware memory-based graph neural network for fraud detection | |
CN110555455A (en) | Online transaction fraud detection method based on entity relationship | |
CN113269647B (en) | Graph-based transaction abnormity associated user detection method | |
CN111476261A (en) | Community-enhanced graph convolution neural network method | |
CN113807422B (en) | Weighted graph convolutional neural network scoring prediction model integrating multi-feature information | |
CN105956798A (en) | Sparse random forest-based method for assessing running state of distribution network device | |
CN109767312A (en) | A credit evaluation model training and evaluation method and device | |
CN114742564A (en) | A false reviewer group detection method incorporating complex relationships | |
CN108734223A (en) | The social networks friend recommendation method divided based on community | |
CN117150416B (en) | A detection method, system, media and equipment for abnormal nodes in the industrial Internet | |
CN115952424A (en) | A Graph Convolutional Neural Network Clustering Method Based on Multi-view Structure | |
CN110532429A (en) | It is a kind of based on cluster and correlation rule line on user group's classification method and device | |
CN113343123A (en) | Training method and detection method for generating confrontation multiple relation graph network | |
CN114580934A (en) | Early warning method for food detection data risk based on unsupervised anomaly detection | |
CN114330291A (en) | Text recommendation system based on dual attention mechanism | |
CN115840853A (en) | Course recommendation system based on knowledge graph and attention network | |
CN108062566A (en) | A kind of intelligent integrated flexible measurement method based on the potential feature extraction of multinuclear | |
CN116527346A (en) | Threat Node Awareness Method Based on Deep Learning Graph Neural Network Theory | |
CN103093239B (en) | A kind of merged point to neighborhood information build drawing method | |
CN115619099A (en) | Substation safety protection evaluation method, device, computer equipment and storage medium | |
CN115189942A (en) | A Pseudo-Label-Guided Multi-View Consensus Graph Semi-Supervised Network Intrusion Detection System | |
CN114880538A (en) | Attribute graph community detection method based on self-supervision | |
CN117639057A (en) | New energy power distribution area topology association analysis method and storage medium | |
Munikoti et al. | Bayesian graph neural network for fast identification of critical nodes in uncertain complex networks | |
CN116862667A (en) | A fraud detection and credit assessment method based on contrastive learning and decoupled graph neural |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |