CN117576504A - Training method for social media fake news detection model - Google Patents


Info

Publication number
CN117576504A
Authority
CN
China
Prior art keywords
news
social media
field
propagation
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311475479.XA
Other languages
Chinese (zh)
Inventor
张怀文
杨青
刘鑫鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University
Priority to CN202311475479.XA
Publication of CN117576504A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/95 Pattern authentication; Markers therefor; Forgery detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a training method for a social media fake news detection model, comprising: obtaining a plurality of news propagation graphs of a first domain in a training set, where a news propagation graph represents the propagation path of a piece of news; obtaining a plurality of news propagation graphs of a second domain in the training set, where the ratio of the number of news propagation graphs of the second domain to the number of news propagation graphs of the first domain is less than a threshold; and training the social media fake news detection model according to the news propagation graphs of the first domain, the news propagation graphs of the second domain and a first objective loss function, to obtain a social media fake news detection model trained on the training set. The social media fake news detection model is used to detect the authenticity of news, and the first objective loss function is determined by a classification loss, a global contrastive loss and a local contrastive loss. The method improves the accuracy of fake news detection in the second domain, which has less data and a shorter propagation time.

Description

Training method for a social media fake news detection model

Technical Field

The present invention relates to the field of data processing technology, and in particular to the training of a social media fake news detection model.

Background Art

With the rapid development of the Internet, social media has become an important platform on which people obtain information, express opinions and communicate every day. However, as the number of social platform users grows, a piece of fake news that becomes a trending topic and is discussed and spread by a large number of users can undermine social stability and create a risk of economic loss. The task of social media fake news detection has therefore been proposed, with the aim of determining the authenticity of news on social media.

In the related art, an automatic fake news detection model trained in a high-resource domain with a large amount of data can detect fake news accurately in that domain. For an emerging domain created by a breaking event, however, data is insufficient and the accuracy of automatic fake news detection is low. How to accurately detect the authenticity of news in a low-resource domain when high-resource data is plentiful and low-resource data is scarce is therefore a problem that those skilled in the art urgently need to solve.

Summary of the Invention

In view of the problems in the prior art, embodiments of the present invention provide a training method for a social media fake news detection model.

Specifically, embodiments of the present invention provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a training method for a social media fake news detection model, comprising:

obtaining a plurality of news propagation graphs of a first domain in a training set, where a news propagation graph represents the propagation path of a piece of news;

obtaining a plurality of news propagation graphs of a second domain in the training set, where the ratio of the number of news propagation graphs of the second domain to the number of news propagation graphs of the first domain is less than a threshold;

training the social media fake news detection model according to the news propagation graphs of the first domain, the news propagation graphs of the second domain and a first objective loss function, to obtain a social media fake news detection model trained on the training set. The social media fake news detection model is used to detect the authenticity of news. The first objective loss function is determined by a classification loss, a global contrastive loss and a local contrastive loss. The global contrastive loss represents the degree of association between the node features in a news propagation graph, the node features in a first-type augmented graph of that news propagation graph, and the feature of the news propagation graph; the local contrastive loss represents the degree of association between the node features in a second-type augmented graph of the news propagation graph and the node features in a third-type augmented graph of the news propagation graph; the classification loss represents the accuracy of the classification result.

Further, the social media fake news detection model includes at least one of the following:

a feature extraction module, used to extract news propagation graph features;

a classification module, used to predict, from the news propagation graph features, the authenticity of the social media news corresponding to a news propagation graph;

a self-supervised learning module, used to determine the global contrastive loss from the node features in the news propagation graph, the node features in the first-type augmented graph of the news propagation graph and the news propagation graph feature, and to determine the local contrastive loss from the node features in the second-type augmented graph of the news propagation graph and the node features in the third-type augmented graph of the news propagation graph.
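The two losses computed by the self-supervised learning module can be sketched as follows. This is only an illustration under stated assumptions: the text above does not give formulas, so the global loss is written here as a binary cross-entropy between node embeddings and a graph-level feature (in the style of Deep Graph Infomax), and the local loss as an InfoNCE loss between matching nodes of two augmented views. The function names, the `temperature` parameter and the use of plain Python lists as embeddings are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def global_contrastive_loss(nodes, shuffled_nodes, graph_feature):
    """Assumed form: nodes of the original graph should score high against the
    graph-level feature, nodes of the shuffled (first-type) augmented graph low."""
    eps = 1e-12
    pos = [-math.log(sigmoid(dot(h, graph_feature)) + eps) for h in nodes]
    neg = [-math.log(1.0 - sigmoid(dot(h, graph_feature)) + eps)
           for h in shuffled_nodes]
    return (sum(pos) + sum(neg)) / (len(pos) + len(neg))

def local_contrastive_loss(view_a, view_b, temperature=0.5):
    """Assumed form: InfoNCE, where node i in one augmented view is pulled toward
    node i in the other view and pushed away from all other nodes of that view."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(v) for v in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)

# Toy embeddings: two nodes whose views agree give a smaller local loss
# than two nodes whose views are swapped.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = local_contrastive_loss(aligned, aligned)
loss_swapped = local_contrastive_loss(aligned, swapped)
```

In a real model the embeddings would come from the feature extraction module applied to the original graph and to each augmented graph; the lists above only stand in for those outputs.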

Further, the social media fake news detection model is trained as follows:

inputting the news propagation graphs of the first domain and the news propagation graphs of the second domain in the training set into the social media fake news detection model, and outputting the authenticity detection results of the social media news corresponding to the news propagation graphs; obtaining the classification loss of the social media fake news detection model from the authenticity detection results and the label information of the news, where the label information marks the authenticity of the news;

inputting the news propagation graphs of the first domain in the training set, together with their corresponding first-type, second-type and third-type augmented graphs, into the social media fake news detection model, to obtain the global contrastive loss and the local contrastive loss of the news of the first domain;

inputting the news propagation graphs of the second domain in the training set, together with their corresponding first-type, second-type and third-type augmented graphs, into the social media fake news detection model, to obtain the global contrastive loss and the local contrastive loss of the news of the second domain;

taking the weighted sum of three terms, namely the classification loss of the social media fake news detection model, the global and local contrastive losses of the news of the first domain, and the global and local contrastive losses of the news of the second domain, as the value of the first objective loss function;

training the social media fake news detection model based on the value of the first objective loss function, to obtain the social media fake news detection model trained on the training set.
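The weighted sum in the steps above can be sketched as a small function. The weights `w_cls`, `w_first` and `w_second` and their default values are assumptions for illustration; the text states only that the three terms are weighted and summed.

```python
def first_objective(cls_loss,
                    first_global, first_local,
                    second_global, second_local,
                    w_cls=1.0, w_first=1.0, w_second=1.0):
    """Weighted sum of the three terms that make up the first objective loss.

    The weights are hypothetical hyperparameters, not values from the patent.
    """
    first_term = first_global + first_local      # contrastive losses, first domain
    second_term = second_global + second_local   # contrastive losses, second domain
    return w_cls * cls_loss + w_first * first_term + w_second * second_term

# Example call with made-up loss values and weights.
value = first_objective(1.0, 0.5, 0.5, 0.25, 0.25,
                        w_cls=1.0, w_first=0.5, w_second=2.0)
```

Giving the second-domain term its own weight makes it possible to emphasize the low-resource domain during training, which is one plausible reason for keeping the two domains as separate terms.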

Further, the social media fake news detection model also includes:

a data-adaptive constraint module, used to determine the difference between the news propagation graph features in the training set and the news propagation graph features in the test set.

Further, after the social media fake news detection model is trained based on the value of the first objective loss function to obtain the model trained on the training set, the method further comprises:

inputting the news propagation graphs of the second domain in the test set, together with their corresponding first-type, second-type and third-type augmented graphs, into the social media fake news detection model trained on the training set, to obtain the global contrastive loss and the local contrastive loss of the news of the second domain in the test set;

taking the weighted sum of the difference between the news propagation graph features of the second domain in the training set and those of the second domain in the test set, and the global and local contrastive losses of the news of the second domain in the test set, as the value of a second objective loss function;

training the social media fake news detection model based on the value of the second objective loss function, to obtain a social media fake news detection model based on the latent features of the test set.
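A minimal sketch of the second objective follows. The text does not say how the difference between training-set and test-set features is measured, so the squared distance between the mean feature vectors is used here purely as an assumed stand-in; `feature_discrepancy`, `w_disc` and `w_ssl` are hypothetical names. Note that this stage uses no test-set labels, only features and the self-supervised losses.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]

def feature_discrepancy(train_feats, test_feats):
    """Assumed measure: squared Euclidean distance between the mean graph
    feature of the training set and that of the test set."""
    mean_train = mean_vector(train_feats)
    mean_test = mean_vector(test_feats)
    return sum((a - b) ** 2 for a, b in zip(mean_train, mean_test))

def second_objective(train_feats, test_feats, test_global, test_local,
                     w_disc=1.0, w_ssl=1.0):
    """Weighted sum of the train/test feature difference and the contrastive
    losses computed on the second-domain test graphs (weights are assumptions)."""
    discrepancy = feature_discrepancy(train_feats, test_feats)
    return w_disc * discrepancy + w_ssl * (test_global + test_local)

# Made-up graph-level features for the second domain.
train_feats = [[0.0, 0.0], [2.0, 2.0]]
test_feats = [[3.0, 1.0]]
value = second_objective(train_feats, test_feats, test_global=0.5, test_local=0.25)
```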

In a second aspect, an embodiment of the present invention further provides a social media fake news detection method, comprising:

obtaining a news propagation graph of the second domain to be detected;

inputting the news propagation graph of the second domain to be detected into a social media fake news detection model, to obtain the authenticity detection result of the social media news corresponding to that news propagation graph, where the social media fake news detection model is trained by the training method for a social media fake news detection model according to the first aspect.

In a third aspect, an embodiment of the present invention further provides a social media fake news detection apparatus, comprising:

obtaining a news propagation graph of the second domain to be detected;

inputting the news propagation graph of the second domain to be detected into a social media fake news detection model, to obtain the authenticity detection result of the social media news corresponding to that news propagation graph, where the social media fake news detection model is trained by the training method for a social media fake news detection model according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the training method for a social media fake news detection model according to the first aspect or the social media fake news detection method according to the second aspect.

In a fifth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the training method for a social media fake news detection model according to the first aspect or the social media fake news detection method according to the second aspect.

In a sixth aspect, an embodiment of the present invention further provides a computer program product, comprising a computer program that, when executed by a processor, implements the training method for a social media fake news detection model according to the first aspect or the social media fake news detection method according to the second aspect.

According to the training method for a social media fake news detection model provided by the embodiments of the present invention, the model is trained according to the news propagation graphs of the first domain, the news propagation graphs of the second domain and the first objective loss function. The trained model can therefore accurately extract not only the feature information of first-domain news, together with the global contrastive loss between the nodes of a news propagation graph and the graph itself and the local contrastive loss between the nodes of its augmented graphs, but also the same information for second-domain news. As a result, the trained model can accurately and comprehensively extract the feature information of fake news in the second domain, which has less data and a shorter propagation time, and can accurately detect the authenticity of news in that domain, which improves the accuracy of fake news detection in the second domain.

Brief Description of the Drawings

To describe the technical solutions of the present invention or the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a first schematic flowchart of the training method for a social media fake news detection model provided by an embodiment of the present invention;

FIG. 2 is a second schematic flowchart of the training method for a social media fake news detection model provided by an embodiment of the present invention;

FIG. 3 is a third schematic flowchart of the training method for a social media fake news detection model provided by an embodiment of the present invention;

FIG. 4 is a fourth schematic flowchart of the training method for a social media fake news detection model provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a training apparatus for a social media fake news detection model provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The method of the embodiments of the present invention can be applied to fake news detection scenarios, and improves the accuracy of fake news detection in the second domain, which has less data and a shorter propagation time.

In the related art, an automatic fake news detection model trained in a high-resource domain with a large amount of data can detect fake news accurately in that domain. For an emerging domain created by a breaking event, however, data is insufficient and the accuracy of automatic fake news detection is low. How to accurately detect the authenticity of news in a low-resource domain when high-resource data is plentiful and low-resource data is scarce is therefore a problem that those skilled in the art urgently need to solve.

According to the training method for a social media fake news detection model of the embodiments of the present invention, the model is trained according to the news propagation graphs of the first domain, the news propagation graphs of the second domain and the first objective loss function. The trained model can therefore accurately extract not only the feature information of first-domain news, together with the global contrastive loss between the nodes of a news propagation graph and the graph itself and the local contrastive loss between the nodes of its augmented graphs, but also the same information for second-domain news. As a result, the trained model can accurately and comprehensively extract the feature information of fake news in the second domain, which has less data and a shorter propagation time, and can accurately detect the authenticity of news in that domain, which improves the accuracy of fake news detection in the second domain.

The technical solutions of the present invention are described in detail below through specific embodiments with reference to FIG. 1 to FIG. 6. The following specific embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.

FIG. 1 is a schematic flowchart of one embodiment of the training method for a social media fake news detection model provided by an embodiment of the present invention. As shown in FIG. 1, the method provided by this embodiment comprises:

Step 101: obtaining a plurality of news propagation graphs of the first domain in the training set, where a news propagation graph represents the propagation path of a piece of news.

Specifically, in the prior art the authenticity of news is judged and detected on the basis of human subjective judgment, which is inefficient. By way of example, as shown in FIG. 2, in the embodiments of the present application the domain with more data is the source domain and the domain with less data is the target domain. A model trained on the source domain and the target domain can identify fake news in the first domain accurately (only one error), whereas its detection results for fake news in the second domain contain many errors (six errors).

To solve the above problem, the embodiments of the present application first obtain a plurality of news propagation graphs of the first domain in the training set, where a news propagation graph represents the propagation path of a piece of news. Optionally, the propagation of news on social media depends mainly on user behavior: users' reposts, comments and likes of a piece of news make up its social media news propagation graph. That is, each repost, comment or like is taken as a node of the news propagation graph, and the graph is built from these nodes and their time information, so that it accurately represents the propagation path of the news.
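The graph construction described above can be sketched as follows. The record layout `(node_id, parent_id, kind, time)`, the `PropagationGraph` class and the rule of skipping interactions whose parent is unknown are all assumptions made for this illustration; the text only says that reposts, comments and likes become nodes and that time information is used.

```python
from dataclasses import dataclass, field

@dataclass
class PropagationGraph:
    """Directed graph rooted at the source post; each node is one interaction."""
    root: str
    edges: list = field(default_factory=list)        # (parent_id, child_id)
    timestamps: dict = field(default_factory=dict)   # node_id -> time
    kinds: dict = field(default_factory=dict)        # node_id -> interaction kind

def build_propagation_graph(root_id, root_time, interactions):
    """Build a propagation graph from (node_id, parent_id, kind, time) records.

    The record schema is hypothetical. Records are processed in time order so
    that a parent is always inserted before the interactions that respond to it.
    """
    graph = PropagationGraph(root=root_id)
    graph.timestamps[root_id] = root_time
    graph.kinds[root_id] = "source"
    for node_id, parent_id, kind, time in sorted(interactions, key=lambda r: r[3]):
        if parent_id not in graph.timestamps:
            continue  # parent never observed: the interaction cannot be attached
        graph.timestamps[node_id] = time
        graph.kinds[node_id] = kind
        graph.edges.append((parent_id, node_id))
    return graph

# Toy example: one repost, one like, a comment on the repost, and an orphan like.
records = [
    ("c2", "c1", "comment", 5),
    ("c1", "n0", "repost", 2),
    ("c3", "n0", "like", 3),
    ("x", "missing", "like", 4),
]
graph = build_propagation_graph("n0", 0, records)
```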

Step 102: obtaining a plurality of news propagation graphs of the second domain in the training set, where the ratio of the number of news propagation graphs of the second domain to the number of news propagation graphs of the first domain is less than a threshold.

Specifically, when a breaking event occurs, that is, when a new domain emerges, how to accurately detect the authenticity of news in the new domain with little data, so that measures can be taken in time to prevent fake news from spreading further, is a problem that those skilled in the art urgently need to solve. In the embodiments of the present application, after the news propagation graphs of the first domain are obtained, a plurality of news propagation graphs of the second domain in the training set are also obtained, where the number of news propagation graphs of the second domain is far smaller than that of the first domain. For example, the ratio of the number of news propagation graphs of the second domain to the number of news propagation graphs of the first domain is less than a threshold; for example, the number of news propagation graphs of the second domain is 10% of that of the first domain. That is, when training the social media fake news detection model, the present application uses as training samples not only news of the first domain, which has more data and a longer propagation time, but also news of the second domain, which has less data and a shorter propagation time. The trained model can therefore accurately detect the authenticity not only of news with more data and a longer propagation time, but also of newly emerged social media news with less data and a shorter propagation time.

Step 103: Train the social media fake news detection model according to the multiple news propagation graphs of the first field, the multiple news propagation graphs of the second field, and a first target loss function, to obtain the social media fake news detection model trained on the training set. The social media fake news detection model is used to detect the authenticity of news. The first target loss function is determined by a classification loss, a global contrast loss and a local contrast loss. The global contrast loss represents the degree of association among the node features in the news propagation graph, the node features in the first type of augmented graph of the news propagation graph, and the news propagation graph features. The local contrast loss represents the degree of association between the node features in the second type of augmented graph of the news propagation graph and the node features in the third type of augmented graph of the news propagation graph. The classification loss represents the accuracy of the classification result.

Specifically, in the embodiment of the present application, after obtaining the multiple news propagation graphs of the first field (large amount of data, long propagation time) and the news propagation graphs of the second field (small amount of data, short propagation time), the social media fake news detection model is trained according to the multiple news propagation graphs of the first field, the multiple news propagation graphs of the second field and the first target loss function, to obtain the social media fake news detection model trained on the training set. The social media fake news detection model is used to detect the authenticity of news.

Optionally, the first target loss function is determined by the classification loss, the global contrast loss and the local contrast loss. The global contrast loss represents the degree of association among the node features in the news propagation graph, the node features in the first type of augmented graph of the news propagation graph, and the news propagation graph features. The local contrast loss represents the degree of association between the node features in the second type of augmented graph of the news propagation graph and the node features in the third type of augmented graph of the news propagation graph. That is, the global contrast loss represents the degree of association between a node and the news propagation graph, and the local contrast loss represents the degree of association between nodes. Optionally, the global contrast loss can be obtained by contrastive learning between the original news propagation graph and the augmented graph produced by randomly shuffling the nodes of the original graph, which yields the relationship features between each node of the original graph and the entire graph. The local contrast loss can be obtained by contrastive learning between two types of augmented graphs, one produced by selectively dropping edges of the original news propagation graph and the other by selectively masking node features of the original graph, which yields the relationship features between the nodes in the graph.

That is, by adding the global contrast loss and the local contrast loss, the embodiment of the present application optimizes the feature extractor in the social media fake news detection model. The optimized feature extractor and model can extract the features of the news propagation graph, and the features of fake news in particular, more accurately and completely. They therefore capture more of the information that matters for news classification, so the authenticity of news can be classified more accurately. In addition, the training method of the embodiment of the present invention performs local and global contrastive learning on the second-field fake news data and its augmented data, learns the local and global information of that data, and improves the generalization ability of the model on second-field fake news data.
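The three augmentations described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function names, the probabilities and the use of uniform random selection are assumptions, since the text only says that edges are dropped and node features are masked "selectively".

```python
import random

def shuffle_nodes(features, rng):
    """First-type augmentation (assumed form): permute the rows of the node
    feature matrix so features no longer match their position in the graph."""
    shuffled = [row[:] for row in features]
    rng.shuffle(shuffled)
    return shuffled

def drop_edges(edges, drop_prob, rng):
    """Second-type augmentation (assumed form): drop each edge independently
    with probability drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]

def mask_features(features, mask_prob, rng):
    """Third-type augmentation (assumed form): zero out whole feature
    dimensions, using the same mask for every node."""
    dim = len(features[0])
    keep = [0.0 if rng.random() < mask_prob else 1.0 for _ in range(dim)]
    return [[x * k for x, k in zip(row, keep)] for row in features]

rng = random.Random(0)
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one row per node (post)
edges = [(0, 1), (0, 2)]                         # hypothetical source -> reply edges
view1 = shuffle_nodes(features, rng)
view2 = drop_edges(edges, 0.5, rng)
view3 = mask_features(features, 0.5, rng)
```

The shuffled view keeps the graph structure and only breaks the node-to-feature correspondence, which is why it serves as the negative sample for the node-versus-graph comparison, while the edge-dropped and feature-masked views are compared with each other node by node.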

That is, after the social media fake news detection model is trained on the first target loss function, the trained model can accurately extract the feature information of first-field news, together with the global contrast loss between the nodes of a news propagation graph and the graph itself and the local contrast loss between the nodes of its augmented graphs. It can equally extract the feature information of second-field news, together with the corresponding global and local contrast losses. The model can therefore accurately identify and extract the feature information of fake news, which improves the accuracy of fake news detection. In particular, the trained model can accurately detect the authenticity of second-field news with a small amount of data and a short propagation time, improving the accuracy of fake news detection in that field.

In the method of the above embodiment, the social media fake news detection model is trained according to the multiple news propagation graphs of the first field, the multiple news propagation graphs of the second field and the first target loss function. The trained model can accurately extract the feature information of first-field news, together with the global contrast loss between the nodes of a news propagation graph and the graph itself and the local contrast loss between the nodes of its augmented graphs, and can equally extract the feature information and the corresponding global and local contrast losses of second-field news. The trained model can therefore accurately and comprehensively extract the feature information of fake news in the second field, which has a small amount of data and a short propagation time. This improves the accuracy of fake news detection, so that the authenticity of second-field news can be detected accurately.

In one embodiment, the social media fake news detection model includes at least one of the following:

a feature extraction module, used to extract news propagation graph features;

a classification module, used to predict, according to the news propagation graph features, the authenticity of the social media news corresponding to the news propagation graph;

a self-supervised learning module, used to determine the global contrast loss according to the node features in the news propagation graph, the node features in the first type of augmented graph of the news propagation graph and the news propagation graph features, and to determine the local contrast loss according to the node features in the second type of augmented graph of the news propagation graph and the node features in the third type of augmented graph of the news propagation graph.

Specifically, in an embodiment of the present application, the social media fake news detection model includes a feature extraction module, a classification module and a self-supervised learning module. The feature extraction module is used to extract news propagation graph features; optionally, it is built on a graph convolutional network (GCN). The classification module is used to predict, according to the news propagation graph features, the authenticity of the social media news corresponding to the news propagation graph. The self-supervised learning module is used to determine the global contrast loss according to the node features in the news propagation graph, the node features in the first type of augmented graph of the news propagation graph and the news propagation graph features, and to determine the local contrast loss according to the node features in the second type of augmented graph and the node features in the third type of augmented graph of the news propagation graph. In this way, the model can accurately extract not only the feature information of first-field news, together with the global contrast loss between the nodes of a news propagation graph and the graph itself and the local contrast loss between the nodes of its augmented graphs, but also the same information for second-field news. Optionally, the global and local contrast losses corresponding to fake news differ from those corresponding to true news, and those corresponding to first-field fake news differ from those corresponding to second-field fake news. The trained model can therefore accurately identify and extract the feature information of fake news, detect the authenticity of second-field news with a small amount of data and a short propagation time, and improve the accuracy of fake news detection in that field.
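As a rough illustration of how the feature extraction and classification modules fit together, the sketch below runs one GCN layer over a tiny propagation graph, averages the node features into a graph feature, and applies a logistic classifier. Only the use of a GCN comes from the text; the single layer, the symmetric normalization with self-loops, the undirected treatment of edges, the mean readout, the logistic output and all weights are assumptions chosen for brevity.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(num_nodes, edges):
    """D^(-1/2) (A + I) D^(-1/2), treating edges as undirected (assumption)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

def gcn_layer(a_hat, h, w):
    """One graph convolution followed by ReLU: ReLU(a_hat @ h @ w)."""
    return [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]

def readout(z):
    """Graph feature s as the mean of the node features (assumed readout)."""
    return [sum(col) / len(z) for col in zip(*z)]

def classify(s, w_out, bias):
    """Assumed logistic classifier; the output is read as P(news is fake)."""
    logit = sum(x * w for x, w in zip(s, w_out)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

a_hat = normalized_adjacency(3, [(0, 1), (0, 2)])
z = gcn_layer(a_hat, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[0.5, -0.5], [0.25, 0.75]])
prob_fake = classify(readout(z), [1.0, -1.0], 0.0)
```

The node features `z` and the graph feature from `readout` are the quantities that the self-supervised learning module would compare when computing the contrast losses.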

In one embodiment, the social media fake news detection model is trained as follows:

inputting the multiple news propagation graphs of the first field and the multiple news propagation graphs of the second field in the training set into the social media fake news detection model, and outputting the authenticity detection results of the social media news corresponding to the news propagation graphs; obtaining the classification loss of the social media fake news detection model according to the authenticity detection results of the social media news and the label information of the news, where the label information is used to mark the authenticity of the news;

inputting the multiple news propagation graphs of the first field in the training set, together with the corresponding first type, second type and third type of augmented graphs, into the social media fake news detection model to obtain the global contrast loss and local contrast loss of the multiple news items of the first field;

inputting the multiple news propagation graphs of the second field in the training set, together with the corresponding first type, second type and third type of augmented graphs, into the social media fake news detection model to obtain the global contrast loss and local contrast loss of the multiple news items of the second field;

taking as the value of the first target loss function the weighted sum of three terms: the classification loss of the social media fake news detection model, the global and local contrast losses of the multiple news items of the first field, and the global and local contrast losses of the multiple news items of the second field;

training the social media fake news detection model based on the value of the first target loss function, to obtain the social media fake news detection model trained on the training set.
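The weighted sum in the steps above can be written as a small helper. This is only a sketch: the text does not give the weights or say how the global and local losses are combined within a field, so the default weights and the plain sum inside each field are assumptions.

```python
def first_objective(cls_loss, first_field, second_field, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of three terms (weights are hypothetical).

    first_field and second_field are (global_loss, local_loss) pairs; adding
    the two losses within a field is an assumption.
    """
    w_cls, w_first, w_second = weights
    return (w_cls * cls_loss
            + w_first * sum(first_field)
            + w_second * sum(second_field))

# Example with made-up loss values.
total = first_objective(1.0, (0.2, 0.3), (0.4, 0.1))
```

Only the classification term needs labels; the two contrast terms are computed from the graphs and their augmented views alone.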

Specifically, when training the social media fake news detection model, the present application first inputs the multiple news propagation graphs of the first field and the multiple news propagation graphs of the second field in the training set into the model. The classification module of the model outputs the authenticity detection results of the social media news corresponding to the news propagation graphs. The classification loss of the model, which reflects its detection accuracy, is then obtained from these authenticity detection results and the label information of the news.
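The text does not name the form of the classification loss. A common choice for a real/fake decision is binary cross-entropy, sketched below under that assumption, with label 1 taken to mean "fake". The accuracy helper shows the related but distinct quantity that the loss reflects.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy; probs are predicted P(fake), labels are 0/1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def accuracy(probs, labels, threshold=0.5):
    """Fraction of items whose thresholded prediction matches the label."""
    hits = sum(int((p >= threshold) == bool(y)) for p, y in zip(probs, labels))
    return hits / len(labels)

probs = [0.9, 0.2, 0.6]   # made-up model outputs
labels = [1, 0, 0]        # made-up ground truth
loss = binary_cross_entropy(probs, labels)
acc = accuracy(probs, labels)
```

The loss is differentiable and is what training would minimize; accuracy is what is reported afterwards.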

Further, in an embodiment of the present application, the multiple news propagation graphs of the first field in the training set, together with the corresponding first type, second type and third type of augmented graphs, are input into the social media fake news detection model to obtain the global contrast loss and local contrast loss of the multiple news items of the first field. That is, the self-supervised learning module in the model obtains the local contrast loss and global contrast loss of the nodes in the graph from the features of the original news propagation graph and its three types of augmented graphs. The global contrast loss is obtained by contrastive learning between the original news propagation graph and the augmented graph produced by randomly shuffling the nodes of the original graph, which yields the relationship features between each node of the original graph and the entire graph. The local contrast loss is obtained by contrastive learning between two types of augmented graphs, one produced by selectively dropping edges of the original news propagation graph and the other by selectively masking node features of the original graph, which yields the relationship features between the nodes in the graph.

For example, the global contrast loss between each node in the original news propagation graph and the entire graph is modeled using the following formula:

$$D(Z_{si}, s) = \mathrm{Sigmoid}(Z_{si} * s)$$

where $Z_{si}$ is the feature representation of a node of the news propagation graph; $s$ is the feature representation of the news propagation graph; $*$ denotes the inner product; $D$ is the discriminator, which computes the correlation scores of the positive and negative samples respectively; and Sigmoid is the activation function.

For the augmented graph obtained by randomly shuffling the nodes of the original news propagation graph, the global contrast loss between each node and the entire graph is determined based on the following formula:

where $N$ is the number of nodes in the input graph; $Z_{0i}$ denotes a node in the original news propagation graph; and $Z_{1i}$ denotes a node in the augmented graph obtained by randomly shuffling the nodes of the original news propagation graph.
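One way to combine the discriminator D with the positive nodes $Z_{0i}$ and the shuffled negative nodes $Z_{1i}$ is a binary cross-entropy averaged over the 2N scores, as in Deep Graph Infomax style objectives. The sketch below assumes that form; it is an illustration consistent with the symbols defined here, not a reproduction of the patent's formula.

```python
import math

def discriminator(z_node, s):
    """D(Z_si, s) = Sigmoid(Z_si * s), with * the inner product."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_node, s))))

def global_contrast_loss(z0, z1, s, eps=1e-12):
    """Assumed form: -(1 / 2N) * sum_i [log D(Z_0i, s) + log(1 - D(Z_1i, s))].

    z0: node features of the original graph (positive samples)
    z1: node features of the node-shuffled graph (negative samples)
    s:  feature representation of the whole graph
    """
    n = len(z0)
    total = 0.0
    for pos, neg in zip(z0, z1):
        total += math.log(discriminator(pos, s) + eps)
        total += math.log(1.0 - discriminator(neg, s) + eps)
    return -total / (2 * n)

s = [1.5, 0.5]                       # made-up graph feature
z0 = [[2.0, 0.0], [1.0, 1.0]]        # nodes that agree with s
z1 = [[-2.0, 0.0], [-1.0, -1.0]]     # nodes that disagree with s
loss = global_contrast_loss(z0, z1, s)
```

The loss is small when original nodes score high against the graph feature and shuffled nodes score low, which is the node-to-graph association the global contrast loss is meant to capture.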

Optionally, the local contrast loss is modeled using the following formula:

where $Z_2$ and $Z_3$ denote two types of augmented graphs of the news data; optionally, they may be the two types of augmented graphs obtained by selectively dropping edges of the original news propagation graph and by selectively masking node features of the original graph;

$(Z_{2i}, Z_{3j})$ $(i, j \in \{1, \ldots, N\},\ i \neq j)$, where $N$ is the number of nodes in the graph, $\cos()$ is the cosine similarity function, $\tau$ is a hyperparameter, and $g()$ is used to further enhance the representation ability of the model. The node representations computed by $g()$ are denoted $Z_2'$ and $Z_3'$;

where $I$ denotes the identity matrix. The local contrast loss of the nodes is finally defined as follows:
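A node-level contrastive loss built from cosine similarity and a temperature $\tau$ is commonly written in InfoNCE form, where node i in one view is pulled toward node i in the other view and pushed away from nodes j ≠ i. The sketch below assumes that form and assumes the inputs have already been passed through g(); it does not reproduce the patent's exact definition.

```python
import math

def cosine(u, v):
    """Cosine similarity, returning 0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def local_contrast_loss(z2, z3, tau=0.5):
    """Assumed InfoNCE form over two views z2 and z3 (one row per node).

    Positive pair: (z2[i], z3[i]); negative pairs: (z2[i], z3[j]) with j != i.
    """
    n = len(z2)
    total = 0.0
    for i in range(n):
        sims = [math.exp(cosine(z2[i], z3[j]) / tau) for j in range(n)]
        total += -math.log(sims[i] / sum(sims))
    return total / n

view_a = [[1.0, 0.0], [0.0, 1.0]]     # made-up projected node features
aligned = local_contrast_loss(view_a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = local_contrast_loss(view_a, [[0.0, 1.0], [1.0, 0.0]])
```

With this form the loss is lowest when each node's two views agree and differ from every other node, which matches the node-to-node association described for the local contrast loss.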

Further, the multiple news propagation graphs of the second field in the training set, together with the corresponding first type, second type and third type of augmented graphs, are input into the social media fake news detection model to obtain the global contrast loss and local contrast loss of the multiple news items of the second field. Optionally, these losses are determined in the same way as the global and local contrast losses of the first field, and the details are not repeated here.

Optionally, after the classification loss of the social media fake news detection model, the global and local contrast losses of the multiple news items of the first field, and the global and local contrast losses of the multiple news items of the second field have been determined, the embodiment of the present application takes the weighted sum of these three terms as the value of the first target loss function. That is, training does not consider the classification loss alone. It also considers whether the model can accurately and comprehensively extract the feature information of fake news in the second field, which has a small amount of data and a short propagation time. Once this feature information is extracted accurately, fake news can be identified more accurately. This addresses the problem that the feature information of second-field fake news could not be extracted accurately and comprehensively, and improves the accuracy of fake news detection in the second field.

In the method of the above embodiment, training the social media fake news detection model considers not only the classification loss but also whether the model can accurately and comprehensively extract the feature information of fake news in the second field, which has a small amount of data and a short propagation time. Once this feature information is extracted accurately, fake news can be identified more accurately. This addresses the problem that the feature information of second-field fake news could not be extracted accurately and comprehensively, and improves the accuracy of fake news detection in the second field.

In one embodiment, the social media fake news detection model further includes:

a data adaptive constraint module, used to determine the difference between the news propagation graph features in the training set and the news propagation graph features in the test set.

Specifically, in the embodiment of the present application, the social media fake news detection model adds a data adaptive constraint module to the feature extraction module, classification module and self-supervised learning module. The data adaptive constraint module is used to determine the difference between the news propagation graph features in the training set and the news propagation graph features in the test set. That is, after the model has been trained on the training set, this difference is determined by the data adaptive constraint module and used to fine-tune the model, so that the feature extractor performs similarly on the test set and on the training set. This optimizes the feature extractor and prevents overfitting. With the difference between training-set and test-set features constrained in this way, the model can more accurately identify fake news in the second field to be detected.

For example, the difference between the news propagation graph features of the second field in the training set and the news propagation graph features of the second field in the test set can be determined as follows:

where $h$ is the feature matrix of the training set data, $\mu$ is the feature mean, and $\Sigma$ is the covariance matrix. The feature mean $\mu_t$ and covariance matrix $\Sigma_t$ of the test set are obtained in the same way. The difference between the news propagation graph features of the second field in the training set and those of the second field in the test set is as follows:

That is, the data adaptive constraint module in the embodiment of the present application computes the difference between two statistics (mean and covariance) of the fake news data in the training set and in the test set. This keeps the social media fake news detection model from overfitting the fake news data and improves the generalization ability of the model.
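The two statistics and their difference can be sketched as follows. The text defines the mean and covariance of the feature matrix but the distance between them is not given here, so the squared Euclidean distance between means plus the squared Frobenius distance between covariances is an assumption, as is the 1/n normalization of the covariance.

```python
def mean_and_cov(h):
    """Feature mean and (1/n-normalized) covariance of a feature matrix h,
    given as a list of rows, one row per news propagation graph."""
    n, d = len(h), len(h[0])
    mu = [sum(row[k] for row in h) / n for k in range(d)]
    cov = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in h) / n
            for b in range(d)] for a in range(d)]
    return mu, cov

def feature_discrepancy(h_train, h_test):
    """Assumed distance: ||mu - mu_t||^2 + ||Sigma - Sigma_t||_F^2."""
    mu_s, cov_s = mean_and_cov(h_train)
    mu_t, cov_t = mean_and_cov(h_test)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))
    cov_term = sum((a - b) ** 2
                   for row_s, row_t in zip(cov_s, cov_t)
                   for a, b in zip(row_s, row_t))
    return mean_term + cov_term

h_train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]   # made-up training features
h_test = [[1.0, 1.0], [3.0, 1.0], [1.0, 3.0]]    # same shape, shifted by (1, 1)
gap = feature_discrepancy(h_train, h_test)
```

In this example the two sets have the same covariance and differ only in their means, so the whole discrepancy comes from the mean term. No labels are needed, which is what allows the statistic to be computed on test-set features.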

In the method of the above embodiment, the social media fake news detection model adds a data adaptive constraint module that constrains the difference between the news propagation graph features in the training set and those in the test set. This fine-tunes the model so that the feature extractor performs similarly on the test set and on the training set, which optimizes the feature extractor and prevents overfitting. The model with this constraint can therefore more accurately identify fake news in the second field to be detected.

In one embodiment, after training the social media fake news detection model based on the value of the first target loss function to obtain the social media fake news detection model trained on the training set, the method further includes:

inputting the multiple news propagation graphs of the second field in the test set, together with the corresponding first type, second type and third type of augmented graphs, into the social media fake news detection model trained on the training set, to obtain the global contrast loss and local contrast loss of the multiple news items of the second field in the test set;

taking as the value of a second target loss function the weighted sum of the difference between the news propagation graph features of the second field in the training set and those of the second field in the test set, and the global contrast loss and local contrast loss of the multiple news items of the second field in the test set;

training the social media fake news detection model based on the value of the second target loss function, to obtain a social media fake news detection model based on the latent features of the test set.
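The second target loss function in the steps above contains no classification term, so it can be evaluated on test-set graphs without labels. A minimal sketch follows, with hypothetical weights and with the global and local losses weighted separately as an assumption.

```python
def second_objective(discrepancy, test_global, test_local, weights=(1.0, 0.5, 0.5)):
    """Weighted sum used for the fine-tuning stage (weights are hypothetical).

    discrepancy: train/test feature difference for the second field
    test_global: global contrast loss on second-field test graphs
    test_local:  local contrast loss on second-field test graphs
    """
    w_d, w_g, w_l = weights
    return w_d * discrepancy + w_g * test_global + w_l * test_local

# Example with made-up values for the three terms.
value = second_objective(2.0, 0.4, 0.6)
```

Training then proceeds in two stages: the first target loss function on the labeled training set, followed by this label-free objective on the second-field test graphs.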

Specifically, the embodiment of the present application inputs the multiple news propagation graphs of the second field in the test set, together with the corresponding first type, second type and third type of augmented graphs, into the social media fake news detection model trained on the training set, to obtain the global contrast loss and local contrast loss of the multiple news items of the second field in the test set. Optionally, these losses are determined by steps similar to those used for the global and local contrast losses of the news items in the training set, and the details are not repeated here.

Further, once the global contrastive loss and local contrastive loss of the multiple second-domain news items in the test set have been obtained, the data adaptive constraint module of the social media fake news detection model can determine the difference between the second-domain news propagation graph features in the training set and those in the test set. The weighted sum of this difference, the global contrastive loss, and the local contrastive loss of the second-domain news items in the test set is then taken as the value of the second objective loss function, and the model is trained on that value to obtain a social media fake news detection model based on the latent features of the test set. In other words, a constraint on the difference between the training-set and test-set second-domain propagation graph features is added to the loss function. This fine-tunes the model so that the feature extractor behaves similarly on the test set and on the training set, which optimizes the feature extractor and guards against overfitting; the constrained model can therefore identify fake news in the second domain to be detected more accurately. Conversely, without this constraint, overfitting would occur: the model trained only on the training set could not accurately assess the authenticity of second-domain news to be detected, and so could not reliably identify second-domain fake news.
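The second objective described above is a weighted sum of three terms. A minimal sketch follows; the weight names `w_disc`, `w_global`, and `w_local` and their default values are hypothetical, since the text does not state how the weights are chosen.

```python
def second_objective(discrepancy, global_loss, local_loss,
                     w_disc=1.0, w_global=1.0, w_local=1.0):
    """Weighted sum used as the value of the second objective loss function.

    discrepancy: difference between training-set and test-set second-domain
                 propagation graph features (from the data adaptive constraint module)
    global_loss: global contrastive loss on the second-domain test graphs
    local_loss:  local contrastive loss on the second-domain test graphs
    The three weights are hypothetical names, not taken from the source.
    """
    return w_disc * discrepancy + w_global * global_loss + w_local * local_loss


# Example with made-up loss values.
value = second_objective(2.0, 0.5, 0.25, w_disc=1.0, w_global=1.0, w_local=2.0)
```

Only self-supervised terms and the feature discrepancy appear here, so this objective can be evaluated on test-set graphs without labels.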

In the method of the above embodiment, a constraint on the difference between the second-domain news propagation graph features in the training set and those in the test set is added to the loss function. This fine-tunes the social media fake news detection model and optimizes the feature extractor so that it behaves similarly on the test set and on the training set, which effectively guards against overfitting. The constrained model can therefore identify fake news in the second domain to be detected more accurately.

An embodiment of the present application provides a social media fake news detection method, including:

Obtaining a news propagation graph of the second domain to be detected;

Inputting the news propagation graph of the second domain to be detected into a social media fake news detection model, to obtain an authenticity detection result for the social media news corresponding to that propagation graph; the social media fake news detection model is trained by the training method for a social media fake news detection model according to any one of the above.

Specifically, after the social media fake news detection model has been trained on the training set and the test set, the trained model can be used to detect second-domain fake news accurately. Optionally, the news propagation graph of the second domain to be detected is first obtained and then input into the trained model, which yields the authenticity detection result for the corresponding social media news. This enables authenticity detection for second-domain news, which has little data and a short propagation time, and improves the accuracy of fake news detection in that domain.

Exemplarily, the specific flow of the training method for the social media fake news detection model in this embodiment of the present application is shown in Figures 3 and 4, as follows:

1. Training the social media fake news detection model on first-domain (high-resource) news in the training set.

Three types of augmentation are applied to the first-domain news data: edge dropping, feature shuffling, and feature masking, producing three types of augmented graphs. A feature extractor yields the high-dimensional feature matrices of the first-domain news data and of the three augmented graphs. The main task and the auxiliary task share one feature extractor. The first-domain fake news data is input into the main task, while the first-domain news data and the three augmented graphs are input into the auxiliary task. Local contrastive learning is performed between the edge-dropped augmented graph and the randomly feature-masked augmented graph, and global contrastive learning is performed between the first-domain news data and the feature-shuffled augmented graph.
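The three augmentations named above can be sketched on a toy propagation graph. This is an illustration under assumptions, not the patented implementation: the graph is represented as an edge list plus a per-node feature list, the rates `drop_rate` and `mask_rate` are hypothetical parameters, and masking is assumed to zero whole feature dimensions for all nodes.

```python
import random


def drop_edges(edges, drop_rate, rng):
    """Edge dropping: keep each edge independently with probability 1 - drop_rate."""
    return [e for e in edges if rng.random() >= drop_rate]


def shuffle_features(features, rng):
    """Feature shuffling: permute feature rows across nodes; topology is unchanged."""
    rows = [list(row) for row in features]  # copy so the original is untouched
    rng.shuffle(rows)
    return rows


def mask_features(features, mask_rate, rng):
    """Feature masking (assumed form): zero randomly chosen dimensions for all nodes."""
    dim = len(features[0])
    keep = [rng.random() >= mask_rate for _ in range(dim)]
    return [[x if k else 0.0 for x, k in zip(row, keep)] for row in features]


rng = random.Random(0)
edges = [(0, 1), (0, 2), (1, 3)]  # toy propagation edges: source post -> reposts
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]

view_shuffled = (edges, shuffle_features(features, rng))  # for global contrastive learning
view_dropped = (drop_edges(edges, 0.3, rng), features)    # for local contrastive learning
view_masked = (edges, mask_features(features, 0.5, rng))  # for local contrastive learning
```

The shuffled view keeps the structure but breaks the node-to-feature correspondence, which is why it serves as the negative view in global contrastive learning; the dropped and masked views stay close to the original and are compared node by node.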

1.1 Global contrastive learning based on a graph convolutional neural network

The purpose of global contrastive learning is to help the node representations in the graph data capture global information about the whole graph. Given an input event, different graphs can be generated through various types of data augmentation. Global contrastive learning uses two views: the original view View0, and the augmented view View1, in which the node attributes of all nodes in the graph are randomly reassigned. From these two graphs, a shared feature extractor produces two corresponding node representations, Z0 and Z1. A global graph representation s is then summarized from the node representation matrix Z0 of the original view View0 by a multilayer perceptron.

Positive samples in global contrastive learning consist of node-graph representation pairs in which both the node representation and the graph representation come from the original view View0. Negative samples are also node-graph representation pairs, in which the node representation comes from View1 and the graph representation comes from View0. A discriminator D scores the positive and negative samples: the score should be higher for positive samples and lower for negative samples. D is defined as follows:

$$D(Z_{si}, s) = \mathrm{Sigmoid}(Z_{si} * s)$$

where Zsi denotes a node representation from view Views, and * denotes the inner product. The loss function of global contrastive learning is as follows:

where N denotes the number of nodes in the input graph.
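The global loss formula itself does not appear in this text. As a sketch only, the code below assumes the commonly used binary cross-entropy form over N positive and N negative node-graph pairs, scored with the discriminator D defined above. The mean readout in `summarize` is a stand-in assumption for the multilayer perceptron mentioned in the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def discriminator(z, s):
    """D(z, s) = Sigmoid(z * s), where * is the inner product."""
    return sigmoid(dot(z, s))


def summarize(z0):
    """Assumption: mean of the node representations stands in for the MLP readout."""
    n = len(z0)
    return [sum(col) / n for col in zip(*z0)]


def global_contrastive_loss(z0, z1):
    """Assumed form: binary cross-entropy over N positive and N negative pairs."""
    s = summarize(z0)
    n = len(z0)
    pos = sum(math.log(discriminator(z, s)) for z in z0)        # View0 nodes vs. s
    neg = sum(math.log(1.0 - discriminator(z, s)) for z in z1)  # View1 nodes vs. s
    return -(pos + neg) / (2 * n)


z0 = [[1.0, 2.0], [2.0, 1.0]]      # node representations from View0
z1 = [[-1.0, -2.0], [-2.0, -1.0]]  # node representations from View1
loss_aligned = global_contrastive_loss(z0, z1)
loss_confused = global_contrastive_loss(z0, z0)  # negatives identical to positives
```

When the View1 nodes are easy to tell apart from the graph summary, the loss is small; when the negatives are indistinguishable from the positives, it is much larger.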

1.2 Local contrastive learning based on a graph convolutional neural network

Through local contrastive learning, the model learns richer node feature representations. By comparing the features of a node with those of its neighbors, the model learns more discriminative node representations, which reduces the impact of node position, noise, and missing values on the graph data and improves robustness and interpretability. Local contrastive learning also lets the model better capture the similarities and differences between a node and its neighbors, and thus better understand the node's context.

Given an input event, a graph augmentation strategy (edge removal and node attribute masking) produces two different augmented graphs, which serve as input to the feature extractor. The feature extractor outputs two node representation matrices, Z2 and Z3. The basic goal of local contrastive learning is to tell whether two nodes from the augmented views are the same node. Accordingly, (Z2i, Z3i) (i ∈ {1, ..., N}) denotes a positive pair, where N is the number of nodes, and (Z2i, Z3j) and (Z2i, Z2j) (i, j ∈ {1, ..., N}, i ≠ j) denote intra-class negative pairs and inter-class negative pairs. The objective function for a positive pair is defined as follows:

where cos() denotes the cosine similarity function, τ is a hyperparameter, and g() is a two-layer MLP. Denoting the node representations after the MLP as Z2' and Z3', a regularization term is applied to these refined node representations, as follows:

The final loss function of local contrastive learning is defined as follows:
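The positive-pair objective and the final local loss are not shown in this text. As a rough sketch only, the code below assumes a widely used InfoNCE-style node-level objective built from the ingredients the text names: cosine similarity, the temperature τ (`tau`), one positive pair per node, and negatives drawn from both the other view and the same view. The projection g() and the regularization step are omitted, which is a simplifying assumption.

```python
import math


def cos(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def pair_loss(za, zb, i, tau):
    """Assumed objective for anchor za[i] with positive zb[i]; other nodes are negatives."""
    pos = math.exp(cos(za[i], zb[i]) / tau)
    neg = sum(
        math.exp(cos(za[i], zb[j]) / tau) + math.exp(cos(za[i], za[j]) / tau)
        for j in range(len(za)) if j != i
    )
    return -math.log(pos / (pos + neg))


def local_contrastive_loss(z2, z3, tau=0.5):
    """Symmetric average over all nodes and both anchor directions (assumed)."""
    n = len(z2)
    total = sum(pair_loss(z2, z3, i, tau) + pair_loss(z3, z2, i, tau) for i in range(n))
    return total / (2 * n)


z2 = [[1.0, 0.0], [0.0, 1.0]]          # nodes from the edge-dropped view
z3_matched = [[0.9, 0.1], [0.1, 0.9]]  # same nodes, masked view, still aligned
z3_swapped = [[0.1, 0.9], [0.9, 0.1]]  # node identities mixed up
loss_matched = local_contrastive_loss(z2, z3_matched)
loss_swapped = local_contrastive_loss(z2, z3_swapped)
```

The loss is lower when node i in one augmented view is most similar to node i in the other, which is the "same node or not" discrimination the text describes.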

For the test-time-training-based method of training on high-resource fake news data, the final loss function of the auxiliary task is:

$$L_s = L_g + \alpha L_l$$

For the test-time-training-based method of training on high-resource fake news data, the overall loss function is:

$$L = L_m + \gamma L_s$$

where Lm is the loss of the main task.
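Substituting the auxiliary loss into the overall loss makes the effective weight of each term explicit. This is only an algebraic expansion of the two formulas given in the text; no new symbols are introduced.

```latex
\begin{aligned}
L &= L_m + \gamma L_s \\
  &= L_m + \gamma \left( L_g + \alpha L_l \right) \\
  &= L_m + \gamma L_g + \gamma \alpha L_l
\end{aligned}
```

So γ scales the whole self-supervised term relative to the main task, while the local contrastive loss carries an effective weight of γα.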

2. Training the social media fake news detection model on second-domain (low-resource) news in the training set.

Three types of augmentation are applied to the second-domain news data: edge dropping, feature shuffling, and feature masking, producing three types of augmented graphs. A feature extractor yields the high-dimensional feature matrices of the second-domain news data and of the three augmented graphs. The feature extractor here is the same as the one in Part 1. The second-domain news data and the three augmented graphs are input into the auxiliary task: local contrastive learning is performed between the edge-dropped augmented graph and the randomly feature-masked augmented graph, and global contrastive learning is performed between the second-domain fake news data and the feature-shuffled augmented graph. Training on second-domain news in the training set likewise uses the self-supervised learning framework comprising global contrastive learning and local contrastive learning. The model is the same as the auxiliary-task model used when training on first-domain news in the training set, and is not described again here.

3. Data adaptive constraint method based on test-time training.

The READOUT function is used to obtain the feature matrix of the news data, and two statistics of that feature matrix are computed: the feature mean and the covariance matrix. The same two statistics, the feature mean and the covariance matrix, are then computed one by one for the second-domain fake news data, and a loss function is constructed from them.

Definition: using the READOUT function (the READOUT function serves as the feature extractor), the feature matrix of the news data is obtained, and two statistics of that feature matrix are computed, the feature mean and the covariance matrix, which are defined as follows:

The feature matrix of the second-domain (low-resource) fake news data is computed with the READOUT function, its two statistics, the feature mean and the covariance matrix, are computed in the same way, and the loss function is constructed as follows:

where μt and Σt denote the feature mean and covariance matrix of the low-resource fake news data, respectively.
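The definitions of the two statistics and the resulting loss are not shown in this text. The sketch below assumes the usual sample mean and (biased) sample covariance, and assumes the loss is the squared distance between the two means plus the squared Frobenius distance between the two covariance matrices. Both the form of the loss and the function names are assumptions made for illustration.

```python
def feature_mean(x):
    """Column-wise mean of a feature matrix given as a list of rows."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


def covariance(x):
    """Biased sample covariance (divides by n) of the rows of x."""
    n = len(x)
    mu = feature_mean(x)
    centered = [[v - m for v, m in zip(row, mu)] for row in x]
    d = len(mu)
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]


def adaptation_loss(x_train, x_test):
    """Assumed loss: ||mu_s - mu_t||^2 + ||Sigma_s - Sigma_t||_F^2."""
    mu_s, mu_t = feature_mean(x_train), feature_mean(x_test)
    cov_s, cov_t = covariance(x_train), covariance(x_test)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))
    cov_term = sum((a - b) ** 2 for ra, rb in zip(cov_s, cov_t) for a, b in zip(ra, rb))
    return mean_term + cov_term


x_train = [[0.0, 0.0], [2.0, 2.0]]         # READOUT features, training set (toy values)
x_test_shifted = [[1.0, 1.0], [3.0, 3.0]]  # same spread, mean shifted by 1 per dimension
same = adaptation_loss(x_train, x_train)
shifted = adaptation_loss(x_train, x_test_shifted)
```

In the toy example only the means differ, so the whole loss comes from the mean term; minimizing such a loss pulls the test-set feature statistics toward those of the training set.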

The method of the above embodiment obtains the news data to be detected and inputs it into the fake news detection model to determine whether it is fake news. The model training method of the present invention fuses the features of fake news data from two different domains into the model representation, which models the target data domain better, and trains on the target data again in the test phase, which strengthens the model's ability to represent the target data. Fake news can thus be detected more accurately and efficiently, improving the accuracy of cross-domain fake news detection.

The training device for the social media fake news detection model provided by the present invention is described below. The training device described below and the training method described above may be referred to in correspondence with each other.

FIG. 5 is a schematic structural diagram of the training device for the social media fake news detection model provided by the present invention. The training device provided by this embodiment includes:

A first acquisition module 710, configured to acquire multiple news propagation graphs of a first domain in the training set; a news propagation graph represents the propagation path of a news item;

A second acquisition module 720, configured to acquire multiple news propagation graphs of a second domain in the training set; the ratio of the number of second-domain news propagation graphs to the number of first-domain news propagation graphs is less than a threshold;

A training module 730, configured to train the social media fake news detection model according to the multiple first-domain news propagation graphs, the multiple second-domain news propagation graphs, and a first objective loss function, to obtain a social media fake news detection model trained on the training set; the social media fake news detection model is used to detect the authenticity of news; the first objective loss function is determined by a classification loss, a global contrastive loss, and a local contrastive loss; the global contrastive loss represents the degree of association among the node features in a news propagation graph, the node features in the first-type augmented graph of that news propagation graph, and the news propagation graph features; the local contrastive loss represents the degree of association between the node features in the second-type augmented graph and the node features in the third-type augmented graph of the news propagation graph; the classification loss represents the accuracy of the classification result.

Optionally, the social media fake news detection model includes at least one of the following:

A feature extraction module, used to extract news propagation graph features;

A classification module, used to predict, from the news propagation graph features, the authenticity of the social media news corresponding to the news propagation graph;

A self-supervised learning module, used to determine the global contrastive loss from the node features in the news propagation graph, the node features in the first-type augmented graph of the news propagation graph, and the news propagation graph features, and to determine the local contrastive loss from the node features in the second-type augmented graph and the node features in the third-type augmented graph of the news propagation graph.

Optionally, the social media fake news detection model is trained as follows:

The multiple first-domain news propagation graphs and the multiple second-domain news propagation graphs in the training set are input into the social media fake news detection model, which outputs authenticity detection results for the social media news corresponding to the news propagation graphs; the classification loss of the social media fake news detection model is obtained from these authenticity detection results and the label information of the news; the label information marks the authenticity of the news;

The multiple first-domain news propagation graphs in the training set, together with their corresponding first-type, second-type, and third-type augmented graphs, are input into the social media fake news detection model to obtain the global contrastive loss and local contrastive loss of the multiple first-domain news items;

The multiple second-domain news propagation graphs in the training set, together with their corresponding first-type, second-type, and third-type augmented graphs, are input into the social media fake news detection model to obtain the global contrastive loss and local contrastive loss of the multiple second-domain news items;

The weighted sum of three terms, namely the classification loss of the social media fake news detection model, the global and local contrastive losses of the multiple first-domain news items, and the global and local contrastive losses of the multiple second-domain news items, is taken as the value of the first objective loss function;

Based on the value of the first objective loss function, the social media fake news detection model is trained to obtain a social media fake news detection model trained on the training set.

Optionally, the social media fake news detection model further includes:

A data adaptive constraint module, used to determine the difference between the news propagation graph features in the training set and the news propagation graph features in the test set.

Optionally, the training module 730 is further configured to input the multiple second-domain news propagation graphs in the test set, together with their corresponding first-type, second-type, and third-type augmented graphs, into the social media fake news detection model trained on the training set, to obtain the global contrastive loss and local contrastive loss of the multiple second-domain news items in the test set;

The weighted sum of (i) the difference between the second-domain news propagation graph features in the training set and those in the test set, and (ii) the global contrastive loss and local contrastive loss of the multiple second-domain news items in the test set, is taken as the value of the second objective loss function;

Based on the value of the second objective loss function, the social media fake news detection model is trained to obtain a social media fake news detection model based on the latent features of the test set.

The device of this embodiment of the present invention is used to perform the method in any of the foregoing method embodiments. Its implementation principle and technical effects are similar and are not repeated here.

FIG. 6 is a schematic diagram of the physical structure of an electronic device. The electronic device may include a processor 810, a communications interface 820, a memory 830, and a communication bus 840, where the processor 810, the communications interface 820, and the memory 830 communicate with one another via the communication bus 840. The processor 810 can invoke logic instructions in the memory 830 to perform the training method for the social media fake news detection model, the method including: acquiring multiple news propagation graphs of a first domain in the training set, a news propagation graph representing the propagation path of a news item; acquiring multiple news propagation graphs of a second domain in the training set, the ratio of the number of second-domain news propagation graphs to the number of first-domain news propagation graphs being less than a threshold; and training the social media fake news detection model according to the multiple first-domain news propagation graphs, the multiple second-domain news propagation graphs, and a first objective loss function, to obtain a social media fake news detection model trained on the training set. The social media fake news detection model is used to detect the authenticity of news. The first objective loss function is determined by a classification loss, a global contrastive loss, and a local contrastive loss. The global contrastive loss represents the degree of association among the node features in a news propagation graph, the node features in the first-type augmented graph of that news propagation graph, and the news propagation graph features. The local contrastive loss represents the degree of association between the node features in the second-type augmented graph and the node features in the third-type augmented graph of the news propagation graph. The classification loss represents the accuracy of the classification result.

In addition, when the logic instructions in the memory 830 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can perform the training method for the social media fake news detection model provided by the above methods, the method including: acquiring multiple news propagation graphs of a first domain in the training set, a news propagation graph representing the propagation path of a news item; acquiring multiple news propagation graphs of a second domain in the training set, the ratio of the number of second-domain news propagation graphs to the number of first-domain news propagation graphs being less than a threshold; and training the social media fake news detection model according to the multiple first-domain news propagation graphs, the multiple second-domain news propagation graphs, and a first objective loss function, to obtain a social media fake news detection model trained on the training set. The social media fake news detection model is used to detect the authenticity of news. The first objective loss function is determined by a classification loss, a global contrastive loss, and a local contrastive loss. The global contrastive loss represents the degree of association among the node features in a news propagation graph, the node features in the first-type augmented graph of that news propagation graph, and the news propagation graph features. The local contrastive loss represents the degree of association between the node features in the second-type augmented graph and the node features in the third-type augmented graph of the news propagation graph. The classification loss represents the accuracy of the classification result.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the training method for the social media fake news detection model provided above.

以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。The device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art may understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A training method for a social media false news detection model, comprising:
acquiring a plurality of news propagation graphs of a first field in a training set; the news propagation graph is used for representing a news propagation path;
acquiring a plurality of news propagation graphs of a second field in the training set; the ratio of the number of the news propagation graphs in the second field to the number of the news propagation graphs in the first field is smaller than a threshold value;
training the social media false news detection model according to the plurality of news propagation graphs in the first field, the plurality of news propagation graphs in the second field and a first target loss function to obtain a social media false news detection model trained based on the training set; the social media false news detection model is used for detecting the authenticity of news; the first target loss function is determined from a classification loss, a global contrast loss, and a local contrast loss; the global contrast loss represents the degree of association among the node features in the news propagation graph, the node features in a first type of augmentation graph of the news propagation graph, and the news propagation graph features; the local contrast loss represents the degree of association between the node features in a second type of augmentation graph of the news propagation graph and the node features in a third type of augmentation graph of the news propagation graph; the classification loss represents the accuracy of the classification result.
2. The method of training a social media false news detection model according to claim 1, wherein the social media false news detection model comprises at least one of:
a feature extraction module; the feature extraction module is used for extracting news propagation graph features;
a classification module; the classification module is used for predicting the authenticity of social media news corresponding to the news propagation graph according to the news propagation graph characteristics;
a self-supervised learning module; the self-supervised learning module is used for determining the global contrast loss according to the node features in the news propagation graph, the node features in the first type of augmentation graph of the news propagation graph, and the news propagation graph features; and for determining the local contrast loss according to the node features in the second type of augmentation graph of the news propagation graph and the node features in the third type of augmentation graph of the news propagation graph.
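As a rough illustration of the two self-supervised terms named in claim 2, the following Python sketch computes a node-to-graph global contrast loss and a node-to-node local contrast loss. The claim gives no formulas, so everything specific here is an assumption made for illustration: the binary cross-entropy form of the global term, the InfoNCE form of the local term, the treatment of first-type augmentation nodes as negatives, the temperature value, and all function names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def global_contrast_loss(nodes, aug1_nodes, graph_feature):
    """Node-to-graph contrast (assumed form): nodes of the original graph
    should score high against the graph feature, while nodes of the
    first-type augmentation graph are treated as negatives."""
    pos = [-log_sigmoid(dot(h, graph_feature)) for h in nodes]
    neg = [-log_sigmoid(-dot(h, graph_feature)) for h in aug1_nodes]
    return (sum(pos) + sum(neg)) / (len(pos) + len(neg))


def local_contrast_loss(aug2_nodes, aug3_nodes, temperature=0.5):
    """Node-to-node contrast (assumed InfoNCE form): node i in the
    second-type augmentation graph should match node i in the third-type
    augmentation graph and differ from every other node there."""
    total = 0.0
    for i, u in enumerate(aug2_nodes):
        logits = [cosine(u, v) / temperature for v in aug3_nodes]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(aug2_nodes)
```

Under these assumptions, both terms shrink as the paired features agree more closely, which is one way to read "degree of association" in the claim.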
3. The training method of a social media false news detection model according to claim 2, wherein the social media false news detection model is trained based on the following manner:
inputting a plurality of news propagation graphs of a first field and a plurality of news propagation graphs of a second field in a training set into a social media false news detection model, and outputting an authenticity detection result of social media news corresponding to the news propagation graphs; obtaining the classification loss of the social media false news detection model according to the authenticity detection result of the social media news and the label information of the news; the label information is used for marking the authenticity of the news;
inputting a plurality of news propagation graphs of the first field, a first type of augmentation graph corresponding to the plurality of news propagation graphs of the first field, a second type of augmentation graph corresponding to the plurality of news propagation graphs of the first field and a third type of augmentation graph corresponding to the plurality of news propagation graphs of the first field in a training set to a social media false news detection model to obtain global contrast loss and local contrast loss of a plurality of news of the first field;
inputting a plurality of news propagation graphs of the second field, a first type of augmentation graph corresponding to the plurality of news propagation graphs of the second field, a second type of augmentation graph corresponding to the plurality of news propagation graphs of the second field and a third type of augmentation graph corresponding to the plurality of news propagation graphs of the second field in a training set to a social media false news detection model to obtain global contrast loss and local contrast loss of a plurality of news of the second field;
taking the weighted sum of the classification loss of the social media false news detection model, the global contrast loss and the local contrast loss of a plurality of news in the first field, and the global contrast loss and the local contrast loss of a plurality of news in the second field as the value of a first target loss function;
and training the social media false news detection model based on the value of the first target loss function to obtain a social media false news detection model trained based on a training set.
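The weighted sum described in claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the claim does not give the weights, so the parameter names `w_cls`, `w_global`, `w_local` and their default values are hypothetical, as are the `ContrastLosses` container and the function name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContrastLosses:
    """Global and local contrast losses computed for one field."""
    global_loss: float
    local_loss: float


def first_target_loss(classification_loss: float,
                      first_field: ContrastLosses,
                      second_field: ContrastLosses,
                      w_cls: float = 1.0,
                      w_global: float = 0.5,
                      w_local: float = 0.5) -> float:
    """Weighted sum of the classification loss and the global and local
    contrast losses of both fields (weights are illustrative only)."""
    return (w_cls * classification_loss
            + w_global * (first_field.global_loss + second_field.global_loss)
            + w_local * (first_field.local_loss + second_field.local_loss))


# Hypothetical loss values for one training step.
value = first_target_loss(0.7, ContrastLosses(0.4, 0.2), ContrastLosses(0.6, 0.8))
```

In a real training loop, the returned value would be the scalar that the optimizer minimizes; separate per-field weights would be an equally valid reading of the claim.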
4. The method for training a social media false news detection model according to claim 3, wherein the social media false news detection model further comprises:
a data adaptive constraint module; the data adaptive constraint module is used for determining differences between the news propagation graph features in the training set and the news propagation graph features in the test set.
5. The method for training a social media false news detection model according to claim 4, wherein, after training the social media false news detection model based on the value of the first target loss function to obtain the social media false news detection model trained based on the training set, the method further comprises:
inputting a plurality of news propagation graphs in a second field in a test set, a first type augmentation graph corresponding to the plurality of news propagation graphs in the second field in the test set, a second type augmentation graph corresponding to the plurality of news propagation graphs in the second field in the test set and a third type augmentation graph corresponding to the plurality of news propagation graphs in the second field in the test set into the social media false news detection model trained based on the training set to obtain global contrast loss and local contrast loss of a plurality of news in the second field in the test set;
taking, according to the difference between the news propagation graph features in the second field in the training set and the news propagation graph features in the second field in the test set, the weighted sum of the global contrast loss and the local contrast loss of the plurality of news in the second field in the test set as the value of a second target loss function;
and training the social media false news detection model based on the value of the second target loss function to obtain a social media false news detection model based on the latent features of the test set.
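The second target loss of claim 5 can be sketched along the same lines. The claim does not state how the train/test feature difference is measured or how it enters the weighted sum, so this sketch makes two explicit assumptions: the difference is the Euclidean distance between the mean graph-feature vectors of the two sets, and that distance is squashed into a single weight in [0, 1) applied to both contrast losses. All function names are hypothetical.

```python
import math


def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def feature_difference(train_features, test_features):
    """Assumed difference measure: distance between mean feature vectors."""
    mu_train = mean_vector(train_features)
    mu_test = mean_vector(test_features)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_train, mu_test)))


def second_target_loss(global_loss, local_loss, difference):
    """Weighted sum of the test-set contrast losses; the weight grows with
    the train/test difference (an assumption, not stated in the claim)."""
    weight = difference / (1.0 + difference)
    return weight * (global_loss + local_loss)


# Hypothetical second-field graph features from the training and test sets.
diff = feature_difference([[0.0, 0.0], [2.0, 0.0]], [[4.0, 4.0], [4.0, 4.0]])
loss = second_target_loss(0.6, 0.6, diff)
```

With this choice, identical train and test feature means give a zero loss, so the model is left unchanged, while a larger shift puts more weight on the self-supervised terms computed on the unlabeled test graphs.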
6. A method for detecting false news of social media, comprising:
acquiring a news propagation graph of a second field to be detected;
inputting the news propagation graph of the second field to be detected into the social media false news detection model to obtain an authenticity detection result of social media news corresponding to the news propagation graph of the second field; the social media false news detection model is trained based on the method of any one of claims 1-5.
7. A training device for a social media false news detection model, comprising:
the first acquisition module is used for acquiring a plurality of news propagation graphs of a first field in the training set; the news propagation graph is used for representing a news propagation path;
the second acquisition module is used for acquiring a plurality of news propagation graphs in a second field in the training set; the ratio of the number of the news propagation graphs in the second field to the number of the news propagation graphs in the first field is smaller than a threshold value;
the training module is used for training the social media false news detection model according to the plurality of news propagation graphs in the first field, the plurality of news propagation graphs in the second field and a first target loss function to obtain a social media false news detection model trained based on the training set; the social media false news detection model is used for detecting the authenticity of news; the first target loss function is determined from a classification loss, a global contrast loss, and a local contrast loss; the global contrast loss represents the degree of association among the node features in the news propagation graph, the node features in a first type of augmentation graph of the news propagation graph, and the news propagation graph features; the local contrast loss represents the degree of association between the node features in a second type of augmentation graph of the news propagation graph and the node features in a third type of augmentation graph of the news propagation graph; the classification loss represents the accuracy of the classification result.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the training method of the social media false news detection model of any one of claims 1 to 5 or the social media false news detection method of claim 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the training method of the social media false news detection model according to any one of claims 1 to 5 or the social media false news detection method according to claim 6.
10. A computer program product having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the training method of a social media false news detection model according to any one of claims 1 to 5 or the social media false news detection method according to claim 6.
CN202311475479.XA 2023-11-07 2023-11-07 Training method for social media fake news detection model Pending CN117576504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311475479.XA CN117576504A (en) 2023-11-07 2023-11-07 Training method for social media fake news detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311475479.XA CN117576504A (en) 2023-11-07 2023-11-07 Training method for social media fake news detection model

Publications (1)

Publication Number Publication Date
CN117576504A true CN117576504A (en) 2024-02-20

Family

ID=89861585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311475479.XA Pending CN117576504A (en) 2023-11-07 2023-11-07 Training method for social media fake news detection model

Country Status (1)

Country Link
CN (1) CN117576504A (en)

Similar Documents

Publication Publication Date Title
Thieltges et al. The Devil's Triangle: Ethical Considerations on Developing Bot Detection Methods.
CN113422761B (en) Malicious social user detection method based on counterstudy
CN112150450A (en) A method and device for image tampering detection based on dual-channel U-Net model
CN111062019A (en) User attack detection method and device and electronic equipment
CN116958846A (en) Video detection method, device, equipment, medium and product
CN111178146A (en) Method and device for identifying anchor based on face features
CN109377347B (en) Network credit early warning method, system and electronic equipment based on feature selection
CN111353554B (en) Method and device for predicting missing user service attributes
CN113887214B (en) Willingness presumption method based on artificial intelligence and related equipment thereof
CN117312934A (en) Classification method, classification device, classification apparatus, classification storage medium, and classification product
CN113239225B (en) Image retrieval method, device, equipment and storage medium
CN118761888A (en) Smart city service platform, method and equipment based on cloud computing and big data
İş et al. A Profile Analysis of User Interaction in Social Media Using Deep Learning.
CN117576504A (en) Training method for social media fake news detection model
CN116863366A (en) Cross-sample fake news video detection method and system
CN118628949A (en) Data quality detection method, device, equipment, medium and program product
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
CN115114851A (en) Scorecard modeling method and device based on five-fold cross-validation
Xu et al. Exposing deepfakes in online communication: detection based on ensemble strategy
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN113886547A (en) Client real-time conversation switching method and device based on artificial intelligence and electronic equipment
Narasamma et al. Detecting Malicious Activities on Twitter Data for Sentiment Analysis Using a Novel Optimized Machine Learning Approach
CN114662614B (en) Training method of image classification model, image classification method and device
Wang et al. Saliency detection by multilevel deep pyramid model
CN118627623B (en) Multi-mode fact checking method based on causal inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination