CN114357160A - Early rumor detection method and device based on generated propagation structure features
- Publication number
- CN114357160A (CN202111574716.9A)
- Authority
- CN
- China
- Prior art keywords
- propagation structure
- event
- source text
- feature
- structure feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an early rumor detection method and device based on generated propagation structure features. The method includes: acquiring a first source text of an event to be detected; inputting the first source text into a conditional variational autoencoder to obtain a first propagation structure feature of the event to be detected, generated by the conditional variational autoencoder, where the conditional variational autoencoder is trained on the second source text and the second propagation structure feature of a sample event, the second propagation structure feature is obtained by extracting features from the propagation structure graph of the sample event with a graph convolutional neural network model, and the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to that second source text; and inputting the first source text and the first propagation structure feature into a classification model to obtain the rumor detection result of the event to be detected output by the classification model. With the invention, the rumor detection result of an event to be detected can be obtained quickly and accurately as soon as its source text is available.
Description
Technical Field
The present invention relates to the technical field of data processing, and in particular to an early rumor detection method and device based on generated propagation structure features.
Background
With the rise and development of the Internet era, information flows and spreads faster and faster. People can freely publish information on social media, which also lets rumors spread rapidly. A rumor is a statement that contradicts the facts or is fabricated out of thin air; it is deceptive to some degree and can disrupt social order or even cause serious panic and economic loss. Distinguishing rumors from the massive volume of information on social media is therefore a serious challenge.
Rumor detection technology is the process of modeling and identifying rumors by computational means. Existing rumor detection technology has the following problems: (1) existing research is limited to detecting rumors that have already spread for some time, that is, a rumor can only be identified after substantial exposure, so early rumor detection is lacking and results are not timely; (2) most existing work on early rumor detection relies on text information alone, but because information structures are diverse, accurate rumor detection results are hard to obtain from text information only.
Summary of the Invention
The present invention provides an early rumor detection method and device based on generated propagation structure features, to overcome the poor timeliness and low accuracy of rumor detection methods in the prior art, so that a rumor can be detected promptly and accurately as soon as its source text is obtained.
The present invention provides an early rumor detection method based on generated propagation structure features, including:
acquiring a first source text of an event to be detected;
inputting the first source text into a conditional variational autoencoder to obtain a first propagation structure feature of the event to be detected, generated by the conditional variational autoencoder; where the conditional variational autoencoder is trained on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by extracting features from the propagation structure graph of the sample event with a graph convolutional neural network model; and the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text;
inputting the first source text and the first propagation structure feature into a classification model to obtain the rumor detection result of the event to be detected output by the classification model; where the classification model is trained on the second source text, the second propagation structure feature, and the rumor detection label of the sample event.
According to the early rumor detection method based on generated propagation structure features provided by the present invention, inputting the first source text and the first propagation structure feature into the classification model to obtain the rumor detection result of the event to be detected output by the classification model includes:
inputting the first source text into the feature extraction layer of the classification model to obtain a text feature of the first source text;
concatenating the text feature with the first propagation structure feature and inputting the result into the classification layer of the classification model to obtain the rumor detection result of the event to be detected.
According to the early rumor detection method based on generated propagation structure features provided by the present invention, the feature extraction layer of the classification model is built on a recurrent neural network model;
the classification layer of the classification model is built on the softmax function.
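The concatenate-then-classify step can be illustrated with a minimal Python sketch. It assumes the text feature has already been produced by the recurrent feature extraction layer and that the classification layer is a single linear map followed by softmax; the function names, the weights, and the two-class output are hypothetical values chosen for illustration, not parameters from the patent.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(text_feature, propagation_feature, weights, bias):
    # Concatenate the two feature vectors (list concatenation).
    fused = text_feature + propagation_feature
    # Hypothetical linear classification layer: one weight row per class.
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Illustrative inputs only.
text_feature = [0.2, -0.1]
propagation_feature = [0.5, 0.3, -0.4]
weights = [[0.1, 0.2, 0.3, 0.4, 0.5],
           [-0.1, -0.2, -0.3, -0.4, -0.5]]
bias = [0.0, 0.0]
probs = classify(text_feature, propagation_feature, weights, bias)
```

Here `probs` holds one probability per class (for example rumor and non-rumor); the class with the largest probability would be reported as the detection result.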
According to the early rumor detection method based on generated propagation structure features provided by the present invention, the propagation structure graph is constructed as follows:
taking the second source text and each response tweet of the sample event as nodes, the propagation structure graph of the sample event is constructed according to the response relationship between any two nodes.
According to the early rumor detection method based on generated propagation structure features provided by the present invention, the response relationship includes a forwarding relationship, a time-dimension relationship, and a comment relationship.
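The graph construction described above can be sketched as follows. This is an illustrative sketch only: the tuple layout `(post_id, parent_id, relation)`, the function name, and the choice of an undirected adjacency matrix are assumptions, and the time-dimension relationship is not modeled beyond the relation label.

```python
def build_propagation_graph(source_id, responses):
    """Build an adjacency matrix from a source post and its responses.

    responses: list of (post_id, parent_id, relation) tuples, where
    relation is a label such as "repost" or "comment" (hypothetical format).
    """
    nodes = [source_id] + [post_id for post_id, _, _ in responses]
    index = {post_id: i for i, post_id in enumerate(nodes)}
    n = len(nodes)
    adjacency = [[0] * n for _ in range(n)]
    edge_types = {}
    for post_id, parent_id, relation in responses:
        if parent_id not in index:
            continue  # skip responses whose parent post was not collected
        u, v = index[parent_id], index[post_id]
        adjacency[u][v] = adjacency[v][u] = 1  # undirected edge (assumption)
        edge_types[(u, v)] = relation
    return nodes, adjacency, edge_types

nodes, adjacency, edge_types = build_propagation_graph(
    "s0",
    [("r1", "s0", "repost"), ("r2", "s0", "comment"), ("r3", "r1", "comment")],
)
```

Node 0 is always the source post, so a downstream model can locate the source text without a separate lookup.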
According to the early rumor detection method based on generated propagation structure features provided by the present invention, the second propagation structure feature is obtained as follows:
inputting the propagation structure graph of the sample event into the graph convolutional neural network model to obtain the second propagation structure feature of the sample event output by the feature extraction layer of the graph convolutional neural network model;
where the graph convolutional neural network model is trained on the propagation structure graph of the sample event and the rumor detection label of the sample event.
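As a rough illustration of graph-convolutional feature extraction, the sketch below applies one layer of the widely used symmetric-normalized propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) X W), followed by mean pooling over nodes. The text here does not state which propagation rule or pooling the patent uses, so both are assumptions, as are the function names and the toy inputs. The adjacency matrix is assumed to have a zero diagonal.

```python
import math

def normalize_adjacency(adjacency):
    # Add self-loops, then apply symmetric degree normalization.
    n = len(adjacency)
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adjacency, features, weight):
    # Aggregate neighbor features, apply the weight matrix, then ReLU.
    propagated = matmul(normalize_adjacency(adjacency), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weight)]

def mean_pool(node_features):
    # Average node embeddings into one event-level feature vector.
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]

# Toy graph: a source post (node 0) with two direct responses.
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
node_features = gcn_layer(adjacency, features, weight)
structure_feature = mean_pool(node_features)
```

In a trained model the weight matrix would be learned from the rumor detection labels, and `structure_feature` would play the role of the second propagation structure feature.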
According to the early rumor detection method based on generated propagation structure features provided by the present invention, inputting the first source text into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected generated by the conditional variational autoencoder includes:
preprocessing the first source text;
where the preprocessing includes removing garbled data and vectorizing the text;
inputting the preprocessed first source text into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected.
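The two preprocessing operations can be sketched in Python as below. The patent does not specify how garbled data is identified or which vectorization is used, so this sketch makes assumptions: garbled data is taken to mean the Unicode replacement character and control characters, tokens are assumed to be whitespace-separated (Chinese text would first need a word segmenter), and vectorization is a simple hashed bag-of-words. All names are hypothetical.

```python
import re
import zlib

def clean_text(text):
    # Drop the Unicode replacement character left by failed decoding.
    text = text.replace("\ufffd", "")
    # Replace ASCII control characters with spaces.
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

def vectorize(text, dim=8):
    # Hashed bag-of-words: crc32 gives a hash that is stable across runs.
    vec = [0.0] * dim
    for token in text.split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec

cleaned = clean_text("breaking\ufffd news\x00 today ")
vector = vectorize(cleaned)
```

A production system would more likely use learned word embeddings; the hashed vector is used here only to keep the sketch self-contained.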
The present invention also provides an early rumor detection device based on generated propagation structure features, including:
a text acquisition module, configured to acquire a first source text of an event to be detected;
a propagation structure feature generation module, configured to input the first source text into a conditional variational autoencoder to obtain a first propagation structure feature of the event to be detected, generated by the conditional variational autoencoder; where the conditional variational autoencoder is trained on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by extracting features from the propagation structure graph of the sample event with a graph convolutional neural network model; and the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text;
a rumor detection module, configured to input the first source text and the first propagation structure feature into a classification model to obtain the rumor detection result of the event to be detected output by the classification model; where the classification model is trained on the second source text, the second propagation structure feature, and the rumor detection label of the sample event.
The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any of the early rumor detection methods based on generated propagation structure features described above.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the early rumor detection methods based on generated propagation structure features described above.
The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the early rumor detection methods based on generated propagation structure features described above.
In the early rumor detection method and device based on generated propagation structure features provided by the present invention, a propagation structure graph is constructed from the second source text and the response tweets of a sample event, and the second propagation structure feature of the sample event is obtained by extracting features from that graph with a graph convolutional neural network model. Iterative training on the second source text and the second propagation structure feature of the sample events yields a conditional variational autoencoder that can simulate propagation structure features from the source text of an event. Once the first source text of an event to be detected is obtained, the conditional variational autoencoder automatically generates the first propagation structure feature of that event, and rumor detection is then performed on the event by combining its first propagation structure feature with its first source text. On the one hand, the whole detection process needs only the first source text of the event to be detected in order to generate the corresponding propagation structure feature quickly; this avoids the delay of collecting a large number of response tweets before propagation structure features can be obtained, allows a rumor to be detected at the point of publication, and effectively improves the timeliness of rumor detection. On the other hand, combining the source text information of an event with its propagation structure features enriches the features relevant to rumor detection and makes the detection results more accurate.
Brief Description of the Drawings
To describe the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the first schematic flowchart of the early rumor detection method based on generated propagation structure features provided by the present invention;
Fig. 2 is a schematic structural diagram of the conditional variational autoencoder, the graph convolutional neural network model, and the classification model in the early rumor detection method based on generated propagation structure features provided by the present invention;
Fig. 3 is the second schematic flowchart of the early rumor detection method based on generated propagation structure features provided by the present invention;
Fig. 4 is the third schematic flowchart of the early rumor detection method based on generated propagation structure features provided by the present invention;
Fig. 5 is a schematic structural diagram of the early rumor detection device based on generated propagation structure features provided by the present invention;
Fig. 6 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In the description of this embodiment, the terms "first" and "second" are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance or a hierarchical order.
Existing rumor detection methods cannot detect a rumor event promptly and accurately at the point where the rumor is published, so their real-time performance is poor. In addition, little data is available at an early stage, so rumors can only be detected from the text information of the event; considering text information as the single granularity does not yield accurate rumor detection results.
To address the above problems, this embodiment proposes an early rumor detection method based on generated propagation structure features. A conditional variational autoencoder is built from a large amount of rumor propagation data of historical events and learns the structural features of rumor propagation. At an early stage, when only the source text of the event to be detected is available, the trained conditional variational autoencoder predicts the propagation structure feature of the event to be detected, which is combined with the source text for rumor detection. In this way, a rumor detection result can be obtained quickly and accurately as soon as the source text of the event to be detected is available, to determine whether the event is a rumor event.
It should be noted that the method may be executed by an electronic device, a component in an electronic device, an integrated circuit, or a chip. The electronic device may be a mobile or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile electronic device may be a server, network attached storage (NAS), a personal computer (PC), a smart television (TV), a teller machine, or a self-service machine. The present invention imposes no specific limitation here.
The early rumor detection method based on generated propagation structure features of the present invention is described below with reference to Fig. 1. The method includes:
Step 101: acquire the first source text of the event to be detected.
Optionally, the event to be detected is an event on which rumor detection needs to be performed. The event to be detected may be crawled from the Internet in real time with a crawler tool, or obtained through Internet transmission or download; this embodiment imposes no specific limitation here.
Sample events and events to be detected may be events published on various social networking sites, such as posts on Weibo or updates published on WeChat; this embodiment imposes no specific limitation here.
The rumor detection method of this embodiment is described below taking posts published on Weibo as an example of both sample events and events to be detected.
The first source text of the event to be detected is the text information contained in the source post published for that event.
Step 102: input the first source text into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected, generated by the conditional variational autoencoder; where the conditional variational autoencoder is trained on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by extracting features from the propagation structure graph of the sample event with a graph convolutional neural network model; and the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text.
Optionally, the conditional variational autoencoder is trained on the second source text and the second propagation structure feature of the sample events. Given a source text, the trained conditional variational autoencoder generates the corresponding propagation structure feature; combining the generated propagation structure feature with the source text in the subsequent detection step allows early rumor detection to be performed quickly and accurately.
Here, a response tweet is a text that comments on or forwards the source text, that is, the text information of a comment post that comments on the source post or of a forwarded post that forwards the source post. In this embodiment, comment posts and forwarded posts are collectively referred to as reposts.
The conditional variational autoencoder (Conditional Variational AutoEncoder, CVAE) is a deep latent-space generative model consisting of an encoder and a decoder. The encoder and the decoder are neural network models whose type and structure can be set according to actual requirements, for example a convolutional neural network or a recurrent neural network; this embodiment imposes no specific limitation here.
Optionally, before step 102 is performed, a conditional variational autoencoder capable of generating propagation structure features needs to be trained on the second source text and the second propagation structure feature of the sample events. The training steps may include:
First, collect samples and organize the data set.
Optionally, a large number of sample posts are collected and, taking the event as the unit, the source post and reposts of the same event are aggregated to form a sample event data set $C = \{c_1, c_2, \ldots, c_i, \ldots, c_n\}$, where $c_i$ denotes the i-th sample event and $n$ denotes the number of sample events in the data set. The specific number can be set according to actual requirements; this embodiment imposes no specific limitation here.
Each sample event consists of its source text together with all of its response tweets: the source text of the i-th sample event, the j-th related comment post or forwarded post (that is, the j-th response tweet) of the i-th sample event, and $m_i$, the total number of response tweets contained in the i-th event. The symbols for the source text and the response tweets appear as formula images in the original publication and are not reproduced in this text.
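The per-event aggregation of source posts and reposts can be sketched as follows. The record fields `event_id`, `is_source`, and `text` are hypothetical; the patent does not describe the raw data format.

```python
def group_posts_by_event(posts):
    """Aggregate raw posts into one record per event.

    posts: list of dicts with hypothetical keys
    "event_id", "is_source", and "text".
    """
    events = {}
    for post in posts:
        event = events.setdefault(
            post["event_id"], {"source": None, "responses": []}
        )
        if post["is_source"]:
            event["source"] = post["text"]
        else:
            event["responses"].append(post["text"])
    return events

posts = [
    {"event_id": "e1", "is_source": True, "text": "A"},
    {"event_id": "e1", "is_source": False, "text": "reply to A"},
    {"event_id": "e2", "is_source": True, "text": "B"},
    {"event_id": "e1", "is_source": False, "text": "repost of A"},
]
events = group_posts_by_event(posts)
```

Each value in `events` corresponds to one sample event: one source text plus its list of response tweets, whose length plays the role of $m_i$.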
Then, each text of a sample event can be preprocessed to obtain the text vector of each text in the sample event, where $c_i$ denotes the source text vector of the i-th sample event and the remaining vectors are the text vectors of all response tweets of the i-th sample event (their symbols appear as formula images in the original publication and are not reproduced in this text).
Then, the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text, and the second propagation structure feature of the sample event is obtained by extracting features from the propagation structure graph with the graph convolutional neural network model.
Then, the source text vectors in the sample event data set, $c = \{c_1, c_2, \ldots, c_i, \ldots, c_n\}$, are used as the condition information of the conditional variational autoencoder, where $c_i$ is the source text vector of the i-th sample event and $n_i$ is the number of words it contains. The propagation structure features of all sample events, $H = \{H_0, H_1, \ldots, H_i, \ldots, H_n\}$, are used as the real sample data of the conditional variational autoencoder, where $H_i$ is the propagation structure feature of the i-th sample event.
Optionally, the source text vector and the propagation structure feature of each sample event form one training sample. The loss function of the conditional variational autoencoder is computed over multiple training samples, and the conditional variational autoencoder is trained iteratively based on this loss function.
The conditional variational autoencoder learns by encoding the original data into a latent variable $z$ that follows a specified distribution, here the standard normal distribution $p(z) = N(0, 1)$, and then decoding the latent variable $z$ to generate new data.
For the encoder $E(\cdot)$, $c_i$ is input as condition information together with the real sample data $H_i$, and the encoder computes the mean $\mu_i$ and variance $\delta_i^2$ of $p(z \mid H_i, c_i)$.
For the decoder $D(\cdot)$, the latent variable $z$ is sampled using the reparameterization trick, and the decoder reconstructs the sample data $H_i$ from $z$.
The latent variable $z$ is sampled with the reparameterization trick as follows:
$z = \mu_i + \varepsilon \odot \delta_i$;
where $\varepsilon$ follows the standard normal distribution $N(0, 1)$, $\odot$ is the element-wise product, and $\mu_i$ and $\delta_i^2$ are the mean and variance of the normal distribution.
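The sampling formula can be written directly in Python. This is a minimal sketch using only the standard library; the function name and the `rng` parameter are illustrative, and `sigma` holds the standard deviations $\delta_i$ (not the variances).

```python
import random

def reparameterize(mu, sigma, rng=random):
    # z = mu + eps * sigma, with eps drawn from N(0, 1) per dimension.
    # The randomness is isolated in eps, so z stays a deterministic,
    # differentiable function of mu and sigma.
    return [m + rng.gauss(0.0, 1.0) * s for m, s in zip(mu, sigma)]

# With zero standard deviation the sample collapses to the mean.
z_fixed = reparameterize([0.5, -1.0], [0.0, 0.0])

# With non-zero standard deviation the sample varies around the mean.
z_random = reparameterize([0.5, -1.0], [0.1, 0.2], rng=random.Random(0))
```

Sampling `eps` separately is what allows gradients to flow back to the encoder outputs during training in an automatic-differentiation framework.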
The loss function of the CVAE model is:
$$\mathcal{L}_{CVAE} = -\mathbb{E}_{q_\phi(z \mid H_i, c_i)}\left[\log p_\theta(H_i \mid z, c_i)\right] + KL\left(q_\phi(z \mid H_i, c_i) \,\|\, p_\theta(z)\right)$$
where the first term is the reconstruction loss and $\mathbb{E}(\cdot)$ denotes the mathematical expectation; the second term is the KL divergence between the encoder distribution $q_\phi(z \mid H_i, c_i)$ and the prior distribution $p_\theta(z)$; $q_\phi(z \mid H_i, c_i)$ is the posterior probability of $z$, and $p_\theta(H_i \mid z, c_i)$ is the conditional probability of $H_i$; $\phi$ and $\theta$ are the network parameters of the encoder and the decoder, respectively.
Based on this loss function, the network parameters of the CVAE model are updated iteratively until the decoder of the CVAE model can generate simulated propagation structure features that approximate the real sample data.
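As a rough sketch of how such a loss can be evaluated for one sample, the code below combines a reconstruction term with the KL divergence to the standard normal prior. Two things are assumptions rather than details from the text: the reconstruction term is taken to be a squared error (which matches a Gaussian decoder up to constants), and the KL term uses the closed form for a diagonal Gaussian. The function names are illustrative.

```python
import math

def kl_to_standard_normal(mu, var):
    """Closed-form KL( N(mu, diag(var)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + v - 1.0 - math.log(v) for m, v in zip(mu, var))

def cvae_loss(h_true, h_recon, mu, var):
    """Reconstruction term plus KL term for a single sample.

    Assumption: squared error stands in for the negative log-likelihood
    of the decoder output.
    """
    recon = sum((a - b) ** 2 for a, b in zip(h_true, h_recon))
    return recon + kl_to_standard_normal(mu, var)
```

The KL term vanishes exactly when the encoder outputs zero mean and unit variance, so it only penalizes departures from the prior.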
After the CVAE model has been trained, the first source text of the event to be detected is input into the trained CVAE model to generate the first propagation structure feature corresponding to the first source text.
Because the propagation structure features are learned from the forwarding information, comment information, and source texts of a large number of historical events, the propagation structure feature corresponding to a source text can be simulated even at the early stage when only the source text is available. There is no need to wait for the source text to spread widely, which improves the timeliness of detection. In addition, by combining the source text with its corresponding propagation structure feature, rumors can be detected promptly and accurately as soon as the source text is published, avoiding the adverse effects of a rumor spreading.
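Step 103: input the first source text and the first propagation structure feature into a classification model to obtain the rumor detection result of the event to be detected output by the classification model; the classification model is obtained by training on the second source text and the second propagation structure feature of the sample events, together with the rumor detection labels of the sample events.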
The network structure of the classification model, such as the number of layers and the number of nodes in each layer, can be set according to actual requirements. The classification model may be built from any artificial intelligence model capable of classification, including but not limited to convolutional neural networks, recurrent neural networks, VGG (Visual Geometry Group) networks, and the residual network (ResNet) family; this embodiment does not specifically limit the choice.
The rumor detection label is the true category of a sample event, either rumor or non-rumor.
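Optionally, before step 103 is performed, the classification model can be trained in advance by iterative training. The training steps may include: first, obtaining the second source text and the second propagation structure feature of each sample event; then, using the second source text and the second propagation structure feature of the sample event as the input and the rumor detection label of the sample event as the ground-truth label, iteratively training the classification model to obtain a classification model that accurately distinguishes rumor events from non-rumor events.
After the classification model has been trained, the first source text of the event to be detected and the first propagation structure feature are concatenated and input into the trained classification model to determine whether the event to be detected is a rumor event or a non-rumor event.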
In this embodiment, a propagation structure graph is constructed from the second source text and the response tweets of each sample event, and a graph convolutional neural network model extracts features from the propagation structure graph to obtain the second propagation structure feature of the sample event. Iterative training on the second source text and the second propagation structure feature of the sample events yields a conditional variational autoencoder that can simulate the propagation structure feature of an event from its source text. Once the first source text of the event to be detected is obtained, the conditional variational autoencoder automatically generates the first propagation structure feature of that event; the first propagation structure feature and the first source text are then combined to perform rumor detection on the event.
On the one hand, the whole rumor detection process requires only the first source text of the event to be detected in order to generate the corresponding propagation structure feature quickly. This removes the delay of collecting a large number of response tweets before the propagation structure feature can be obtained, and allows a rumor to be detected at its source, which improves the timeliness of rumor detection. On the other hand, combining the source text information of the event with the propagation structure feature enriches the features relevant to rumor detection and makes the rumor detection result more accurate.
On the basis of the above embodiment, in this embodiment, inputting the first source text and the first propagation structure feature into the classification model to obtain the rumor detection result of the event to be detected output by the classification model includes: inputting the first source text into the feature extraction layer of the classification model to obtain the text feature of the first source text; and concatenating the text feature with the first propagation structure feature and inputting the result into the classification layer of the classification model to obtain the rumor detection result of the event to be detected.
Optionally, the classification model in this embodiment includes a feature extraction layer and a classification layer.
The feature extraction layer may be built from various neural network models, such as a recurrent neural network or a convolutional neural network; this embodiment does not specifically limit the choice.
Optionally, rumor detection on the event to be detected based on the first source text and the first propagation structure feature proceeds as follows. First, the first source text is input into the feature extraction layer of the classification model to obtain the text feature of the first source text.
Then, the text feature of the first source text and the first propagation structure feature are concatenated to obtain a fused feature; the fused feature is input into the classification layer to obtain the probability P(rumor) that the event to be detected is a rumor, and this probability determines whether the event to be detected is a rumor.
Specifically, the probability that the event to be detected is a rumor can be compared with a preset value: if the probability is greater than the preset value, the event to be detected is determined to be a rumor event; otherwise, it is determined to be a non-rumor event.
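The fusion and thresholding steps described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the classification layer is modeled as a single linear map followed by softmax, class index 1 is assumed to be the rumor class, and the names `detect_rumor`, `weights`, `bias`, and the default preset value of 0.5 are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def detect_rumor(text_feat, struct_feat, weights, bias, threshold=0.5):
    """Concatenate the two features, classify, and compare with a preset value."""
    fused = list(text_feat) + list(struct_feat)  # fused feature
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    p_rumor = softmax(logits)[1]  # assumption: index 1 is the rumor class
    return p_rumor, p_rumor > threshold
```

A probability exactly equal to the preset value is treated as non-rumor here, matching the "greater than" wording of the text.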
In this embodiment, the feature extraction layer of the classification model extracts features from the first source text, capturing deep features that affect rumor detection and making the rumor detection result more accurate. Performing rumor detection on the fused feature, which contains both the text feature and the first propagation structure feature, further improves the accuracy of rumor detection.
On the basis of the above embodiment, in this embodiment the feature extraction layer of the classification model is built from a recurrent neural network model, and the classification layer of the classification model is built from the softmax function.
Optionally, the classification model is built from a recurrent neural network and a classification layer.
The recurrent neural network may be a standard recurrent neural network (RNN) or one of its variants, such as a long short-term memory (LSTM) network; this embodiment does not specifically limit the choice.
As shown in Figure 2, the classification model is built from an LSTM network and the softmax function.
The rumor detection method of this embodiment is described below, taking a classification model built from an LSTM network and the softmax function as an example.
Before rumor detection is performed on the event to be detected, the classification model can be trained in advance. The training steps may include the following.
First, the second source text of each sample event is vectorized to obtain the second source text vector of sample event $i$, $c_i = \{x_1, x_2, \ldots, x_{n_i}\}$, where $n_i$ is the number of words contained in the source text vector $c_i$ and $x_t$ is the word vector of the $t$-th word in $c_i$.
Then, $c_i$ is input into an LSTM network to extract the high-order feature representation $a_i$ of the source text vector.
The LSTM network contains three gates: a forget gate, an input gate, and an output gate. Each gate unit has three inputs: the input $x_t$ of the LSTM network at the current time step, the output $a_{t-1}$ of the LSTM network at the previous time step, and the cell state $c_{t-1}$ at the previous time step.
The forget gate of the LSTM network is computed as:
$$F_f = \sigma(W_f[a_{t-1}, x_t] + b_f)$$
$F_f$ is the output of the forget gate, which reads the current input $x_t$ and the output $a_{t-1}$ of the LSTM unit at the previous time step; $\sigma(\cdot)$ is the activation function, which keeps the output between 0 and 1; $W_f$ is the weight coefficient and $b_f$ is the bias term. $F_f$ determines how much of the previous cell state is discarded: a value of 0 means that all information in the previous cell state $c_{t-1}$ is forgotten, and a value of 1 means that all information in $c_{t-1}$ is retained.
The input gate of the LSTM controls how much information from the input $x_t$ is written into the cell. It is computed as:
$$F_i = \sigma(W_i[a_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c[a_{t-1}, x_t] + b_c)$$
$$c_t = F_f * c_{t-1} + F_i * \tilde{c}_t$$
$F_i$ is the output of the input gate and controls how much input information enters the network unit; $\tilde{c}_t$ is the intermediate candidate value of the LSTM cell state, and $c_t$ is the updated state of the current cell; $W_i$ and $W_c$ are weight coefficients, and $b_i$ and $b_c$ are bias terms.
The output gate of the LSTM controls how much information from the current cell state $c_t$ is output. It is computed as:
$$F_o = \sigma(W_o[a_{t-1}, x_t] + b_o)$$
$$a_t = F_o * c_t$$
$F_o$ is the output gate, which controls the proportion of the current cell state $c_t$ that is output. $W_o$ and $b_o$ are the weight coefficient and the bias term. $a_t$ is the high-order feature vector of the source text finally output by the LSTM network.
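A single LSTM time step built from the three gates described above can be sketched in plain Python. Several details are assumptions: the candidate state uses tanh and the cell update is the standard `forget * previous + input * candidate` rule, as in a conventional LSTM; the output follows the text's a_t = F_o * c_t, whereas many LSTM formulations apply tanh to c_t first. The `params` dictionary layout and function names are hypothetical.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def _affine(W, b, v):
    """Compute W v + b for a matrix W (list of rows) and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM step; returns (a_t, c_t)."""
    v = list(a_prev) + list(x_t)  # concatenation [a_{t-1}, x_t]
    f = [_sigmoid(u) for u in _affine(params["Wf"], params["bf"], v)]  # forget gate
    i = [_sigmoid(u) for u in _affine(params["Wi"], params["bi"], v)]  # input gate
    o = [_sigmoid(u) for u in _affine(params["Wo"], params["bo"], v)]  # output gate
    # Assumption: standard tanh candidate state.
    c_tilde = [math.tanh(u) for u in _affine(params["Wc"], params["bc"], v)]
    c_t = [fg * cp + ig * ct for fg, cp, ig, ct in zip(f, c_prev, i, c_tilde)]
    a_t = [og * c for og, c in zip(o, c_t)]  # a_t = F_o * c_t, as stated in the text
    return a_t, c_t
```

Running the step over the word vectors of a source text in order and keeping the last output gives one possible high-order text feature.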
Then, the second propagation structure feature of the sample event generated by the conditional variational autoencoder is taken and concatenated with the corresponding source text feature vector $a_i$; the result is passed through a fully connected layer and classified with the softmax function to obtain the probability of the rumor label:
$$\hat{y}_i = \mathrm{softmax}\left(\mathrm{FC}([a_i ; H_i])\right)$$
where $\hat{y}_i$ is the rumor detection result output by the classification model for the $i$-th sample event $c_i$, $a_i$ is the high-order feature representation of the source text of sample event $c_i$, $H_i$ is the propagation structure feature of sample event $c_i$, $[\,;\,]$ denotes concatenation, and $\mathrm{FC}$ is the fully connected layer.
Because the integral over $H_i$ is intractable, Monte Carlo sampling is used: the propagation structure features generated by the conditional variational autoencoder are sampled $M$ times and their mean is taken as the expectation, so that the final predicted label probability becomes:
$$p(\hat{y}_i \mid a_i) \approx \frac{1}{M}\sum_{m=1}^{M} p\left(\hat{y}_i \mid a_i, D(z^{(m)}, c_i)\right), \quad z^{(m)} \sim N(0, 1)$$
where $\hat{y}_i$ is the predicted rumor detection label, $a_i$ is the high-order feature of the source text, and $z$ is the latent variable. The negative log-likelihood can be used as the loss function of the classification model:
$$\mathcal{L} = -\sum_{i=1}^{N} \log p(\hat{y}_i \mid a_i)$$
where $N$ is the number of sample events, $a_i$ is the high-order feature representation of the source text of sample event $c_i$, and $\hat{y}_i$ is the predicted rumor detection label of the $i$-th sample event.
Finally, the classification model is trained iteratively on this loss function until the training termination condition is met, yielding a classification model that accurately distinguishes rumor events from non-rumor events.
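The Monte Carlo averaging and the negative log-likelihood loss described above can be sketched as follows. The `decoder` and `classifier` arguments are hypothetical callables standing in for the trained CVAE decoder and the classification layer; drawing z from the standard normal prior and treating label 1 as "rumor" are assumptions made for this illustration.

```python
import math
import random

def mc_rumor_probability(a_i, c_i, decoder, classifier, latent_dim, m=10, rng=None):
    """Average the rumor probability over m sampled propagation structure features."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # z ~ N(0, I)
        h = decoder(z, c_i)          # simulated propagation structure feature
        total += classifier(a_i, h)  # P(rumor | a_i, h)
    return total / m

def negative_log_likelihood(probs, labels, eps=1e-12):
    """Sum of -log p(true label); probs are P(rumor), labels are 1 (rumor) or 0."""
    loss = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        loss -= math.log(max(p_true, eps))  # eps guards against log(0)
    return loss
```

A larger `m` reduces the variance of the estimate at the cost of more decoder and classifier evaluations per event.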
In this embodiment, building the classification model on a recurrent neural network allows it to better capture the sequential text features that distinguish rumor events from non-rumor events, which makes the detection result more accurate. In addition, the rumor detection probability function is transformed by Monte Carlo sampling, in which the propagation structure features generated by the conditional variational autoencoder are sampled multiple times and averaged, so that an optimized classification model can be obtained quickly and effectively.
On the basis of the above embodiments, the propagation structure graph in this embodiment is constructed as follows: the second source text and each response tweet of the sample event are taken as nodes, and the propagation structure graph of the sample event is built from the response relationship between any two nodes.
The propagation structure graph represents the interaction among the posts of an event and the propagation structure of the information, that is, the response relationships among the posts.
Optionally, a propagation structure graph is constructed for each sample event. For the $i$-th event, the propagation structure graph is $G_i = (V_i, E_i)$, where $V_i$ is the node set of the graph, with each node representing one post; $E_i$ is the edge set of the graph, with each edge representing the response relationship between two post nodes; and $A_i$ is the adjacency matrix of the propagation structure graph of the $i$-th event.
The adjacency matrix can be constructed from the response relationship between any two post nodes.
The response relationship is the relationship between any two posts and can be defined according to actual requirements.
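A minimal sketch of building the adjacency matrix from response relationships is given below. It assumes posts are indexed from 0 with the source post at index 0, that each response is given as a hypothetical `(parent, child)` index pair, and that edges are stored symmetrically by default; none of these conventions is specified in the text.

```python
def build_adjacency(num_posts, responses, symmetric=True):
    """Return a num_posts x num_posts 0/1 adjacency matrix.

    responses: iterable of (parent, child) index pairs, meaning that
    post `child` responds to (forwards or comments on) post `parent`.
    """
    adj = [[0] * num_posts for _ in range(num_posts)]
    for parent, child in responses:
        if not (0 <= parent < num_posts and 0 <= child < num_posts):
            raise ValueError("post index out of range")
        adj[parent][child] = 1
        if symmetric:
            adj[child][parent] = 1  # assumption: undirected propagation graph
    return adj
```

For example, a source post with two direct responses yields a star-shaped graph centered on index 0.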
In this embodiment, constructing the propagation structure graph makes it possible to mine the propagation structure information among the posts of an event and to capture richer contextual relationships, which benefits rumor detection.
On the basis of the above embodiment, the response relationship in this embodiment includes a forwarding relationship, a time-dimension relationship, and a comment relationship.
Optionally, the forwarding relationship represents the forwarding hierarchy between two posts. A forwarding relationship may or may not exist between any two posts: for example, a forwarding relationship exists between a source post and a post that forwards it, whereas no forwarding relationship exists between two posts that both forward the source post.
The comment relationship represents the comment hierarchy between two posts. A comment relationship may or may not exist between any two posts: for example, a comment relationship exists between a source post and a post that comments on it, whereas no comment relationship exists between two posts that both comment on the source post.
The time-dimension relationship is the temporal relationship between any two posts that forward or comment on the same post, or the temporal relationship between the publication times of different posts.
In this embodiment, the propagation structure graph is built from the forwarding, time-dimension, and comment relationships together, so that the graph contains rich relational features among the posts and the rumor detection result becomes more accurate.
On the basis of the above embodiments, the second propagation structure feature in this embodiment is obtained as follows: the propagation structure graph of the sample event is input into the graph convolutional neural network model, and the feature extraction layer of that model outputs the second propagation structure feature of the sample event; the graph convolutional neural network model is obtained by training on the propagation structure graphs of the sample events and the rumor detection labels of the sample events.
Note that the second propagation structure feature of each sample event must be obtained before the conditional variational autoencoder is trained. It is obtained as follows: first, the propagation structure graph of the sample event is obtained; then, the nodes and the adjacency matrix of the propagation structure graph are used as the input of the graph convolutional neural network model, and each node of the graph is updated to obtain the corresponding propagation structure feature. Each node is updated by aggregating the information of its connected neighbor nodes and its own information.
Optionally, a graph convolutional network (GCN) model is used to construct the event propagation structure features. The node update from one GCN layer to the next is:
$$H_i^{(l+1)} = \sigma\left(\hat{A}_i H_i^{(l)} W^{(l)} + B^{(l)}\right), \qquad \hat{A}_i = D_i^{-1/2} A_i D_i^{-1/2}$$
where $H_i^{(l)}$ is the feature matrix of all nodes of the propagation structure graph of the $i$-th event after $l$ layers of convolution, and $h_{ij}^{(l)}$ is the feature vector of the $j$-th node of the event after the GCN update; $\hat{A}_i$ is the normalized symmetric adjacency matrix, and $D_i$ is the degree matrix of the adjacency matrix $A_i$; $W^{(l)}$ is the weight matrix; $B^{(l)}$ is the bias term; and $\sigma(\cdot)$ is the activation function. When $l = 0$, $H_i^{(0)} = X_i$, where $X_i$ contains the text vectors of the $i$-th event. $X_i$ is input into an $l$-layer GCN, and after $l$ layers of convolution the propagation graph feature matrix $H_i^{(l)}$ is obtained, so that the feature matrices of all event propagation graphs are $H^{(l)} = \{H_1^{(l)}, H_2^{(l)}, \ldots, H_i^{(l)}, \ldots, H_n^{(l)}\}$.
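One GCN layer of the kind described above can be sketched with plain Python lists. Three choices are assumptions: self-loops are added before normalization so that each node aggregates its own information (controlled by the `add_self_loops` flag), ReLU is used as the activation, and the bias is a per-column vector. The helper names are illustrative.

```python
import math

def normalize_adjacency(adj, add_self_loops=True):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of an adjacency matrix."""
    n = len(adj)
    a = [[float(adj[i][j]) + (1.0 if add_self_loops and i == j else 0.0)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def gcn_layer(a_hat, h, w, b):
    """Compute ReLU(a_hat @ h @ w + b); ReLU is an assumed activation."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v + b[j]) for j, v in enumerate(row)] for row in z]
```

Stacking several calls to `gcn_layer`, starting from the post text vectors, lets information flow along longer response chains in the graph.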
The propagation graph feature $H_i^{(l)}$ is pooled to obtain $H_i$, and a softmax classifier predicts the rumor label of the event:
$$\hat{y}_i = \mathrm{softmax}(W H_i + b)$$
where $\hat{y}_i$ is the predicted label of the $i$-th event, $W$ is the weight parameter, and $b$ is the bias term.
Then, a loss function is computed from the predicted rumor labels and the true rumor labels, and the graph convolutional neural network model is trained on that loss function. The model is optimized with the cross-entropy loss:
$$\mathcal{L} = -\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic} \log \hat{y}_{ic}$$
where $\hat{y}_{ic}$ and $y_{ic}$ are the predicted rumor label and the true rumor label of the event, respectively; $N$ is the number of sample events; and $C$ is the number of label categories. In this embodiment $C = 2$, namely rumor events and non-rumor events.
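The pooling and cross-entropy steps can be illustrated with the short sketch below. The text does not name the pooling operator, so mean pooling over nodes is an assumption here, as are the function names and the small `eps` guard against taking the logarithm of zero.

```python
import math

def mean_pool(node_features):
    """Average node feature vectors into one event-level feature (assumed pooling)."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]

def cross_entropy(pred_probs, one_hot_labels, eps=1e-12):
    """Sum over events and classes of -y_ic * log(y_hat_ic)."""
    loss = 0.0
    for probs, labels in zip(pred_probs, one_hot_labels):
        for p, y in zip(probs, labels):
            if y:
                loss -= y * math.log(max(p, eps))
    return loss
```

With two classes, an uninformative prediction of 0.5 for the true class contributes log 2 to the loss, and a perfect prediction contributes nothing.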
Note that the rumor detection here serves only as auxiliary training. After training is complete, the propagation structure feature $H_i$ of each sample event is obtained and used as training data for the conditional variational autoencoder.
In this embodiment, the graph convolutional neural network model obtains the second propagation structure feature of each sample event quickly and accurately, and this feature is used as training data to obtain a conditional variational autoencoder that simulates propagation structure features from the source text.
On the basis of the above embodiments, in this embodiment, inputting the first source text into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected generated by the conditional variational autoencoder includes: preprocessing the first source text, where the preprocessing includes removing garbled data and text vectorization; and inputting the preprocessed first source text into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected.
Optionally, before step 102 is performed, the first source text can be preprocessed. The preprocessing steps include: first, removing garbled data and/or blank lines from the first source text to obtain a valid first source text that contains no garbled data or blank lines; then, encoding the first source text into a text vector with the Doc2vec model to obtain the text vector of the first source text.
In this embodiment, converting the first source text into a text vector represents each word of the text as a low-dimensional real-valued vector and encodes the semantic features of each word into the text vector.
After the first source text has been preprocessed, its text vector is available; this text vector can be input into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected.
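The cleaning part of the preprocessing can be sketched as below. What counts as "garbled data" is not defined in the text, so this sketch assumes it means the Unicode replacement character and control or otherwise unprintable characters; the function name `clean_source_text` is hypothetical. Vectorization with Doc2vec is left out because it depends on an external library.

```python
import unicodedata

def clean_source_text(raw):
    """Drop garbled characters and blank lines from a source text."""
    kept = []
    for line in raw.splitlines():
        # Assumption: U+FFFD and Unicode category C* characters are "garbled".
        chars = [" " if ch == "\ufffd" or unicodedata.category(ch).startswith("C") else ch
                 for ch in line]
        cleaned = " ".join("".join(chars).split())  # collapse runs of whitespace
        if cleaned:  # skip lines that are empty after cleaning
            kept.append(cleaned)
    return "\n".join(kept)
```

The cleaned text would then be passed to whichever vectorizer is used to produce the text vector.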
Note that during training, the second source text and the response tweets of the sample events can be preprocessed in the same way, and the preprocessed second source text and response tweets are used as training samples to train the conditional variational autoencoder, the graph convolutional neural network model, and the classification model.
In this embodiment, preprocessing the first source text produces word vectors that carry the semantic features of each word and lays the foundation for the subsequent rumor detection.
Figures 3 and 4 show the early rumor detection method based on generated propagation structure features proposed in this embodiment, which mainly includes the following steps: collecting samples; extracting data and vectorizing the text; constructing the propagation structure graph of each sample event and using a graph convolutional network to construct the propagation structure features; training the conditional variational autoencoder on the source posts and propagation structure features of the sample events, which yields a generator that can accurately generate propagation structure features; training the classification model on the source posts and propagation structure features of the sample events together with the true rumor labels, which yields a classification model that can accurately classify events; taking the source post of the event to be detected as the condition, so that the generator produces the simulated propagation structure feature corresponding to that condition; and combining the source post of the event to be detected with its simulated propagation structure feature and performing rumor detection through a softmax function.
The above steps are described in detail below. The specific logic flow is as follows:
Step 1, collect sample data;
Step 2, text vectorization; this step specifically includes:
Step 2.1, process the sample data set and, taking the event as the unit, aggregate the source post and the reposts of the same event;
Step 2.2, use Doc2vec to vectorize all posts of each sample event to obtain the text vector representation of each post;
Step 3, train the graph convolutional network (GCN) model and construct the propagation structure features of the events; this step specifically includes:
Step 3.1, construct the propagation structure graph G = (V, E) of each sample event and obtain the adjacency matrix A, where V is the node set and E is the edge set; each node represents one post, and each edge represents the forwarding relationship, time-dimension relationship, or comment relationship between two post nodes;
Step 3.2, use the GCN model to construct the propagation structure feature of each sample event;
Step 3.3, input the obtained propagation structure feature into the softmax function to obtain the predicted rumor label of the sample event, and train the GCN model on the loss function computed from the predicted and true rumor labels until the training termination condition of the model is met;
The propagation structure graph of each sample event is then input into the trained GCN model to obtain the propagation structure feature $H_i$ of the sample event, which serves as the training data set of the subsequent conditional variational autoencoder (CVAE) model;
Step 4, train the CVAE model on the source posts and propagation structure features of the sample events; this step specifically includes:
Step 4.1, extract the text vector of the source post of each sample event as the input condition of the CVAE model;
Step 4.2, take the propagation structure feature $H_i$ output by the GCN model in step 3 as the real sample data of the CVAE model;
Step 4.3, train the CVAE model, which consists of an encoder $E(\cdot)$ and a decoder $D(\cdot)$; the input of $E(\cdot)$ is the source post $c_i$ (as the condition) and the real sample data $H_i$, and its output is the mean $\mu$ and variance $\delta^2$ of the latent variable $z$; the input of $D(\cdot)$ is the source post $c_i$ and the sampled latent variable $z$, and its output is the reconstruction of the real sample data $H_i$;
Step 5, train the classification model on the source posts and propagation structure features of the sample events together with the true rumor labels; this step includes:
Step 5.1, obtain the text vector of the source post and the propagation structure feature of each sample event;
Step 5.2, use the text vector of the source post and the propagation structure feature of the sample event as the input and the rumor detection label of the sample event as the ground-truth label to train the classification model;
Step 6, generate the simulated propagation structure feature from the source post of the event to be detected; this step includes:
Step 6.1, extract the text vector of the source post of the event to be detected;
Step 6.2, input the text vector of the source post into the conditional variational autoencoder (CVAE), which generates the corresponding simulated propagation structure feature;
Step 7: perform early rumor detection based on the source post of the event to be detected and the generated simulated propagation structure feature. This step includes:
Step 7.1: obtain the text of the source post of the event to be detected and its simulated propagation structure feature;
Step 7.2: vectorize the source text with word2vec and input it into a long short-term memory (LSTM) network to obtain the high-order text feature representation a_i of the source text;
Step 7.3: combine the text feature a_i with the simulated propagation structure feature and input the result into a softmax function for rumor detection.
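Step 7.3 can be sketched as a concatenation followed by a softmax. The linear layer that maps the concatenated vector to class scores is an assumption (the description only names the softmax function), and `weights` and `bias` are hypothetical trained parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text_feature, structure_feature, weights, bias):
    """Concatenate a_i with the simulated structure feature and score each class.

    weights holds one row per class; each row has one entry per input dimension.
    """
    x = list(text_feature) + list(structure_feature)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

The returned list holds one probability per class (for example rumor and non-rumor); the predicted label is the index of the largest entry.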
In summary, the rumor detection method of this embodiment first constructs an event propagation structure graph from the source post and reposts of each sample event, and then uses a graph convolutional neural network to extract propagation structure features from that graph. Next, the source posts of the sample events and the extracted propagation structure features are input into the conditional variational autoencoder to train it, yielding a generator that can produce simulated propagation structure features similar to the real ones. The source post of the event to be detected is then input into the trained generator to generate the simulated propagation structure feature corresponding to that source post. Finally, the source post and the generated simulated propagation structure feature are combined for early rumor detection, so that a rumor event can be detected quickly and accurately at the moment the source post is published.
Accordingly, by using a graph convolutional network to construct the propagation structure features, this embodiment better captures the close connection between the source post and its reposts and better learns the information hidden in the propagation structure. Based on the conditional variational autoencoder, a well-performing generator can be trained that automatically generates propagation structure features given only the text of the source post; that is, it can predict the propagation structure of a rumor at the early stage, when only the source post is available, thereby enabling early rumor detection. In addition, the model for generating propagation structure features augments the data available for early rumor detection when early propagation information is lacking.
The early rumor detection device based on generated propagation structure features provided by the present invention is described below. The device described below and the early rumor detection method based on generated propagation structure features described above may be read with reference to each other.
As shown in FIG. 5, this embodiment provides an early rumor detection device based on generated propagation structure features. The device includes a text acquisition module 501, a propagation structure feature generation module 502 and a rumor detection module 503, wherein:
The text acquisition module 501 is configured to acquire the first source text of the event to be detected;
The propagation structure feature generation module 502 is configured to input the first source text into a conditional variational autoencoder and obtain the first propagation structure feature of the event to be detected generated by the conditional variational autoencoder; wherein the conditional variational autoencoder is obtained by training on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by performing feature extraction on the propagation structure graph of the sample event with a graph convolutional neural network model; and the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text;
The rumor detection module 503 is configured to input the first source text and the first propagation structure feature into a classification model and obtain the rumor detection result of the event to be detected output by the classification model; the classification model is obtained by training on the second source text and the second propagation structure feature, together with the rumor detection label of the sample event.
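The way modules 501 to 503 hand data to one another can be sketched as a small composition of callables. The class name, field names and callable signatures are hypothetical and chosen only for illustration; the device itself is defined by the description above, not by this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EarlyRumorDetector:
    # Module 501: returns the first source text for an event identifier.
    acquire_text: Callable[[str], str]
    # Module 502: maps the source text to a generated propagation structure feature.
    generate_structure: Callable[[str], Sequence[float]]
    # Module 503: maps (source text, structure feature) to a detection result.
    classify: Callable[[str, Sequence[float]], str]

    def detect(self, event_id: str) -> str:
        source_text = self.acquire_text(event_id)
        structure_feature = self.generate_structure(source_text)
        return self.classify(source_text, structure_feature)
```

Keeping the three modules as separate callables mirrors the device claim: the generator and the classifier can be trained and replaced independently, and only the source text is required at detection time.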
In this embodiment, a propagation structure graph is constructed from the second source text of the sample event and its response tweets, and feature extraction is performed on the propagation structure graph with a graph convolutional neural network model to obtain the second propagation structure feature of the sample event. Iterative training on the second source text and the second propagation structure feature of the sample event then yields a conditional variational autoencoder that can simulate propagation structure features from the source text of an event. When the first source text of the event to be detected is acquired, the first propagation structure feature of that event can be generated automatically by the conditional variational autoencoder; rumor detection is then performed on the event by combining its first propagation structure feature with its first source text. On the one hand, the whole rumor detection process only requires the first source text of the event to be detected in order to generate the corresponding propagation structure feature quickly, which avoids the delay of collecting a large number of response tweets before propagation structure features can be obtained, allows a rumor to be detected at the point of publication, and effectively improves the timeliness of rumor detection. On the other hand, combining the source text of the event with its propagation structure feature enriches the features relevant to rumor detection and makes the detection result more accurate.
On the basis of the above embodiment, the rumor detection module in this embodiment is specifically configured to: input the first source text into the feature extraction layer of the classification model to obtain the text feature of the first source text; and concatenate the text feature with the first propagation structure feature and input the result into the classification layer of the classification model to obtain the rumor detection result of the event to be detected.
On the basis of the above embodiment, the feature extraction layer of the classification model in this embodiment is built on a recurrent neural network model, and the classification layer of the classification model is built on a softmax function.
On the basis of the above embodiments, the propagation structure graph in this embodiment is constructed as follows: the second source text and each response tweet of the sample event are taken as nodes, and the propagation structure graph of the sample event is built according to the response relationship between any two nodes.
On the basis of the above embodiment, the response relationship in this embodiment includes a retweet (forwarding) relationship, a time-dimension relationship and a comment relationship.
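The graph construction described above can be sketched as follows. The input format, a list of `(response_id, parent_id, relation)` tuples, is hypothetical, and treating every response relationship as an undirected, unweighted edge is a simplifying assumption; the description does not say how the different relationship types are weighted.

```python
def build_propagation_graph(source_id, responses):
    """Build the node list and adjacency matrix of one event.

    responses is a list of (response_id, parent_id, relation) tuples, where
    relation is a label such as "retweet" or "comment" (kept only as metadata).
    """
    nodes = [source_id] + [response_id for response_id, _, _ in responses]
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    adjacency = [[0] * size for _ in range(size)]
    for response_id, parent_id, _relation in responses:
        if parent_id not in index:
            continue  # skip responses whose parent post was not collected
        i, j = index[parent_id], index[response_id]
        adjacency[i][j] = 1
        adjacency[j][i] = 1  # assumed undirected edge
    return nodes, adjacency
```

For a source post `s` with a retweet `r1` and a comment `r2` on that retweet, the function returns the nodes `["s", "r1", "r2"]` and a chain-shaped adjacency matrix.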
On the basis of the above embodiments, the second propagation structure feature in this embodiment is obtained as follows: the propagation structure graph of the sample event is input into the graph convolutional neural network model, and the second propagation structure feature of the sample event is taken from the output of the feature extraction layer of the graph convolutional neural network model; wherein the graph convolutional neural network model is obtained by training on the propagation structure graph of the sample event and the rumor detection label of the sample event.
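For illustration, a single graph convolution layer can be written in plain Python as below. It assumes the widely used propagation rule H' = ReLU(Â H W) with Â = D^(-1/2) (A + I) D^(-1/2); whether the patent's model uses exactly this normalization is not stated in this section, and `weight` is a hypothetical trained parameter matrix.

```python
import math


def normalize_adjacency(adjacency):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    size = len(adjacency)
    with_loops = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(size)]
                  for i in range(size)]
    degree = [sum(row) for row in with_loops]
    return [[with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weight):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adjacency), features), weight)
    return [[max(0.0, value) for value in row] for row in propagated]
```

Stacking such layers and pooling the node representations would give one feature vector per event; the pooling choice is likewise left open here.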
On the basis of the above embodiments, the propagation structure feature generation module in this embodiment is specifically configured to: preprocess the first source text, wherein the preprocessing includes removing garbled data and vectorizing the text; and input the preprocessed first source text into the conditional variational autoencoder to obtain the first propagation structure feature of the event to be detected.
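The two preprocessing operations named above can be sketched as follows. What counts as "garbled data" is not defined in the description, so the sketch assumes it means Unicode replacement characters and non-printing control characters. The vectorization shown is a simple average of word vectors from a hypothetical `embeddings` lookup table, standing in for the word2vec vectors mentioned elsewhere in the description.

```python
import re

# Assumed definition of garbled data: U+FFFD and ASCII control characters
# other than tab, newline and carriage return.
_GARBLED = re.compile(r"[\ufffd\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_text(text):
    """Remove garbled characters and collapse runs of whitespace."""
    text = _GARBLED.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def vectorize(text, embeddings, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    tokens = clean_text(text).lower().split()
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Whitespace tokenization is used only to keep the example short; Chinese text would need a word segmenter before the lookup.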
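FIG. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 6, the electronic device may include a processor 601, a communications interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communications interface 602 and the memory 603 communicate with one another via the communication bus 604. The processor 601 may invoke logic instructions in the memory 603 to execute the early rumor detection method based on generated propagation structure features, the method including: acquiring the first source text of the event to be detected; inputting the first source text into a conditional variational autoencoder and obtaining the first propagation structure feature of the event to be detected generated by the conditional variational autoencoder; wherein the conditional variational autoencoder is obtained by training on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by performing feature extraction on the propagation structure graph of the sample event with a graph convolutional neural network model; the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text; inputting the first source text and the first propagation structure feature into a classification model and obtaining the rumor detection result of the event to be detected output by the classification model; the classification model is obtained by training on the second source text and the second propagation structure feature, together with the rumor detection label of the sample event.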
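In addition, when the logic instructions in the memory 603 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.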
In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer is able to execute the early rumor detection method based on generated propagation structure features provided by the methods above, the method including: acquiring the first source text of the event to be detected; inputting the first source text into a conditional variational autoencoder and obtaining the first propagation structure feature of the event to be detected generated by the conditional variational autoencoder; wherein the conditional variational autoencoder is obtained by training on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by performing feature extraction on the propagation structure graph of the sample event with a graph convolutional neural network model; the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text; inputting the first source text and the first propagation structure feature into a classification model and obtaining the rumor detection result of the event to be detected output by the classification model; the classification model is obtained by training on the second source text and the second propagation structure feature, together with the rumor detection label of the sample event.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the early rumor detection method based on generated propagation structure features provided by the methods above, the method including: acquiring the first source text of the event to be detected; inputting the first source text into a conditional variational autoencoder and obtaining the first propagation structure feature of the event to be detected generated by the conditional variational autoencoder; wherein the conditional variational autoencoder is obtained by training on the second source text and the second propagation structure feature of a sample event; the second propagation structure feature is obtained by performing feature extraction on the propagation structure graph of the sample event with a graph convolutional neural network model; the propagation structure graph is constructed from the second source text of the sample event and the response tweets corresponding to the second source text; inputting the first source text and the first propagation structure feature into a classification model and obtaining the rumor detection result of the event to be detected output by the classification model; the classification model is obtained by training on the second source text and the second propagation structure feature, together with the rumor detection label of the sample event.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111574716.9A CN114357160B (en) | 2021-12-21 | 2021-12-21 | Early rumor detection method and device based on generated propagation structural features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111574716.9A CN114357160B (en) | 2021-12-21 | 2021-12-21 | Early rumor detection method and device based on generated propagation structural features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114357160A (en) | 2022-04-15 |
CN114357160B (en) | 2024-07-23 |
Family
ID=81100561
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202111574716.9A | Active | CN114357160B (en) | 2021-12-21 | 2021-12-21 | Early rumor detection method and device based on generated propagation structural features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114357160B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117573865A (en) * | 2023-10-19 | 2024-02-20 | 南昌大学 | A rumor fuzzy detection method based on interpretable adaptive learning |
CN119195278A (en) * | 2024-11-26 | 2024-12-27 | 宁波东海集团有限公司 | Independent metering and zoned pressure regulation control system for water supply network based on Monte Carlo simulation |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021042503A1 (en) * | 2019-09-06 | 2021-03-11 | 平安科技(深圳)有限公司 | Information classification extraction method, apparatus, computer device and storage medium |
CN113742604A (en) * | 2021-08-24 | 2021-12-03 | 三峡大学 | Rumor detection method and device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
琚心怡: "基于深层双向Transformer编码器的早期谣言检测" (Early rumor detection based on a deep bidirectional Transformer encoder), 信息通信, no. 05, 15 May 2020 (2020-05-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN114357160B (en) | 2024-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113641820B (en) | Perspective-level text emotion classification method and system based on graph convolutional neural network | |
CN113239186B (en) | Graph convolution network relation extraction method based on multi-dependency relation representation mechanism | |
CN107609009B (en) | Text emotion analysis method and device, storage medium and computer equipment | |
CN111581983A (en) | Method for predicting social concern hotspots in network public opinion events based on group analysis | |
CN110263324A (en) | Text handling method, model training method and device | |
CN111639252A (en) | False news identification method based on news-comment relevance analysis | |
CN105279495A (en) | Video description method based on deep learning and text summarization | |
CN109947918A (en) | Semantic analysis method for intelligent customer service dialogue scenarios | |
CN113434683B (en) | Text classification method, device, medium and electronic equipment | |
CN108399241A (en) | A kind of emerging much-talked-about topic detecting system based on multiclass feature fusion | |
CN116579347A (en) | Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion | |
CN114722833B (en) | A semantic classification method and device | |
CN111859979A (en) | Sarcastic text collaborative recognition method, apparatus, device, and computer-readable medium | |
CN113449204A (en) | Social event classification method and device based on local aggregation graph attention network | |
CN114357160A (en) | Early rumor detection method and device based on generation propagation structure characteristics | |
CN116245110A (en) | Multi-dimensional information fusion user standing detection method based on graph attention network | |
CN115952343A (en) | Social robot detection method based on multi-relation graph convolutional network | |
CN114692605A (en) | A method and device for generating keywords by merging syntactic structure information | |
Yang | [Retracted] Application of LSTM Neural Network Technology Embedded in English Intelligent Translation | |
CN110245292B (en) | Natural language relation extraction method based on neural network noise filtering characteristics | |
CN113761875B (en) | Event extraction method and device, electronic equipment and storage medium | |
CN112861474B (en) | Information labeling method, device, equipment and computer readable storage medium | |
CN109977194A (en) | Text similarity computing method, system, equipment and medium based on unsupervised learning | |
Ermatita et al. | Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks. | |
CN113919338B (en) | Method and device for processing text data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||