CN111966786A - Microblog rumor detection method - Google Patents

Microblog rumor detection method

Info

Publication number
CN111966786A
CN111966786A
Authority
CN
China
Prior art keywords
microblog
model
training
text
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010757089.1A
Other languages
Chinese (zh)
Other versions
CN111966786B (en)
Inventor
宋玉蓉
潘德宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202010757089.1A priority Critical patent/CN111966786B/en
Publication of CN111966786A publication Critical patent/CN111966786A/en
Application granted granted Critical
Publication of CN111966786B publication Critical patent/CN111966786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a microblog rumor detection method which incorporates an attention mechanism and comprises the following steps: collecting microblog events and the corresponding comment data sets as sample data; preprocessing the sample data and extracting the text contents of the original microblog and of the comments respectively; pre-training the text with a BERT pre-training model to generate a fixed-length sentence vector for each sentence of the text; constructing a dictionary and extracting an original microblog with a number of its corresponding comments to form a microblog event vector matrix; training the vector matrix with the deep learning method Text CNN-Attention to construct a multi-level training model; and performing classification detection on the vector matrix with the multi-level training model to obtain the rumor detection result corresponding to the social network data. Compared with traditional rumor detection methods, the method improves detection accuracy.

Description

Microblog rumor detection method
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a microblog rumor detection method.
Background
Rumors generally refer to unverified statements or descriptions, usually concerning some event. With the rapid development of social media, rumors can spread through it at the rate of nuclear fission. Microblogs (micro-blogs), one form of social media, are an emerging class of open internet social services of the Web 2.0 era. Via the internet or propagation media such as mobile phones, users can update their microblogs with short texts anytime and anywhere and share information with many other users. Compared with the traditional blog, the microblog exhibits distinctive propagation characteristics: instant sharing, an innovative interaction mode, and vivid on-the-spot portrayal; in propagation effect it gathers popularity and is economical and fast. At the same time, the freedom of the published content, the grassroots nature of the publishers, the breadth of the audience and the diversity of the publication channels all promote the publication and diffusion of rumors on microblogs. Rumors propagate on microblogs mostly through users commenting on and forwarding information, and widely propagated false rumors exert a definite negative effect on society.
Approaches to rumor detection generally fall into two categories. The first is traditional machine learning based on hand-crafted feature extraction: features are mined from factors such as rumor content, rumor users, rumor propagation, sentiment polarity and user influence, and rumor detection is performed with classifiers such as Bayes and decision trees. The second is based on deep learning: a neural network with nonlinear activation functions learns latent features in the text, neural network models such as CNN and RNN perform feature representation learning on the text sequence, and a nonlinear classifier finally performs the rumor detection. At present, research that constructs neural networks for rumor detection through deep learning mostly adopts word2vec word vectors or ELMo as the pre-training model. The word vectors of the former cannot resolve polysemy, since each trained word corresponds to only one vector representation; the latter can adjust word embeddings dynamically according to context, but it uses LSTM rather than the Transformer for feature extraction and represents the current token by splicing context vectors, so the fused vector features are weak. The training model mostly adopts a CNN or RNN network; although a CNN network can extract sentence-meaning features, it ignores context and word-order features, and after the pooled features are spliced by the fully connected operation it cannot single out the features with the strongest influence. Addressing these existing challenges, the invention proposes a new rumor detection model that incorporates an attention mechanism: for text preprocessing it selects the BERT pre-training model, which can extract the latent features of the text; in the training model it introduces an attention mechanism into the CNN model, which can automatically assign different weights according to the influence of different events; finally a Softmax classifier performs the rumor detection.
In view of the above, a method for detecting microblog rumors is needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a microblog rumor detection method with high accuracy.
In order to achieve the above object, the present invention provides a microblog rumor detection method, comprising the following steps:
A. collecting microblog events and corresponding comment data sets as sample data;
B. preprocessing sample data, and respectively extracting text contents of an original microblog and a comment;
C. pre-training the text by adopting a BERT pre-training model, and generating a sentence vector with a fixed length for each sentence of the text;
D. constructing a dictionary, and extracting an original microblog and a plurality of corresponding comments to form a microblog event vector matrix;
E. training the vector matrix by adopting a deep learning method Text CNN-Attention, and constructing a multi-level training model;
F. and carrying out classification detection on the vector matrix according to the multi-level training model to obtain a rumor detection result corresponding to the social network data.
As a further improvement of the invention, the sample data comprises rumor sample data and non-rumor sample data.
As a further improvement of the present invention, in the step B, noise in the json file is removed by using a regular expression.
As a further improvement of the present invention, all pre-trained texts are divided into training data and test data at a ratio of 4:1 for subsequent model processing.
As a further improvement of the present invention, the pre-trained BERT model and code enable the embedding of word vectors.
As a further improvement of the invention, the BERT model, used as the word vector model, can fully characterize character-level, word-level, sentence-level and inter-sentence relationship features, and gradually shifts the NLP task onto the pre-trained sentence vectors.
As a further improvement of the invention, the BERT model proposes a pre-training target, the masked language model (MLM), which overcomes the traditional unidirectional limitation; the MLM objective allows the representation to fuse the left and right context, so that a deep bidirectional Transformer can be pre-trained.
As a further improvement of the present invention, the BERT model introduces a "next sentence prediction" task, which can be used together with the MLM to train representations of text pairs.
As a further improvement of the invention, the BERT model predicts with sentence-level negative sampling whether the two text segments input to BERT are continuous; during training, the second segment input to the model is, with 50% probability, selected at random from all texts, and in the remaining 50% of cases is the text that actually follows the first segment.
As a further improvement of the invention, the constructed multi-level training model consists of a Text CNN part and an attention mechanism part. The Text CNN model performs convolution operations on the vector matrix to be detected with three convolution kernels of sizes 3, 4 and 5, obtaining the different feature representations of the different kernels based on the vector matrix; through the pooling operation each convolution kernel generates only one maximum feature for the input matrix, and the feature representations obtained by the kernels of different sizes are connected by the fully connected operation. The attention mechanism gives the features generated after full connection different weights according to each feature's influence on the output, so that the features with a large influence weigh more heavily when rumor detection is carried out.
The invention has the following beneficial effects. The microblog rumor detection method applies a BERT pre-training model in the text preprocessing stage; its Transformer captures longer-distance dependencies more efficiently and mines deep context information, so the pre-trained sentence vectors carry better latent features. The training model introduces an attention mechanism that gives different features different weights according to their influence, so that the features with a larger influence on the output result receive more weight and affect the result more strongly, which benefits rumor detection and improves detection accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
Wherein:
FIG. 1 is a general flow chart for rumor detection;
FIG. 2 is a schematic diagram of the structure of the BERT model;
FIG. 3 is a flow chart of a microblog rumor detection method in consideration of attention mechanism according to the present invention;
FIG. 4 is a schematic structural diagram of a neural network Text CNN model;
FIG. 5 is a schematic structural diagram of the attention mechanism;
FIG. 6 is a MATLAB simulation chart of the experimental results of embodiment one;
FIG. 7 is a MATLAB simulation chart of the experimental results of embodiment two.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention discloses a microblog rumor detection method which incorporates an attention mechanism; the overall flow of the method is shown in FIG. 1 and mainly comprises the following steps:
step 1, collecting microblog events and corresponding comment data as sample data;
sample data here includes rumor sample data and non-rumor sample data;
the rumor sample data label is "1" and the non-rumor sample data label is "0".
Step 2, preprocessing sample data, and extracting corresponding text content by using a regular expression;
the main purpose of preprocessing is to remove noise in the text, including non-chinese characters, punctuation, stop words, etc. The sample data is stored in a json format file; the json file stores data in the form of "key value pairs," with the data name as the key in the json file, and the crawled data value as the value in the json file, e.g., "text: breakfast". No association is allowed to avoid crossing provinces. ";
all the data of a single original microblog event are stored in one json file, and all the data of the comments on that event are stored in another json file;
the noise in the json files is removed with a regular expression, and the text contents of the original microblog event and of all its comments are extracted and stored correspondingly;
all texts are divided into training data and test data at a ratio of 4:1 for subsequent model processing.
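A minimal Python sketch of this preprocessing step follows; the file layout, the field name "text" and the cleaning pattern are illustrative assumptions, not details fixed by the invention.

import json
import random
import re

def load_texts(json_path):
    """Read one event's json file and return the crawled text values."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    return [r["text"] for r in records]  # the "text" key is an assumption

def clean(text, stop_words=frozenset()):
    """Keep only Chinese characters, then drop single-character stop words."""
    text = re.sub(r"[^\u4e00-\u9fa5]", "", text)
    return "".join(ch for ch in text if ch not in stop_words)

def split_4_to_1(samples, seed=42):
    """Shuffle and divide samples into training and test sets at a 4:1 ratio."""
    random.seed(seed)
    random.shuffle(samples)
    cut = len(samples) * 4 // 5
    return samples[:cut], samples[cut:]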
Step 3, downloading a BERT pre-training model, and converting the text into corresponding sentence vectors;
the BERT model can be obtained by downloading Google's BERT pre-training model; the pre-trained Chinese BERT model and its code both come from Google Research's BERT and realize word vector embedding. The basic structure of the model is shown in FIG. 2;
BERT (Bidirectional Encoder Representations from Transformers) is a method that improves on fine-tuning-based architectures. Used as the word vector model, it can fully characterize character-level, word-level, sentence-level and inter-sentence relationship features, and it aims to gradually shift downstream NLP tasks onto pre-trained sentence vectors;
the BERT model includes the following features. It proposes a new pre-training target, the masked language model (MLM), which overcomes the traditional unidirectional limitation: the MLM objective allows the representation to fuse the left and right context, so that a deep bidirectional Transformer can be pre-trained. It introduces a "next sentence prediction" task, which trains representations of text pairs together with the MLM. It applies sentence-level negative sampling for sentence-level continuity prediction, i.e. predicting whether the two text segments input to BERT are continuous: during training, the second segment input to the model is, with 50% probability, selected at random from all texts, and in the remaining 50% of cases is the text that actually follows the first segment.
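The 50% sampling just described can be illustrated with a short sketch; the pair format below is an assumption for illustration, since the actual sampling is internal to BERT pre-training.

import random

def make_nsp_example(sentences, corpus, i):
    """Build one (segment_a, segment_b, is_next) next-sentence-prediction pair."""
    segment_a = sentences[i]
    if random.random() < 0.5 and i + 1 < len(sentences):
        return segment_a, sentences[i + 1], 1   # true continuation of segment_a
    return segment_a, random.choice(corpus), 0  # random segment from all texts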
Step 4, constructing a corresponding input matrix according to the selected sentence length and the sentence vector dimension;
a BERT-Base model is adopted, with 12 network layers; the trained sentence vectors have 768 dimensions;
and a fixed number of sentence vectors, those of the original microblog text together with those corresponding to its comments, are selected to form the input matrix.
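The invention downloads Google's released Chinese BERT-Base checkpoint; as a hedged sketch, the same step can be reproduced with the Hugging Face transformers package, where taking the [CLS] hidden state as the 768-dimensional sentence vector is an assumption.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese").eval()

def sentence_vector(sentence):
    """Encode one sentence into a fixed 768-dimensional vector."""
    inputs = tokenizer(sentence, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, 0]                            # [CLS] token state

def event_matrix(original_post, comments, m):
    """Stack the original microblog and m-1 comments into an m x 768 matrix."""
    texts = [original_post] + comments[:m - 1]
    return torch.stack([sentence_vector(t) for t in texts])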
And 5, constructing a Text CNN-Attention multi-level training model by adopting a deep learning method.
FIG. 3 is the detailed flowchart of the rumor detection method considering the attention mechanism proposed by the invention. The first layer of the model is the input layer, which consists of the sentence vectors generated by the BERT pre-training model; a whole microblog event is the original microblog plus a corresponding number of randomly selected comments. Next come the convolution layers, where the sentence vectors of the input layer are convolved with filters of different sizes to learn feature representations based on the different filters. The features belonging to the same window are spliced into that window's feature vector, and a feature sequence is obtained from the successive windows. The third layer introduces an attention mechanism over the feature sequence: each feature is given a different weight according to the attention distribution, so that features with a larger influence on the output result are weighted more heavily and affect the result more strongly. Finally the output is passed to a classifier, which judges whether the event is a rumor or not.
FIG. 4 shows a structural description of the Text CNN model, and the detailed process is as follows:
(1) For all rumor and non-rumor events in the dataset and their corresponding comments, sentence vectors are trained by the BERT preprocessing model. For each microblog event, the original microblog and a number of its comments are selected as input and passed to the input layer; the input layer is an m × n matrix, where m is the total number of selected sentences (the original microblog plus its comments) and n is the length of a single sentence vector.
(2) Convolution is performed with three filters of different sizes to obtain the features corresponding to each filter. Each filter slides continuously over the m × n input matrix; to simplify feature extraction, the height of each filter is set to k and its width to n, the width of the input matrix, so that a filter can be expressed as h ∈ ℝ^(k×n). The window starting at any row u of the m rows is then:

w_u = (x_u, x_{u+1}, …, x_{u+k-1})

After the convolution has passed over the input matrix, a feature list c is generated, in which each convolution window contributes one feature:

c_u = f(w_u · h + b),

where f is the ReLU function and b is a bias term.
(3) When a filter of height k slides over an input of length m, the resulting feature list has length (m - k + 1). With q filters of each size, q feature lists are generated and spliced into a matrix:

W_1 = [c_1, c_2, …, c_q],

where c_q denotes the feature list generated by the q-th filter. Since three filter sizes are used in total, the final matrix is:

W = [W_1, W_2, W_3] = [c_1, c_2, …, c_q, c_{q+1}, …, c_{2q}, c_{2q+1}, …, c_{3q}]
(4) A max-pooling operation is applied to the features obtained by each filter to produce its output feature, and the output features of the different filters are fully connected to give the CNN output:

W' = [ĉ_1, ĉ_2, …, ĉ_k].
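A minimal PyTorch sketch of steps (1)-(4) is given below; the filter count q = 100 and the batch handling are assumptions, while the kernel heights 3, 4 and 5 and the full-width kernels follow the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Convolution, max-over-time pooling and concatenation of steps (1)-(4)."""

    def __init__(self, n=768, q=100, kernel_heights=(3, 4, 5)):
        super().__init__()
        # One convolution per kernel height k; each kernel spans the full width n.
        self.convs = nn.ModuleList(
            nn.Conv2d(1, q, kernel_size=(k, n)) for k in kernel_heights)

    def forward(self, x):                # x: (batch, m, n) event matrix
        x = x.unsqueeze(1)               # add the single input channel
        pooled = []
        for conv in self.convs:
            c = F.relu(conv(x)).squeeze(3)             # (batch, q, m - k + 1)
            c = F.max_pool1d(c, c.size(2)).squeeze(2)  # one max feature per filter
            pooled.append(c)
        return torch.cat(pooled, dim=1)  # W': (batch, 3q) connected features

For an event matrix of shape (m, 768) from step 4, TextCNN()(matrix.unsqueeze(0)) yields the 300-dimensional W' vector under these assumed hyperparameters.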
(5) An attention layer performs a weighted summation over the output of the CNN layer to obtain the hidden-layer representation of the microblog sequence; the structure of the introduced attention mechanism is shown in FIG. 5. Introducing an attention mechanism over the CNN gives different weights to the hidden state sequence W' output by the CNN, so that the model can exploit the microblog sequence information selectively when learning the representation of the sequence. The attention layer takes the CNN output ĉ_i as input and outputs the corresponding representation v_i of the microblog sequence:

h_i = tanh(W_A · ĉ_i + b_A)
α_i = exp(h_iᵀ · h_A) / Σ_j exp(h_jᵀ · h_A)
v_i = α_i · ĉ_i

These compose the matrix V = [v_1, v_2, …, v_k], where W_A is a weight matrix, b_A is a bias value, h_i is the hidden representation of ĉ_i, α_i is the similarity of h_i to the context vector h_A, and v_i is the output vector.
(6) And sending the output to a full connection layer, and obtaining the probability output of rumors and non-rumors through Softmax so as to achieve the purpose of judging rumor events.
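Steps (5)-(6) can be sketched as follows, consistent with the equations of step (5); the attention dimensionality att_dim and the per-feature projection are assumptions.

import torch
import torch.nn as nn

class AttentionClassifier(nn.Module):
    """Attention re-weighting of the CNN output followed by a Softmax classifier."""

    def __init__(self, feat_dim, att_dim=64, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(1, att_dim)                  # W_A and b_A
        self.context = nn.Parameter(torch.randn(att_dim))  # context vector h_A
        self.out = nn.Linear(feat_dim, n_classes)          # full connection layer

    def forward(self, w):                    # w: (batch, feat_dim) CNN output W'
        h = torch.tanh(self.proj(w.unsqueeze(2)))  # h_i: (batch, feat_dim, att_dim)
        scores = h @ self.context                  # similarity of h_i with h_A
        alpha = torch.softmax(scores, dim=1)       # attention weights alpha_i
        v = alpha * w                              # re-weighted features v_i
        return self.out(v)                         # rumor / non-rumor logits

In training, the logits would feed torch.nn.CrossEntropyLoss (which applies Softmax internally) against the labels "1" for rumor and "0" for non-rumor assigned in step 1.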
Step 6, the input matrix is trained and tested with the multi-level training model to obtain the corresponding rumor detection result.
The first embodiment is as follows:
To demonstrate the effectiveness of the invention, we selected a series of microblog-platform event data collated by Ma et al. and used in their paper. The data set consists of the raw information captured through the microblog API, with all forwards and replies for each given event; general-topic posts that were never reported as rumors were also captured, in a number similar to that of the rumor events. Detailed statistics are listed in the following table:
[Table: detailed statistics of the dataset]
We split all data into a training set and a test set at a ratio of 4:1; the specific division is listed in the following table:
[Table: division of the training set and test set]
Four values are used as evaluation indexes of the model's effectiveness: accuracy, precision, recall and F1. The relationship between the predicted results and the actual results is as follows:
[Table: predicted results versus actual results]
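For reference, the four indexes can be computed from the confusion counts as in the short sketch below; treating rumor (label 1) as the positive class is an assumption.

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1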
four baseline methods, SVM-TS, CNN-1, CNN-2, CNN-GRU, were used for comparison, and the detailed data for the effect of our method on rumor testing compared to the baseline method are shown in the following table, and the MATLAB simulation graph of the experimental results is shown in fig. 6:
[Table: rumor detection results of our method and the baseline methods]
As the table shows, the final accuracy of rumor detection with the classifier of the traditional SVM-TS method is only 85.7%, which is not particularly strong. Comparing the GRU-1, GRU-2 and CNN-GRU models shows that, once a convolutional neural network is added to the training model, the filters can extract different latent features from the input, giving a better accuracy of 95.7%. After the attention mechanism is introduced into the model, taking the output of the CNN as its input, the features with a large influence on the output result are given more weight and affect the result more strongly, which benefits rumor detection: the accuracy of our model reaches 96.8%, and the recall and F1 value improve as well.
Example two:
To prove the feasibility of the method, another microblog data set, CED_Dataset [23], was selected for testing: sentence vectors obtained with the same pre-training model were trained on different training models and their accuracies compared. The data set contains 1538 rumor events and 1849 non-rumor events, split into training and test sets at a ratio of 4:1. The experimental data are listed in the following table, and the MATLAB simulation graph of the experimental results is shown in FIG. 7:
[Table: accuracy of the different training models on CED_Dataset]
The experimental results show that sentence vectors obtained through the BERT pre-training model still differ in accuracy when trained on different training models, but the spread is small compared with what different pre-training models produced in the earlier comparison. In the experiments the accuracy of SVM-TS is about 86.7%, followed in order by the GRU-1, CNN-GRU and GRU-2 models; the best result comes from our proposed CNN-Attention model, whose accuracy reaches 95.3% and whose recall and F1 value are likewise the best among these models.
In conclusion, the model shows the best effect on two different data sets: using the BERT pre-training model greatly improves the feature-expression quality of the preprocessed sentence vectors, and pairing it with a CNN model that integrates an attention mechanism extracts the latent features in the text more effectively, which is of great significance to rumor detection tasks.
The microblog rumor event detection problem has been explained mainly from two aspects, the pre-training model and the training model. On the pre-training side, the influence of the pre-training model on the experimental results shows that shifting part of the downstream NLP task onto the pre-training model achieves a better effect. On the training side, a new rumor detection model introducing an attention mechanism is proposed on the basis of the traditional Text CNN model: the input sentence vectors are given different weights according to their degree of influence, which positively affects the prediction of whether an event is a rumor. Experimental verification on real microblog data sets shows that the method has a good rumor detection effect.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims (10)

1. A microblog rumor detection method is characterized by comprising the following steps:
A. collecting microblog events and corresponding comment data sets as sample data;
B. preprocessing the sample data, and respectively extracting text contents of the original microblog and the comment;
C. pre-training the text by adopting a BERT pre-training model, and generating a sentence vector with a fixed length for each sentence of the text;
D. constructing a dictionary, and extracting an original microblog and a plurality of corresponding comments to form a microblog event vector matrix;
E. training the vector matrix by adopting a deep learning method Text CNN-Attention, and constructing a multi-level training model;
F. and carrying out classification detection on the vector matrix according to the multi-level training model to obtain a rumor detection result corresponding to the social network data.
2. The microblog rumor detection method of claim 1, wherein: the sample data includes rumor sample data and non-rumor sample data.
3. The microblog rumor detection method of claim 1, wherein: in step B, the noise in the json file is removed by using a regular expression.
4. The microblog rumor detection method of claim 3, wherein: all pre-trained texts are divided into training data and test data at a ratio of 4:1 for subsequent model processing.
5. The microblog rumor detection method of claim 4, wherein: the pre-trained BERT model and code enable the embedding of word vectors.
6. The microblog rumor detection method of claim 5, wherein: the BERT model, used as the word vector model, can fully characterize character-level, word-level, sentence-level and inter-sentence relationship features, and gradually shifts the NLP task onto the pre-trained sentence vectors.
7. The microblog rumor detection method of claim 1, wherein: the BERT model proposes a pre-training target, the masked language model (MLM), which overcomes the traditional unidirectional limitation; the MLM objective allows the representation to fuse the left and right context, so that a deep bidirectional Transformer can be pre-trained.
8. The microblog rumor detection method of claim 7, wherein: the BERT model introduces a "next sentence prediction" task that can be used to train the representation of text pairs with MLM.
9. The microblog rumor detection method of claim 8, wherein: the BERT model predicts with sentence-level negative sampling whether the two text segments input to BERT are continuous; during training, the second segment input to the model is, with 50% probability, selected at random from all texts, and in the remaining 50% of cases is the text that actually follows the first segment.
10. The microblog rumor detection method of claim 1, wherein: the constructed multi-level training model consists of a Text CNN part and an attention mechanism part; the Text CNN model performs convolution operations on the vector matrix to be detected with three convolution kernels of sizes 3, 4 and 5, obtaining the different feature representations of the different kernels based on the vector matrix; through the pooling operation each convolution kernel generates only one maximum feature for the input matrix, and the feature representations obtained by the kernels of different sizes are connected by the fully connected operation; the attention mechanism gives the features generated after full connection different weights according to each feature's influence on the output, so that the features with a large influence weigh more heavily when rumor detection is carried out.
CN202010757089.1A 2020-07-31 2020-07-31 Microblog rumor detection method Active CN111966786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010757089.1A CN111966786B (en) 2020-07-31 2020-07-31 Microblog rumor detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010757089.1A CN111966786B (en) 2020-07-31 2020-07-31 Microblog rumor detection method

Publications (2)

Publication Number Publication Date
CN111966786A true CN111966786A (en) 2020-11-20
CN111966786B CN111966786B (en) 2022-10-25

Family

ID=73363172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010757089.1A Active CN111966786B (en) 2020-07-31 2020-07-31 Microblog rumor detection method

Country Status (1)

Country Link
CN (1) CN111966786B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560495A (en) * 2020-12-09 2021-03-26 新疆师范大学 Microblog rumor detection method based on emotion analysis
CN112818011A (en) * 2021-01-12 2021-05-18 南京邮电大学 Improved TextCNN and TextRNN rumor identification method
CN113127643A (en) * 2021-05-11 2021-07-16 江南大学 Deep learning rumor detection method integrating microblog themes and comments
CN113158075A (en) * 2021-03-30 2021-07-23 昆明理工大学 Comment-fused multitask joint rumor detection method
CN113204641A (en) * 2021-04-12 2021-08-03 武汉大学 Annealing attention rumor identification method and device based on user characteristics
CN113326437A (en) * 2021-06-22 2021-08-31 哈尔滨工程大学 Microblog early rumor detection method based on dual-engine network and DRQN
CN113377959A (en) * 2021-07-07 2021-09-10 江南大学 Few-sample social media rumor detection method based on meta learning and deep learning
CN113705099A (en) * 2021-05-09 2021-11-26 电子科技大学 Social platform rumor detection model construction method and detection method based on contrast learning
CN116401339A (en) * 2023-06-07 2023-07-07 北京百度网讯科技有限公司 Data processing method, device, electronic equipment, medium and program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280057A (en) * 2017-12-26 2018-07-13 厦门大学 A kind of microblogging rumour detection method based on BLSTM
CN111144131A (en) * 2019-12-25 2020-05-12 北京中科研究院 Network rumor detection method based on pre-training language model
CN111159338A (en) * 2019-12-23 2020-05-15 北京达佳互联信息技术有限公司 Malicious text detection method and device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280057A (en) * 2017-12-26 2018-07-13 厦门大学 A kind of microblogging rumour detection method based on BLSTM
CN111159338A (en) * 2019-12-23 2020-05-15 北京达佳互联信息技术有限公司 Malicious text detection method and device, electronic equipment and storage medium
CN111144131A (en) * 2019-12-25 2020-05-12 北京中科研究院 Network rumor detection method based on pre-training language model

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560495A (en) * 2020-12-09 2021-03-26 新疆师范大学 Microblog rumor detection method based on emotion analysis
CN112560495B (en) * 2020-12-09 2024-03-15 新疆师范大学 Microblog rumor detection method based on emotion analysis
CN112818011A (en) * 2021-01-12 2021-05-18 南京邮电大学 Improved TextCNN and TextRNN rumor identification method
CN113158075A (en) * 2021-03-30 2021-07-23 昆明理工大学 Comment-fused multitask joint rumor detection method
CN113204641A (en) * 2021-04-12 2021-08-03 武汉大学 Annealing attention rumor identification method and device based on user characteristics
CN113705099B (en) * 2021-05-09 2023-06-13 电子科技大学 Social platform rumor detection model construction method and detection method based on contrast learning
CN113705099A (en) * 2021-05-09 2021-11-26 电子科技大学 Social platform rumor detection model construction method and detection method based on contrast learning
CN113127643A (en) * 2021-05-11 2021-07-16 江南大学 Deep learning rumor detection method integrating microblog themes and comments
CN113326437A (en) * 2021-06-22 2021-08-31 哈尔滨工程大学 Microblog early rumor detection method based on dual-engine network and DRQN
CN113326437B (en) * 2021-06-22 2022-06-21 哈尔滨工程大学 Microblog early rumor detection method based on dual-engine network and DRQN
CN113377959A (en) * 2021-07-07 2021-09-10 江南大学 Few-sample social media rumor detection method based on meta learning and deep learning
CN113377959B (en) * 2021-07-07 2022-12-09 江南大学 Few-sample social media rumor detection method based on meta learning and deep learning
CN116401339A (en) * 2023-06-07 2023-07-07 北京百度网讯科技有限公司 Data processing method, device, electronic equipment, medium and program product

Also Published As

Publication number Publication date
CN111966786B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN111966786B (en) Microblog rumor detection method
CN111144131B (en) Network rumor detection method based on pre-training language model
CN110119765A (en) A kind of keyword extracting method based on Seq2seq frame
CN113051916B (en) Interactive microblog text emotion mining method based on emotion offset perception in social network
CN111310476A (en) Public opinion monitoring method and system using aspect-based emotion analysis method
CN105183717A (en) OSN user emotion analysis method based on random forest and user relationship
CN109815485A (en) A kind of method, apparatus and storage medium of the identification of microblogging short text feeling polarities
CN113627550A (en) Image-text emotion analysis method based on multi-mode fusion
CN112270187A (en) Bert-LSTM-based rumor detection model
CN110472245A (en) A kind of multiple labeling emotional intensity prediction technique based on stratification convolutional neural networks
Zhang et al. Exploring deep recurrent convolution neural networks for subjectivity classification
CN115329085A (en) Social robot classification method and system
Saha et al. Sentiment Classification in Bengali News Comments using a hybrid approach with Glove
CN116757218A (en) Short text event coreference resolution method based on sentence relation prediction
CN116644760A (en) Dialogue text emotion analysis method based on Bert model and double-channel model
CN113220964B (en) Viewpoint mining method based on short text in network message field
Shan Social Network Text Sentiment Analysis Method Based on CNN‐BiGRU in Big Data Environment
Patil et al. Hate speech detection using deep learning and text analysis
Kavatagi et al. A context aware embedding for the detection of hate speech in social media networks
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN111859955A (en) Public opinion data analysis model based on deep learning
AL-Sammarraie et al. Image Captions and Hashtags Generation Using Deep Learning Approach
Islam et al. Bengali social media post sentiment analysis using deep learning and BERT model
CN111221941A (en) Social media rumor identification algorithm based on text content and literary style
Al Azhar et al. Identifying Author in Bengali Literature by Bi-LSTM with Attention Mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant