CN115935953A - False news detection method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115935953A
Authority
CN
China
Prior art keywords
news
event
false
data
news data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310036408.3A
Other languages
Chinese (zh)
Inventor
王欢
魏小梅
张永成
沙瀛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Agricultural University
Original Assignee
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Agricultural University
Priority to CN202310036408.3A
Publication of CN115935953A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a false news detection method and device, an electronic device, and a storage medium. The method comprises: acquiring an event from news data based on text features corresponding to the news data, and judging whether the news data is first type news or second type news; when the news data is first type news, retrieving a historical event corresponding to the news data based on a first event discriminator to judge whether the news data is false news; when the news data is second type news, judging whether the news data is false news based on a second event discriminator. The second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news. The invention can solve the technical problem in the prior art that false news cannot be identified quickly and efficiently.

Description

False news detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of network information transmission, in particular to a false news detection method and device, electronic equipment and a storage medium.
Background
The popularity of social networking services has led to rapid growth of their user base and, at the same time, to a tremendous increase in the amount of information. Online social networking platforms allow users to post information freely. These huge user groups publish massive amounts of information every day, and the platforms are consequently also flooded with a large amount of untrue or false information. False information can spread rapidly through social network platforms, so how to identify false news quickly and efficiently is a topic of great significance.
Disclosure of Invention
In view of the above, there is a need to provide a method, an apparatus, an electronic device and a storage medium for detecting false news, so as to solve the technical problem in the prior art that false news cannot be identified quickly and efficiently.
In order to achieve the above object, the present invention provides a false news detection method, including:
acquiring news data to be detected, and extracting characteristics of the news data to obtain corresponding text characteristics;
acquiring an event from the news data based on the text feature, and judging whether the news data is first type news or second type news based on the event;
under the condition that the news data is determined to be a first type of news, inputting the news data into a first event discriminator so as to retrieve a historical event corresponding to the news data based on the first event discriminator and judge whether the news data is false news;
under the condition that the news data is determined to be second type news, inputting the news data into a second event discriminator so as to judge whether the news data is false news or not based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
Further, the false event detector performs transfer learning on false events based on a generative adversarial network.
Further, the performing feature extraction on the news data to obtain corresponding text features includes:
performing word segmentation processing and part-of-speech tagging on the news data to obtain tagged words;
learning based on a pre-trained word embedding model to obtain a word vector corresponding to the marked word;
performing dimensionality reduction processing on sentence vectors corresponding to the news data based on the word vectors to obtain word embedding vectors;
and obtaining the text features based on the word embedding vector.
Further, the obtaining the text feature based on the word embedding vector includes:
inputting the word embedding vector into a convolution filter to obtain a feature vector corresponding to each sentence in the news data;
and performing maximum pooling on the feature vectors to obtain the text features.
Further, the acquiring an event from the news data based on the text feature and determining that the news data is a first type of news or a second type of news based on the event includes:
searching key words from the news data based on the text features;
retrieving a similar news set from a keyword news inverted index table based on the keywords;
determining cosine similarity of different news in the similar news set, and clustering the news in the similar news set based on the cosine similarity of the different news in the similar news set to obtain an event set corresponding to the keyword;
judging whether the news data is first type news or second type news based on the event set;
the keyword news inverted index table comprises a plurality of preset keywords and a news inverted table corresponding to each preset keyword.
Further, the determining news data as a first type news or a second type news based on the event set includes:
entropy filtering is carried out on the event set to obtain filtered events;
and carrying out LCS algorithm filtering on the filtered event, and judging that the filtered event is first type news or second type news based on the LCS algorithm filtering result.
Further, the retrieving similar news sets from the keyword news inverted index table based on the keywords comprises:
determining cosine similarity corresponding to each piece of news corresponding to the keywords in the keyword news inverted index table;
and constructing the similar news set based on the news with cosine similarity larger than a preset threshold.
The invention also provides a false news detection device, comprising:
the extraction module is used for acquiring news data to be detected and extracting the characteristics of the news data to obtain corresponding text characteristics;
the first judging module is used for acquiring an event from the news data based on the text characteristic and judging whether the news data is first type news or second type news based on the event;
the second judgment module is used for inputting the news data into a first event discriminator under the condition that the news data is determined to be the first type of news, so as to retrieve the historical events corresponding to the news data based on the first event discriminator and judge whether the news data is false news;
the third judging module is used for inputting the news data into a second event discriminator under the condition that the news data is determined to be the second type news, so as to judge whether the news data is false news or not based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
The present invention also provides an electronic device comprising a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the steps of the false news detection method according to any one of the above items.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a false news detection method as claimed in any one of the above.
The beneficial effects of the above implementation are as follows. According to the false news detection method and device, electronic device, and storage medium provided by the invention, an event is acquired from the news data based on the text features corresponding to the news data to be detected, and the news data is judged to be first type news or second type news based on the event. When the news data is determined to be first type news, a historical event corresponding to the news data is retrieved based on a first event discriminator to judge whether the news data is false news; when the news data is determined to be second type news, whether the news data is false news is judged based on a second event discriminator. The second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news. The method divides news data into first type news and second type news and inputs them into different event discriminators, thereby realizing quick and efficient detection of false news on the basis of the event correlation among different news.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a flowchart illustrating a false news detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a keyword news inverted index provided by the present invention;
FIG. 3 is a schematic diagram of an inverted index of keyword event IDs provided by the present invention;
FIG. 4 is a schematic structural diagram of a false news detection apparatus provided in the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the embodiments of the present application, "a plurality" means two or more unless otherwise specified.
The terms "comprises," "comprising," and any other variation thereof, in the embodiments of the present invention, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
The naming or numbering of the steps appearing in the embodiments of the present invention does not mean that the steps in the method flow must be executed in chronological/logical order indicated by the naming or numbering, and the named or numbered steps of the flow may change the execution order according to the technical purpose to be achieved, as long as the same or similar technical effects are achieved.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The invention provides a false news detection method, a false news detection device, an electronic device and a storage medium, which are respectively described below.
As shown in fig. 1, the present invention provides a false news detection method, including:
Step 110, acquiring news data to be detected, and extracting features of the news data to obtain corresponding text features.
It can be understood that the method provided by the invention can be implemented based on an ECNNet architecture. The ECNNet architecture integrates a word vector model, a convolutional neural network, a generative adversarial network, and an incremental clustering algorithm, and realizes false news detection for social network emergencies through four modules: a feature extractor, an event mapper, a familiar event judger, and an unknown event judger.
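The flow through the four modules can be sketched as follows. This is only an illustrative outline, not the patented implementation: the function `detect_false_news`, its four callable parameters, and the string tags `"familiar"` and `"unknown"` are hypothetical names introduced here.

```python
def detect_false_news(news_text, extract_features, map_event,
                      judge_familiar, judge_unknown):
    """Route one news item through the four modules.

    All four callables are hypothetical stand-ins for the modules named
    in the text. Returns True if the news is judged to be false.
    """
    features = extract_features(news_text)        # feature extractor
    kind, event = map_event(news_text, features)  # event mapper
    if kind == "familiar":
        # First type news: judged by retrieving historical events.
        return judge_familiar(news_text, event)
    # Second type news: judged from the text features.
    return judge_unknown(features)
```

Passing the modules in as callables keeps the routing logic independent of how each module is actually built.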
A time range T is determined, and sufficient original news data is obtained by a web crawler from mainstream social networks such as Sina Weibo, Toutiao, and Tencent News; the data is then screened to determine the research range S of events. The news data is input into a feature extractor, which extracts information features from the text content of the news data; a convolutional neural network (CNN) can be used as the core module of the feature extractor.
Step 120, acquiring an event from the news data based on the text features, and judging whether the news data is first type news or second type news based on the event.
It will be appreciated that the text features are input to an event mapper, which, in conjunction with the text features extracted by the feature extractor, collects events from the news data and classifies them into familiar news (i.e., first type news) and unknown news (i.e., second type news).
Step 130, when it is determined that the news data is the first type of news, inputting the news data into a first event discriminator to retrieve a historical event corresponding to the news data based on the first event discriminator, and determining whether the news data is false news.
It will be appreciated that the first event discriminator is also referred to as the familiar event judger; it receives the news sent by the event mapper and predicts whether the news is true or false by retrieving the corresponding historical events.
For each news item determined to be "familiar news", the corresponding set of historical events is retrieved, and the event association degree between the familiar news and the retrieved event set is then calculated. The event association degree is defined in the invention as follows:
Suppose there are a historical event 1 and an event 2, where event 2 consists of n news items, denoted $news_1, news_2, \ldots, news_n$. The cosine similarities of these news items with historical event 1 are $sim_1, sim_2, \ldots, sim_n$, respectively. The event association degree S of historical event 1 and event 2 is defined as follows:
Figure BDA0004045410210000071
Each historical event in the retrieved set carries a label: a label of 1 denotes a false event and a label of 0 denotes a true event. The news truth degree R is then defined as follows:
Figure BDA0004045410210000072
where $l_i$ denotes the label value of the i-th news item. If the calculated news truth degree R is greater than 0, the familiar news is predicted to be false news; otherwise, it is considered to be true.
Step 140, inputting the news data into a second event discriminator to judge whether the news data is false news based on the second event discriminator under the condition that the news data is determined to be of a second type news;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
It is understood that the second event discriminator is also referred to as the unknown event judger. The unknown event judger uses a generative adversarial network to perform transfer learning on false news and thereby determines whether unknown news is false news. In addition, a convolutional neural network layer is integrated for event feature learning.
An event is typically a collection of news reports; that is, news items reporting on the same event are associated with one another through the event, and event associations also exist between news and events. Therefore, a generative adversarial network can be used to build the unknown event judger of ECNNet. The unknown event judger consists of two modules: a false event detector and an event feature extractor.
In some embodiments, the false event detector performs transfer learning on false events based on a generative adversarial network.
It is understood that the false event detector performs transfer learning on false events using a generative adversarial network, and the event feature extractor extracts event features using a two-layer fully connected neural network.
The false event detector takes the text feature representation $R_T$ produced by the feature extractor as input and outputs the probability that the i-th event, denoted $m_i$, is a false event. The false event detector applies a fully connected layer with softmax to predict whether the news is false news. The false event detector is denoted $G_d(\cdot;\theta_d)$, where $\theta_d$ denotes all of its parameters. With the text feature $R_T$ of the i-th news item as input, the output is denoted $P_\theta(m_i)$ and is calculated as follows:
$$P_\theta(m_i) = G_d(G_f(m_i;\theta_f);\theta_d)$$
The goal of the false event detector is to identify whether unknown news is false. Let $Y_d$ denote the manually labeled training set; the loss function is computed using cross entropy, as shown in the following equation:
Figure BDA0004045410210000081
In the training process, the optimal parameters $\theta_f$ and $\theta_d$ are sought so that the loss function $L_d(\theta_f,\theta_d)$ reaches its minimum. This process can be expressed as:
$$(\hat{\theta}_f, \hat{\theta}_d) = \arg\min_{\theta_f,\theta_d} L_d(\theta_f,\theta_d)$$
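For reference, binary cross entropy over a labeled set is commonly written as below. This is a sketch of a standard form that is consistent with the description of the detector loss above, not a transcription of the patent's own formula, which may differ in notation. The sample count $N$ and the labels $y_i \in \{0, 1\}$ (1 for false news, taken from $Y_d$) are notational assumptions.

```latex
L_d(\theta_f, \theta_d)
  = -\frac{1}{N} \sum_{i=1}^{N}
    \Bigl[ y_i \log P_\theta(m_i)
         + (1 - y_i) \log\bigl(1 - P_\theta(m_i)\bigr) \Bigr]
```

Under this form, the loss is small when $P_\theta(m_i)$ is close to 1 for false news and close to 0 for true news.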
However, the false event detector alone can only learn features specific to the events in the training set and cannot generalize, which is unfavorable for detecting events not included in the training data set. The model therefore needs to learn transferable features that can represent newly occurring events, that is, more general feature representations shared by news with the same event association. To learn such common features of events, an event feature extractor is added to refine the model, as follows.
The event feature extractor is essentially a bidirectional adversarial transfer learning process that makes full use of the data features of the source domain and the target domain.
The event feature extractor is denoted $G_e(R_F;\theta_e)$, where $\theta_e$ denotes all of its parameters. It deploys a convolutional neural network layer to correctly classify the input domains, which include news and events. Let $Y_e$ be the set of domain labels; the loss of the event discriminator can then be expressed as:
Figure BDA0004045410210000091
Since the discriminator seeks to identify the input domain, from its perspective the loss function should be minimized to find the best parameters:
$$\hat{\theta}_e = \arg\min_{\theta_e} L_e(\theta_f,\theta_e)$$
However, the event feature extractor aims to fool the false event detector so as to learn common features of events. An increase in the loss function means that more common features are learned, since a larger loss indicates greater difficulty in correctly classifying events into clusters. Therefore, the above loss needs to be maximized to find the optimal parameters.
Before the training phase of the unknown event judger, all historical events are extracted as the training set. In the training phase, on the one hand, the generative adversarial network $G_f(\cdot;\theta_f)$ is trained on the data set, and, to improve the performance of false event detection, the event feature extractor cooperates with the false event detector to minimize the loss $L_d(\theta_f,\theta_d)$. On the other hand, there is a minimax game between the event feature extractor and the false event detector. Thus, the final loss can be expressed as:
$$L_{final}(\theta_f,\theta_d,\theta_e) = L_d(\theta_f,\theta_d) - L_e(\theta_f,\theta_e)$$
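To find the optimal parameters $\hat{\theta}_f$, $\hat{\theta}_d$, and $\hat{\theta}_e$, $L_{final}$ is minimized while taking the minimax game into account. The optimal parameters therefore correspond to the saddle point at which the process reaches equilibrium, which can be expressed as:
$$(\hat{\theta}_f, \hat{\theta}_d) = \arg\min_{\theta_f,\theta_d} L_{final}(\theta_f,\theta_d,\hat{\theta}_e)$$
$$\hat{\theta}_e = \arg\max_{\theta_e} L_{final}(\hat{\theta}_f,\hat{\theta}_d,\theta_e)$$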
In some embodiments, the performing feature extraction on the news data to obtain corresponding text features includes:
performing word segmentation processing and part-of-speech tagging on the news data to obtain tagged words;
learning based on a pre-trained word embedding model to obtain a word vector corresponding to the tagged word;
performing dimensionality reduction processing on sentence vectors corresponding to the news data based on the word vectors to obtain word embedding vectors;
and obtaining the text features based on the word embedding vectors.
It is understood that, to extract the rich text features in news, word segmentation and part-of-speech tagging are performed in sequence on the input news text (i.e., the news data). The word vector of each word is then learned through a pre-trained word embedding model, and the sentence vector of each news item is reduced in dimension using a PCA (principal component analysis) model. The resulting ordered list of word representations (i.e., the word embedding vectors) is the input to the text feature extractor.
To better extract the corresponding information features from news texts, a convolutional neural network is adopted as the core module of the text feature extractor. Specifically, a modified convolutional neural network model, Text-CNN, is incorporated into the text feature extractor. The Text-CNN architecture uses multiple filters with different window sizes to capture text features of different granularities.
In some embodiments, said deriving said text feature based on said word embedding vector comprises:
inputting the word embedding vector into a convolution filter to obtain a feature vector corresponding to each sentence in the news data;
and performing maximum pooling on the feature vectors to obtain the text features.
It will be appreciated that each word in the news text is vectorized and represented as a word embedding vector. The word embedding vector of each word or phrase is initialized using a word embedding model pre-trained on a given data set. For the i-th word in a sentence, the corresponding k-dimensional word embedding vector is denoted $T_i \in \mathbb{R}^k$. Thus, a sentence with n words can be represented as:
$$T_{1:n} = T_1 \oplus T_2 \oplus \cdots \oplus T_n$$
where $\oplus$ denotes the concatenation operator. A convolution filter with window size h takes a contiguous sequence of h words in the text as input and outputs one feature. To illustrate the process, taking the contiguous sequence of h words starting with the i-th word as an example, the filtering operation can be expressed as:
$$t_i = \sigma(W_c \cdot T_{i:i+h-1})$$
where $\sigma(\cdot)$ is the ReLU activation function and $W_c$ denotes the weights of the convolution filter. The filter is applied in the same way to the remaining words, giving the feature vector of the sentence:
$$t = [t_1, t_2, \ldots, t_{n-h+1}]$$
For each feature vector, the maximum value is taken by a max pooling operation in order to extract the most important information in the text. This yields the feature corresponding to one particular filter. The process is repeated until the features of all filters are obtained. To extract text features of different granularities, different window sizes are applied. For each particular window size, there are $n_h$ different filters.
Thus, assuming there are c possible window sizes, there are $c \cdot n_h$ filters in total. The text feature after the max pooling operation is denoted as
Figure BDA0004045410210000111
After the max pooling operation, a fully connected layer is used to ensure that the final text feature representation (denoted $R_T \in \mathbb{R}^p$) has p dimensions:
Figure BDA0004045410210000113
where $W_{tf}$ is the weight matrix of the fully connected layer.
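The filtering and max pooling steps for a single filter can be illustrated with a minimal pure-Python sketch. It follows the formula $t_i = \sigma(W_c \cdot T_{i:i+h-1})$ with ReLU and no bias term. The function names and the flat-list layout of the filter weights are assumptions made for illustration; a real implementation would normally use a deep learning framework.

```python
def relu(x):
    """ReLU activation."""
    return x if x > 0.0 else 0.0


def conv_feature_map(embeddings, weights, h):
    """Apply one convolution filter of window size h to a sentence.

    embeddings: list of n word vectors, each of dimension k.
    weights:    flat list of h * k filter weights (layout is an assumption).
    Returns the feature vector [t_1, ..., t_{n-h+1}].
    """
    n = len(embeddings)
    feature_map = []
    for i in range(n - h + 1):
        # Concatenate the h word vectors of the window.
        window = [v for word in embeddings[i:i + h] for v in word]
        feature_map.append(relu(sum(w * v for w, v in zip(weights, window))))
    return feature_map


def max_pool(feature_map):
    """Keep the largest activation of one filter (0.0 if the map is empty)."""
    return max(feature_map) if feature_map else 0.0


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # n = 3 words, k = 2
filter_weights = [1.0, -1.0, 0.5, 0.5]            # h = 2, so h * k = 4 weights
pooled = max_pool(conv_feature_map(sentence, filter_weights, h=2))
```

Repeating this for every filter and every window size, then concatenating the pooled values, gives the vector that is fed to the fully connected layer.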
In some embodiments, the obtaining an event from the news data based on the text feature and determining that the news data is a first type of news or a second type of news based on the event includes:
searching for keywords from the news data based on the text features;
retrieving a similar news set from a keyword news inverted index table based on the keywords;
determining cosine similarity of different news in the similar news set, and clustering the news in the similar news set based on the cosine similarity of the different news in the similar news set to obtain an event set corresponding to the keyword;
judging whether the news data is first type news or second type news based on the event set;
the keyword news inverted index table comprises a plurality of preset keywords and a news inverted table corresponding to each preset keyword.
It is understood that the steps in this embodiment are implemented based on an event mapper, which comprises 3 parts: a keyword-news mapper, a keyword-event mapper, and a filter.
The keyword-news mapper is used to quickly determine whether news similar to the input news exists among the existing news; it is essentially a dynamically updated inverted index table. To reduce the time required to search previously encountered news similar to the input news d while keeping time and space requirements constant, a keyword-news inverted index table maintained within a time window t may be used, as shown in FIG. 2. The set M is continuously updated by replacing the oldest news with the newest incoming news, so that the memory requirement of the keyword-news inverted index table stays unchanged even though the number of keywords may become very large due to the unrestricted vocabulary of the news stream. Each entry of the keyword-news inverted index table contains a keyword and a finite set Q, which holds the latest news in which the keyword appears. When the number of news items exceeds the limit of Q, the oldest news is replaced by the newest news containing the keyword.
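A bounded inverted index of this kind can be sketched with a fixed-length queue per keyword. The class name `KeywordNewsIndex` and the parameter `max_per_keyword` (playing the role of the limit Q) are hypothetical, and the time-window handling described above is omitted for brevity.

```python
from collections import defaultdict, deque


class KeywordNewsIndex:
    """Keyword -> bounded list of the most recent news IDs for that keyword."""

    def __init__(self, max_per_keyword):
        # max_per_keyword plays the role of the limit Q in the text.
        self.table = defaultdict(lambda: deque(maxlen=max_per_keyword))

    def add(self, news_id, keywords):
        """Register a news item under each of its keywords."""
        for kw in keywords:
            # A deque with maxlen drops its oldest entry when it is full.
            self.table[kw].append(news_id)

    def lookup(self, keywords):
        """Return candidate news IDs for the keywords, without duplicates."""
        seen = []
        for kw in keywords:
            # .get() avoids creating empty entries for unknown keywords.
            for news_id in self.table.get(kw, ()):
                if news_id not in seen:
                    seen.append(news_id)
        return seen
```

Using `deque(maxlen=...)` gives the "replace the oldest with the newest" behaviour without explicit eviction code.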
The top k keywords are selected from the news based on the TF-IDF (term frequency-inverse document frequency) method, and a set of potentially similar news is then retrieved by calculating the cosine similarity between the input news and each news item listed for each of the k keywords in the keyword-news inverted index table.
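Keyword selection by TF-IDF can be sketched as follows. The text does not state which TF-IDF variant is used, so the smoothed inverse document frequency below is an assumption, as are the function name `top_k_keywords` and the representation of news as token lists.

```python
import math
from collections import Counter


def top_k_keywords(doc_tokens, corpus_tokens, k):
    """Return the k tokens of doc_tokens with the highest TF-IDF weight.

    doc_tokens:    tokens of the input news.
    corpus_tokens: list of token lists, one per reference news item.
    """
    n_docs = len(corpus_tokens)
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus_tokens if term in doc)
        # Smoothed IDF (an assumption) keeps unseen terms well defined.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The returned keywords are the ones that would be looked up in the keyword-news inverted index table.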
For example, suppose the input news is: "The City A news release reports that, between 16 and 22 on April 22, 4 new cases of dynamic people were added in City A (Zone E2 and Zone F2), and 1 new dynamic person was added for Y." The top three TF-IDF-weighted keywords are "City A", "C dynamic", and "Y dynamic", respectively. Each keyword is looked up in the keyword-news inverted index table, and the news with IDs 3, 5, 7, 15, 18, 21, and 25 is retrieved. Finally, the cosine similarity between two news vectors is calculated. The formula is as follows:
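$$\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}$$
where $\mathbf{a}$ and $\mathbf{b}$ are the two news vectors.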
If none of the retrieved news has a cosine similarity higher than the threshold tsi, no similar news has occurred before. In that case, a new event is created, news d is assigned to it, and the news is then sent to the keyword-event mapper. If some similarity is above tsi, news d is sent directly to the keyword-event mapper. Finally, the keyword-news mapper is updated by adding the news to the news sets of its k keywords.
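The threshold test described above can be sketched as follows. The function names and the dictionary layout of the candidates are hypothetical; `threshold` stands for tsi.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def similar_news(query_vec, candidates, threshold):
    """Return IDs of candidate news whose similarity exceeds the threshold.

    candidates: dict mapping news ID -> news vector.
    """
    return [news_id for news_id, vec in candidates.items()
            if cosine_similarity(query_vec, vec) > threshold]
```

An empty result corresponds to the case in which no similar news has occurred before and a new event is created.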
The keyword-event mapper is used to detect and divide events into familiar events and unknown events. In the initial state, the keyword-to-event mapper only contains a dynamically updated inverted index table of keyword-to-historical event, as shown in fig. 3. Similar to the keyword-news mapper, each row has a keyword and a finite set, where the number of historical events does not exceed Q. When the number exceeds the limit, the oldest event will be replaced by the newest event.
In some embodiments, the determining news data as first type news or second type news based on the event set includes:
entropy filtering is carried out on the event set to obtain filtered events;
and performing LCS (longest common subsequence) algorithm filtering on the filtered event, and judging that the filtered event is the first type news or the second type news based on an LCS algorithm filtering result.
It can be understood that, in the keyword-event mapper, a clustering method is used to cluster news into events, and smaller events, called fragment events, are generated in the clustering process. The presence of fragment events affects the accuracy and speed of the model.
To improve the accuracy and speed of the model, a filter is used to remove fragment events and meaningless candidate events. The filter consists of two parts: entropy filtering and LCS algorithm filtering.
Entropy filtering uses the entropy information of each candidate event cluster. The entropy of each candidate event cluster is calculated and compared with a preset entropy threshold (tend). If the entropy of a candidate event cluster is smaller than tend, its information content is considered to be below the set minimum, and the cluster is judged to be a fragment cluster and discarded.
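A minimal sketch of entropy filtering follows. The text does not state how the entropy of a cluster is computed, so this example assumes the Shannon entropy of the token distribution over all news in the cluster; the function names, toy clusters, and threshold value are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(cluster):
    """Shannon entropy (in bits) of the token distribution of a cluster.

    `cluster` is a list of news items, each given as a list of tokens.
    """
    counts = Counter(tok for news in cluster for tok in news)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_filter(clusters, tend):
    """Discard clusters whose entropy is below the threshold tend."""
    return [c for c in clusters if cluster_entropy(c) >= tend]


rich_cluster = [["a", "b"], ["c", "d"]]   # four distinct tokens, uniform
fragment_cluster = [["a", "a"]]           # a single repeated token
kept = entropy_filter([rich_cluster, fragment_cluster], tend=1.0)
```

Here the cluster with a single repeated token carries no information under this definition and is dropped as a fragment cluster.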
LCS algorithm filtering relies on the observation that news items within an event generally share a similar sentence structure. The LCS algorithm is applied to each candidate event, and the length of the maximum LCS is recorded. Events with maximum LCS below tlcs (threshold) are then discarded, and for events above tlcs, events representing news with maximum LCS are discarded. Finally, the remaining news belonging to familiar events is sent to the familiar event judger, and the other news, regarded as unknown news, is sent to the unknown event judger.
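The threshold part of LCS filtering can be sketched as below. This is an illustration under assumptions: the maximum LCS of an event is taken here as the largest pairwise LCS length between the token sequences of its news items, which the text does not specify, and the function names and toy events are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def max_lcs(event):
    """Largest pairwise LCS length in an event (0 if it has fewer than two news items)."""
    return max((lcs_length(a, b) for a, b in combinations(event, 2)), default=0)


def lcs_filter(events, tlcs):
    """Discard events whose maximum LCS is below the threshold tlcs."""
    return [e for e in events if max_lcs(e) >= tlcs]


coherent_event = [["new", "cases", "in", "city", "a"],
                  ["two", "new", "cases", "in", "city", "b"]]
noisy_event = [["hello"], ["world"]]
remaining = lcs_filter([coherent_event, noisy_event], tlcs=3)
```

The two news items of the first event share the subsequence "new cases in city", so that event survives a threshold of 3, while the second event is discarded.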
In some embodiments, the retrieving the similar news set from the keyword news inverted index table based on the keyword includes:
determining cosine similarity corresponding to each piece of news corresponding to the keywords in the keyword news inverted index table;
and constructing the similar news set based on the news with the cosine similarity larger than a preset threshold.
It can be understood that, in the above example, the top three keywords "City A", "C dynamic", and "Y dynamic" are found, and clustering is then performed with an incremental clustering method that uses the cosine similarity between news vectors as the measure. As a result, news related to the keyword "City A" is clustered into the events with event IDs 2, 5, 10, and 11, as shown in fig. 3. Thereafter, when an input news item includes the keyword "City A", the keyword-event mapper quickly retrieves the event set with event IDs 2, 5, 10, and 11 and calculates the cosine similarity between the input news and each event in the set. If the cosine similarity is higher than tes (threshold), the news is added to the corresponding event. Otherwise, a new event is created, the news is added to it, and the new event is inserted into the keyword-event mapper.
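The incremental clustering step can be sketched as follows. The text does not say how an event is represented as a vector, so this example assumes the centroid (mean) of its member news vectors; the function names, toy vectors, and threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_event(news_vec, events, tes):
    """Add the news to the most similar event if the similarity exceeds tes;
    otherwise create a new event. `events` maps event ID -> member vectors."""
    best_id, best_sim = None, -1.0
    for event_id, members in events.items():
        # Assumption: an event is represented by the mean of its members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sim = cosine_similarity(news_vec, centroid)
        if sim > best_sim:
            best_id, best_sim = event_id, sim
    if best_id is not None and best_sim > tes:
        events[best_id].append(news_vec)
        return best_id
    new_id = max(events, default=0) + 1
    events[new_id] = [news_vec]
    return new_id


events = {2: [[1.0, 0.0]], 5: [[0.0, 1.0]]}
first = assign_to_event([0.9, 0.1], events, tes=0.8)    # close to event 2
second = assign_to_event([-1.0, 0.0], events, tes=0.8)  # similar to nothing
```

The first input joins event 2, while the second matches no existing event and opens a new one.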
In summary, the ECNNet architecture provided by the present invention includes four key components: a feature extractor, an event mapper, a familiar event judger, and an unknown event judger. In the training phase of the model, the training set $X_1$ is first fed into the Text-CNN model in the feature extractor to obtain the text feature representation of the news. This text feature representation is then input into the event mapper, which divides the input news into two broad categories: familiar news $N_f$ and unknown news $N_u$. The familiar news then forms a historical event set in the familiar event judger, which is used to quickly predict whether familiar news is true or false; the unknown news is input into the unknown event judger for generative adversarial training to extract an event feature representation.
In the test phase of the model, the test set $X_u$ is fed into the Text-CNN model to obtain the text feature representation of the unlabeled news set. This representation is then input into the event mapper, which divides the news into familiar news and unknown news. In the familiar event judger, the historical event set is retrieved to calculate the truthfulness $R_\theta(u_i)$ of the news and thereby determine whether it is false news. In the unknown event judger, the false news detector predicts whether the news is false news and outputs the predicted label set $Y_u$.
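The routing between the two judgers in the test phase can be sketched as below. This only illustrates the control flow; the three callables are hypothetical stand-ins for the event mapper, the familiar event judger, and the unknown event judger, and the scalar "features" are toy values rather than real Text-CNN outputs.

```python
def run_detection(features, is_familiar, familiar_judger, unknown_judger):
    """Route each news feature to one of the two judgers and collect the labels."""
    labels = []
    for feature in features:
        judger = familiar_judger if is_familiar(feature) else unknown_judger
        labels.append(judger(feature))
    return labels


# Toy stand-ins (assumptions): label 1 means false news, label 0 means real news.
toy_features = [0.1, 0.7, 0.4, 0.9]
predicted = run_detection(
    toy_features,
    is_familiar=lambda f: f < 0.5,           # stand-in for the event mapper
    familiar_judger=lambda f: 0,             # stand-in for historical event lookup
    unknown_judger=lambda f: 1 if f > 0.8 else 0,  # stand-in for the false news detector
)
```

Because the judgers are passed in as parameters, either one can be replaced without touching the routing logic.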
In the field of social network false news detection, there is no internationally recognized standard test corpus or closely related corpus. To evaluate the performance of the ECNNet model fairly, the experimental data were collected from various social platforms and portal sites on the simplified-Chinese web, such as Weibo, hosta, Xiaohongshu, Tencent News, and Toutiao. A total of 666 news items dated from December 1, 2021 to December 31, 2021 were obtained with a web crawler.
These 666 news items were then manually labeled: real news is marked 0 and false news is marked 1. The data set contains 446 real news items and 220 false news items.
The statistical information on the data set is shown in table 1:
TABLE 1
Figure BDA0004045410210000151
In conventional false news detection evaluation, precision, recall, and F-score are three important evaluation indexes. Precision is defined with respect to the prediction results: it indicates how many of the samples predicted as positive are truly positive. A positive sample predicted as positive is denoted TP, and a negative sample predicted as positive is denoted FP. Recall is defined with respect to the original samples: it indicates how many of the positive samples were predicted correctly. A positive sample predicted as positive is TP, and a positive sample predicted as negative is denoted FN. The F1 value is the harmonic mean of precision and recall.
The evaluation index used herein is represented by the following formula:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
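The three evaluation indexes can be computed as in the following sketch. It assumes false news (label 1) is the positive class; the function name and the toy label lists are hypothetical and are not experimental data from the patent.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels: TP = 3, FP = 1, FN = 1.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 1, 0]
p, r, f1 = precision_recall_f1(toy_true, toy_pred)
```

The guards return 0.0 instead of dividing by zero when no sample is predicted positive or no positive sample exists.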
To verify the validity of the proposed model, baseline methods are chosen from two categories: traditional machine learning models and deep neural network models.
The invention mainly selects the following four baseline methods:
1. Support Vector Machine (SVM). A support vector machine model is trained using the normalized text feature representation and the set of true labels. The parameter C is set to 50 and the kernel function is set to RBF.
2. Random Forest (RF). A random forest model is trained using the normalized text feature representation and the set of true labels. The parameter n_estimators is set to 50.
3. Logistic Regression (LR). A logistic regression model is trained using the normalized text feature representation and the set of true labels. The parameter solver is set to lbfgs.
4. Long Short-Term Memory neural network (LSTM). The LSTM uses a fully connected layer as the text feature extractor, and the text vector representation comes from a text feature evaluator. The model has a hidden size of 256; the input of the fully connected layer is the text features, and the output is the probability that the news is real.
The experimental steps are as follows:
The experiment was implemented with the machine learning libraries Scikit-learn and PyTorch, using Python 3.7.2, Scikit-learn 0.21.2, and PyTorch 1.10.0. The training set and the test set were divided in a ratio of 8. The training set is used to optimize the parameter settings of ECNNet, and the test set is used to evaluate the performance of the model. The recommended parameter settings of the ECNNet model are shown in table 2:
TABLE 2
Figure BDA0004045410210000171
In the feature extractor, the word embedding dimension is set to 512, the window sizes range from 1 to 4, and the fully connected layer size of the feature extractor is set to 32. For the event detector, the fully connected layer size is set to 64. The remaining detailed parameter settings are shown in table 2. For all baseline methods and the proposed ECNNet model, the same batch size of 100 was used in the training phase, and the number of training epochs was 100.
For all baseline methods, either the recommended or the optimal parameter settings were used. To obtain a fair evaluation, the same preprocessing was applied for all experimental methods.
Results and analysis:
The above method is verified on the data set provided by the present invention and compared with several currently popular false news detection methods. The results are shown in table 3:
TABLE 3
Figure BDA0004045410210000172
Figure BDA0004045410210000181
Compared with the other experimental methods, the ECNNet algorithm provided by the invention performs well on precision, recall, and F1 value. The precision of ECNNet is higher than that of the other baseline methods, exceeding the strongest prior baseline, RF, by 0.02. In terms of recall, ECNNet is also higher than most of the baseline methods, trailing the top-ranked baseline, LSTM, by only 0.06. Taking both recall and precision into account, ECNNet achieves the best F1-score, which shows the advantage of the model in false news detection.
Comprehensive experiment results show that the false news detection algorithm based on event association can effectively improve the recall rate of the algorithm and realize accurate detection of false news in a social network.
The invention provides a false news detector ECNNet based on event correlation for social network emergencies, which can realize quick and efficient detection of false news based on event correlation among different news.
The ECNNet model provided by the invention uses an event clustering device to measure the difference between different events and further realize event clustering. The keyword-event ID inverted index table maintained by the event clustering device can quickly realize the detection of familiar news.
The ECNNet model provided by the invention is a general framework for false news detection, and high cohesion and low coupling are realized among all modules in the ECNNet model. The user can change or expand according to actual demand.
Experiments show that the ECNNet model provided by the invention can effectively realize false news detection and has better effect on accuracy and recall rate.
According to the false news detection method provided by the invention, an event is obtained from the news data to be detected based on the corresponding text features, and the news data is judged to be first type news or second type news based on the event. When the news data is determined to be first type news, a historical event corresponding to the news data is retrieved based on a first event discriminator to judge whether the news data is false news. When the news data is determined to be second type news, whether the news data is false news is judged based on a second event discriminator. The second event discriminator comprises a false event detector and an event feature extractor: the false event detector identifies the text features to obtain the probability that the corresponding event is a false event, and the event feature extractor classifies the news data based on that probability to determine whether the news data is false news. The method thus divides news data into first type news and second type news, inputs each into a different event discriminator to identify false news, and achieves quick and efficient detection of false news on the basis of the event correlation among different news.
As shown in fig. 4, the present invention also provides a false news detection apparatus 400, including:
the extraction module 410 is configured to acquire news data to be detected, and perform feature extraction on the news data to obtain corresponding text features;
a first determining module 420, configured to obtain an event from the news data based on the text feature, and determine that the news data is a first type of news or a second type of news based on the event;
a second determining module 430, configured to, in a case that it is determined that the news data is of a first type, input the news data into a first event discriminator, so as to retrieve, based on the first event discriminator, a historical event corresponding to the news data, and determine whether the news data is false news;
a third determining module 440, configured to, in a case that it is determined that the news data is of a second type of news, input the news data into a second event discriminator to determine whether the news data is false news based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
The false news detection device provided in the above embodiment can implement the technical solutions described in the above false news detection method embodiments, and the specific implementation principles of the above modules or units can be referred to the corresponding contents in the above false news detection method embodiments, and are not described herein again.
As shown in fig. 5, the present invention also provides an electronic device 500. The electronic device 500 includes a processor 501, a memory 502, and a display 503. Fig. 5 shows only some of the components of the electronic device 500, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In some embodiments, the memory 502 may be an internal storage unit of the electronic device 500, such as a hard disk or internal memory of the electronic device 500. In other embodiments, the memory 502 may be an external storage device of the electronic device 500, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 500.
Further, the memory 502 may include both an internal storage unit and an external storage device of the electronic device 500. The memory 502 is used for storing the application software installed on the electronic device 500 and various data.
In some embodiments, the processor 501 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, and is used for running program code stored in the memory 502 or processing data, for example to execute the false news detection method of the present invention.
The display 503 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like in some embodiments. The display 503 is used to display information at the electronic device 500 and to display a visual user interface. The components 501-503 of the electronic device 500 communicate with each other via a system bus.
In some embodiments of the present invention, when processor 501 executes the false news detection program in memory 502, the following steps may be implemented:
acquiring news data to be detected, and extracting characteristics of the news data to obtain corresponding text characteristics;
acquiring an event from the news data based on the text feature, and judging whether the news data is first type news or second type news based on the event;
under the condition that the news data is determined to be first type news, inputting the news data into a first event discriminator so as to retrieve a historical event corresponding to the news data based on the first event discriminator and judge whether the news data is false news or not;
under the condition that the news data is determined to be second type news, inputting the news data into a second event discriminator so as to judge whether the news data is false news or not based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
It should be understood that: the processor 501, when executing the false news detection program in the memory 502, may perform other functions in addition to the above functions, which may be described in detail in the foregoing description of the corresponding method embodiments.
Further, the type of the electronic device 500 is not specifically limited in the embodiment of the present invention. The electronic device 500 may be a portable electronic device such as a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a wearable device, or a laptop computer. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running iOS, Android, Microsoft, or another operating system. The portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface (e.g., a touch panel). It should also be understood that in other embodiments of the present invention, the electronic device 500 may not be a portable electronic device, but may be a desktop computer having a touch-sensitive surface (e.g., a touch panel).
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for false news detection provided by the above methods, the method comprising:
acquiring news data to be detected, and extracting characteristics of the news data to obtain corresponding text characteristics;
acquiring an event from the news data based on the text feature, and judging whether the news data is first type news or second type news based on the event;
under the condition that the news data is determined to be a first type of news, inputting the news data into a first event discriminator so as to retrieve a historical event corresponding to the news data based on the first event discriminator and judge whether the news data is false news;
under the condition that the news data is determined to be second type news, inputting the news data into a second event discriminator so as to judge whether the news data is false news or not based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer-readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The method, the device, the electronic device and the storage medium for detecting the false news provided by the invention are introduced in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A false news detection method, comprising:
acquiring news data to be detected, and extracting characteristics of the news data to obtain corresponding text characteristics;
acquiring an event from the news data based on the text feature, and judging whether the news data is first type news or second type news based on the event;
under the condition that the news data is determined to be first type news, inputting the news data into a first event discriminator so as to retrieve a historical event corresponding to the news data based on the first event discriminator and judge whether the news data is false news or not;
under the condition that the news data is determined to be of a second type of news, inputting the news data into a second event discriminator so as to judge whether the news data is false news or not based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
2. The false news detection method according to claim 1, wherein the false event detector performs transfer learning of false events based on a generative adversarial network.
3. The false news detection method of claim 1, wherein the extracting the features of the news data to obtain corresponding text features comprises:
performing word segmentation processing and part-of-speech tagging on the news data to obtain tagged words;
learning based on a pre-trained word embedding model to obtain a word vector corresponding to the marked word;
performing dimensionality reduction processing on sentence vectors corresponding to the news data based on the word vectors to obtain word embedded vectors;
and obtaining the text features based on the word embedding vector.
4. A false news detection method according to claim 3, wherein said deriving the text feature based on the word embedding vector comprises:
inputting the word embedding vector into a convolution filter to obtain a characteristic vector corresponding to each sentence in the news data;
and performing maximum pooling on the feature vectors to obtain the text features.
5. The false news detection method of any one of claims 1-4, wherein the obtaining an event from the news data based on the text feature and determining whether the news data is a first type of news or a second type of news based on the event comprises:
searching for keywords from the news data based on the text features;
retrieving a similar news set from a keyword news inverted index table based on the keywords;
determining cosine similarity of different news in the similar news set, and clustering the news in the similar news set based on the cosine similarity of the different news in the similar news set to obtain an event set corresponding to the keyword;
judging whether the news data is first type news or second type news based on the event set;
the keyword news inverted index table comprises a plurality of preset keywords and a news inverted table corresponding to each preset keyword.
6. The false news detection method of claim 5, wherein the determining news data as either a first type of news or a second type of news based on the set of events comprises:
entropy filtering is carried out on the event set to obtain filtered events;
and performing LCS algorithm filtering on the filtered event, and judging that the filtered event is the first type news or the second type news based on the LCS algorithm filtering result.
7. The false news detection method of claim 5, wherein the retrieving similar news sets from a keyword news inverted index table based on the keyword comprises:
determining cosine similarity corresponding to each piece of news corresponding to the keywords in the keyword news inverted index table;
and constructing the similar news set based on the news with cosine similarity larger than a preset threshold.
8. A false news detection device, comprising:
the extraction module is used for acquiring news data to be detected and extracting the characteristics of the news data to obtain corresponding text characteristics;
the first judging module is used for acquiring an event from the news data based on the text characteristic and judging whether the news data is first type news or second type news based on the event;
the second judgment module is used for inputting the news data into a first event discriminator under the condition that the news data is determined to be the first type of news, so as to retrieve the historical events corresponding to the news data based on the first event discriminator and judge whether the news data is false news;
the third judging module is used for inputting the news data into a second event discriminator under the condition that the news data is determined to be the second type news, so as to judge whether the news data is false news or not based on the second event discriminator;
wherein the second event discriminator comprises a false event detector and an event feature extractor; the false event detector is used for identifying the text features to obtain the probability that the corresponding event is a false event; the event feature extractor is used for classifying the news data based on the probability and determining whether the news data is false news.
9. An electronic device comprising a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the steps in the false news detection method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the false news detection method according to any one of claims 1 to 7.
CN202310036408.3A 2023-01-09 2023-01-09 False news detection method and device, electronic equipment and storage medium Pending CN115935953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310036408.3A CN115935953A (en) 2023-01-09 2023-01-09 False news detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115935953A true CN115935953A (en) 2023-04-07

Family

ID=86650873

Country Status (1)

Country Link
CN (1) CN115935953A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034905A (en) * 2023-08-07 2023-11-10 重庆邮电大学 Internet false news identification method based on big data
CN117034905B (en) * 2023-08-07 2024-05-14 重庆邮电大学 Internet false news identification method based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination