CN113901810A - Cross-domain false news detection method based on multi-representation learning - Google Patents
Cross-domain false news detection method based on multi-representation learning
- Publication number
- CN113901810A (application number CN202111124543.0A)
- Authority
- CN
- China
- Prior art keywords
- domain
- news
- false
- sharing
- representation learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a cross-domain false news detection method based on multi-representation learning. The technical scheme is as follows: acquire a news text to be detected and the domain label to which the news text belongs; input the news text into a BERT model and extract the word embedding vectors of the news text; input the word embedding vectors and the domain label of the news text into a domain sharing feature generator based on multi-representation learning to obtain a fused domain sharing feature expression; and input the fused domain sharing feature expression into a false news classifier, which outputs the probability values of the true and false news classification. The method is suitable for the field of false news detection. The method learns the relations among domains through a domain gate model and, according to those relations, dynamically adjusts the weights that different domains assign to the different domain sharing features, which reduces the difficulty of learning domain sharing knowledge and improves the cross-domain false news detection capability.
Description
Technical Field
The invention relates to a cross-domain false news detection method based on multi-representation learning. The method is suitable for the field of false news detection.
Background
With the development of the internet, social media has become an important channel through which people obtain information. However, everything has two sides: while social media brings convenience to people, it also provides a channel for the wide and rapid spread of false news. The flood of false news can cause serious economic, political, and other harm to society. False news involves many domains (such as military affairs and politics), and the data distributions of different domains differ, so how to detect false news across domains has become an important problem to be solved.
False news is defined as a message that is intentionally fabricated and can be verified as false. As online media have become richer, the forms of news have also diversified, and news can include multi-modal information such as text, pictures, and videos.
False news detection methods can be divided, according to input type, into news content-based methods and social context-based methods. News content-based methods typically distinguish real news from false news by mining the respective patterns of false (or real) news content. Social context-based methods focus on the various kinds of information left behind while news propagates on social media; in addition to the news content, this information includes the propagation graph structure, forwarded content, comment content, information about participating users, and the like.
Current cross-domain false news detection methods are based on domain adaptation: they align the distributions of all domains in order to extract the domain sharing features of all domains and use them to detect false news. The domain sharing features can be regarded as knowledge common to the domains, and they can improve false news detection in all domains.
The domain adaptation-based method of extracting domain sharing features forcibly aligns all domains in the same feature space to generate a domain sharing feature, and has the following defects: (1) The knowledge shared between domains differs from one pair of domains to another. Some domains are similar, so transferable shared features can be extracted from them, while other domains differ greatly, and forcibly extracting shared knowledge from them may cause negative transfer and reduce model performance. (2) As the number of domains increases, domain alignment becomes more and more difficult and the knowledge shared by the domains becomes harder to learn, so forcibly extracting features shared by all domains yields no obvious improvement.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the existing problems, a cross-domain false news detection method based on multi-representation learning is provided.
The technical scheme adopted by the invention is as follows: a cross-domain false news detection method based on multi-representation learning is characterized in that:
acquiring a news text to be detected and a domain label to which the news text belongs;
inputting the news text into a BERT model, and extracting word embedding vectors of the news text;
inputting the word embedding vectors and the domain label of the news text into a domain sharing feature generator based on multi-representation learning to obtain a fused domain sharing feature expression;
and inputting the fused domain sharing feature expression into a false news classifier, and outputting a probability value result of true and false news classification.
The method for inputting the word embedding vector and the domain label of the news text into the domain sharing feature generator based on multi-representation learning to obtain the fused domain sharing feature expression comprises the following steps:
the word embedding vectors of the news text are input into a plurality of domain sharing experts to generate a plurality of different domain sharing features, and each domain sharing feature focuses on one aspect of the domain sharing knowledge;
inputting the domain label into the trained domain gate model to obtain the weight of the shared features of each domain;
and carrying out weighted summation on a plurality of domain sharing characteristics generated based on a plurality of domain sharing experts and corresponding domain sharing characteristic weights obtained by the domain gate model to obtain a fused domain sharing characteristic expression.
And using a multilayer perceptron as the false news classifier, wherein the multilayer perceptron is composed of multilayer fully-connected neural networks, the last layer of the classifier is normalized by using a softmax activation function, and two floating point numbers with the sum of 1 are output and respectively represent a probability value for judging whether the news is true and a probability value for judging whether the news is false.
And using binary cross entropy loss as a loss function of the false news classifier for false news detection tasks, and minimizing the difference between a predicted value and a true value of false news detection.
A cross-domain false news detection device based on multi-representation learning is characterized by comprising:
the to-be-detected content acquisition module is used for acquiring a news text to be detected and a domain label to which the news text belongs;
the word embedded vector extraction module is used for inputting the news text into the BERT model and extracting a word embedded vector of the news text;
the shared feature extraction module is used for inputting the word embedding vector and the domain label of the news text into the domain shared feature generator based on multi-representation learning to obtain a fused domain shared feature expression;
and the true and false classification module is used for inputting the fused domain sharing feature expression into a false news classifier and outputting a probability value result of true and false news classification.
The method for inputting the word embedding vector and the domain label of the news text into the domain sharing feature generator based on multi-representation learning to obtain the fused domain sharing feature expression comprises the following steps:
the word embedding vectors of the news text are input into a plurality of domain sharing experts to generate a plurality of different domain sharing features, and each domain sharing feature focuses on one aspect of the domain sharing knowledge;
inputting the domain label into the trained domain gate model to obtain the weight of the shared features of each domain;
and carrying out weighted summation on a plurality of domain sharing characteristics generated based on a plurality of domain sharing experts and corresponding domain sharing characteristic weights obtained by the domain gate model to obtain a fused domain sharing characteristic expression.
And using a multilayer perceptron as the false news classifier, wherein the multilayer perceptron is composed of multilayer fully-connected neural networks, the last layer of the classifier is normalized by using a softmax activation function, and two floating point numbers with the sum of 1 are output and respectively represent a probability value for judging whether the news is true and a probability value for judging whether the news is false.
And using binary cross entropy loss as a loss function of the false news classifier for false news detection tasks, and minimizing the difference between a predicted value and a true value of false news detection.
A storage medium having stored thereon a computer program executable by a processor, characterized in that: the computer program when executed implements the steps of the cross-domain false news detection method based on multi-representation learning.
A cross-domain false news detection device based on multi-representation learning, having a memory and a processor, the memory having stored thereon a computer program executable by the processor, characterized in that: the computer program when executed implements the steps of the cross-domain false news detection method based on multi-representation learning.
The invention has the beneficial effects that: the domain sharing feature extraction method based on multi-representation learning can extract more domain sharing knowledge, a plurality of different domain sharing features are generated through a plurality of domain sharing experts, each domain sharing feature focuses on one aspect of the domain sharing knowledge, and the learning difficulty of the domain sharing knowledge is reduced.
The method learns the relations among domains through the domain gate model and, according to those relations, dynamically adjusts the weights that different domains assign to the different domain sharing features.
By learning the relations among domains, the method captures the transferable shared knowledge of different domains and obtains the fused domain sharing feature representation. Finally, the fused domain sharing feature is input into the false news classifier for false news detection, which improves the cross-domain false news detection capability.
Drawings
FIG. 1 is a flow chart of an embodiment.
FIG. 2 is a block diagram of a domain sharing feature generator in an embodiment.
Detailed Description
The embodiment is a cross-domain false news detection method based on multi-representation learning, and the method specifically comprises the following steps:
S1, acquiring a news text to be detected and a domain label to which the news text belongs.
S2, inputting the news text into the BERT model, and extracting a word embedding vector of the news text.
In this example, a pre-trained Chinese BERT model is used as the text encoder. BERT is a contextual pre-trained language model; it can be used as a classifier on its own, or its last-layer features can be extracted as word embedding vectors.
In this embodiment, a news text $s$ to be predicted is input, and the word embedding vector $x$ of the news text is output after it passes through the BERT model. The formula is as follows:
$$x = \mathrm{BERT}(s)$$
S3, inputting the word embedding vector and the domain label of the news text into a domain sharing feature generator based on multi-representation learning to obtain a fused domain sharing feature expression.
In this embodiment, the domain sharing feature generator based on multi-representation learning is provided with a plurality of domain sharing experts. Each domain sharing expert is a feature generator and is responsible for capturing a part of the domain sharing features. For the word embedding vector $x$ of the input news text, the $i$-th domain sharing expert $E_i$ generates the domain sharing feature $f_i$. The formula is as follows:
$$f_i = E_i(x; \theta_i)$$
wherein $\theta_i$ denotes the parameters of the $i$-th domain sharing expert, and $f_i$ is the domain sharing feature generated by the $i$-th domain sharing expert $E_i$; each domain sharing feature focuses on one aspect of the domain sharing knowledge.
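The expert step can be illustrated with a minimal sketch. The source does not specify the internal architecture of an expert, so the single dense layer with tanh, the dimensions, and the names `make_expert` and `expert_forward` below are assumptions made purely for illustration; the input is a random stand-in for a pooled BERT embedding.

```python
import math
import random


def make_expert(in_dim, out_dim, rng):
    """Create one hypothetical expert: a single dense layer (assumed architecture)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def expert_forward(expert, x):
    """Compute one domain sharing feature as tanh(W x + b)."""
    weights, bias = expert
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


rng = random.Random(0)
in_dim, out_dim, num_experts = 8, 4, 3  # illustrative sizes only

# Each expert has its own parameters, so each can capture a different aspect.
experts = [make_expert(in_dim, out_dim, rng) for _ in range(num_experts)]

# Random stand-in for the word embedding vector produced by BERT.
x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]

# One domain sharing feature per expert, all computed from the same input.
features = [expert_forward(e, x) for e in experts]
```

Because every expert sees the same input but holds separate parameters, the resulting feature vectors differ, which is what lets each one specialise in a different part of the shared knowledge.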
In this example, the domain sharing feature generator is provided with a domain gate model. Domain labels are input into the domain gate model, and through training the model automatically learns which domain sharing features each domain should attend to; it then outputs the weights that the input domain assigns to all of the domain sharing features.
In this embodiment, the domain gate model is a multilayer perceptron whose last layer is normalized by a softmax layer. The input of the domain gate model is the domain embedded representation $f_d$ corresponding to the domain label; this example sets a randomly initialized vector for each domain as its domain embedded representation. Given the domain embedded representation, the domain gate outputs a weight $w$ for each domain sharing feature. The input and output formula of the domain gate is as follows:
$$w = \mathrm{Gate}(f_d; \theta_{gate})$$
wherein $\theta_{gate}$ denotes the parameters of the domain gate.
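A minimal sketch of such a gate follows, assuming a two-layer perceptron with a tanh hidden layer; the class name `DomainGate`, the layer sizes, and the initialisation are hypothetical and not taken from the source. The weights are untrained here; in the described method they would be learned jointly with the rest of the model.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, x):
    """Affine map W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class DomainGate:
    """Hypothetical domain gate: embedding lookup, small MLP, softmax."""

    def __init__(self, num_domains, embed_dim, hidden_dim, num_experts, seed=0):
        rng = random.Random(seed)
        # One randomly initialised embedding per domain label.
        self.domain_embeddings = rand_matrix(num_domains, embed_dim, rng)
        self.w1 = rand_matrix(hidden_dim, embed_dim, rng)
        self.b1 = [0.0] * hidden_dim
        self.w2 = rand_matrix(num_experts, hidden_dim, rng)
        self.b2 = [0.0] * num_experts

    def __call__(self, domain_id):
        f_d = self.domain_embeddings[domain_id]  # domain embedded representation
        hidden = [math.tanh(h) for h in dense(self.w1, self.b1, f_d)]
        return softmax(dense(self.w2, self.b2, hidden))  # one weight per expert


gate = DomainGate(num_domains=3, embed_dim=4, hidden_dim=8, num_experts=5)
w = gate(1)  # weights that domain 1 assigns to the five domain sharing features
```

The softmax guarantees that the weights are positive and sum to 1, so the later fusion step is a convex combination of the expert outputs.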
Different domains need different domain sharing features. By learning a domain gate, the fusion of the domain sharing features can be adjusted dynamically according to the characteristics of each domain.
In this embodiment, the domain sharing features generated by the plurality of domain sharing experts are weighted by the corresponding domain sharing feature weights obtained from the domain gate model and summed, giving the fused domain sharing feature expression. The formula is as follows:
$$f_{shared} = \sum_{i=1}^{n} w_i f_i$$
wherein $w_i$ represents the weight of the $i$-th domain sharing feature, $f_i$ is the domain sharing feature generated by the $i$-th domain sharing expert, $n$ is the number of domain sharing experts, and $f_{shared}$ is the fused domain sharing feature expression.
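The weighted summation can be checked on a small worked example. The function name `fuse` and the numbers below are illustrative assumptions, chosen so that the arithmetic is easy to verify by hand.

```python
def fuse(weights, features):
    """Weighted sum of expert features: one weight per expert feature vector."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per domain sharing feature")
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


# Three experts, each producing a 2-dimensional feature (toy values).
w = [0.5, 0.25, 0.25]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

f_shared = fuse(w, features)
print(f_shared)  # [1.0, 0.75]
```

The first component is 0.5·1.0 + 0.25·0.0 + 0.25·2.0 = 1.0 and the second is 0.5·0.0 + 0.25·1.0 + 0.25·2.0 = 0.75. Changing the weights for a different domain changes the fused vector without touching the experts themselves.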
S4, inputting the fused domain sharing feature expression into a false news classifier, and outputting a probability value result of the true and false news classification.
In this embodiment, a multilayer perceptron (MLP) is used as the false news classifier. The multilayer perceptron is composed of multiple fully-connected layers. So that the model outputs the probability values of the true and false classification, the last layer of the classifier is normalized with a softmax activation function and outputs two floating point numbers whose sum is 1, which respectively represent the probability that the news is true and the probability that the news is false. The formula is as follows:
$$p = \mathrm{softmax}(\mathrm{MLP}(f_{shared}))$$
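A minimal sketch of the classifier head, assuming one hidden layer with ReLU and untrained random weights; the class name `MLPClassifier`, the layer sizes, and the convention that index 0 is "true" and index 1 is "false" are assumptions for illustration only.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, x):
    """Affine map W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class MLPClassifier:
    """Hypothetical false news classifier: dense, ReLU, dense, softmax."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = rand_matrix(hidden_dim, in_dim, rng)
        self.b1 = [0.0] * hidden_dim
        self.w2 = rand_matrix(2, hidden_dim, rng)  # two outputs: true, false
        self.b2 = [0.0, 0.0]

    def __call__(self, f_shared):
        hidden = [max(0.0, h) for h in dense(self.w1, self.b1, f_shared)]
        return softmax(dense(self.w2, self.b2, hidden))


clf = MLPClassifier(in_dim=4, hidden_dim=8)
p = clf([0.2, -0.1, 0.7, 0.05])  # toy fused domain sharing feature
p_true, p_false = p              # assumed ordering: (true, false)
```

Whatever the input, the two outputs are non-negative and sum to 1, so they can be read directly as the two class probabilities.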
The false news detection task performed by the false news classifier in this example is a binary classification task, so binary cross-entropy loss (BCELoss) is used as the loss function to minimize the difference between the false news detection predicted value $\hat{y}^c$ and the true value $y^c$. The loss function is formulated as:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i^c \log \hat{y}_i^c + \left(1 - y_i^c\right)\log\left(1 - \hat{y}_i^c\right)\right]$$
wherein $y_i^c$ is the false news classification true value of the $i$-th sample, $\hat{y}_i^c$ is the false news classification predicted value of the $i$-th sample, and $N$ is the number of samples.
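The loss can be sketched in a few lines. Averaging over the samples and clamping the predictions with a small `eps` are assumptions made here for numerical safety; the source only states that binary cross-entropy is used. The function name `bce_loss` is hypothetical.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and predicted probabilities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]                             # 1 = false news, 0 = real news (assumed encoding)
uncertain = bce_loss(labels, [0.5, 0.5])    # equals log(2)
confident = bce_loss(labels, [0.9, 0.1])    # equals -log(0.9)
```

A prediction of 0.5 for every sample costs log 2 per sample, while confident and correct predictions cost much less, which is the gradient signal that pushes the classifier towards the true labels.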
The embodiment also provides a cross-domain false news detection device based on multi-representation learning, which comprises a content acquisition module to be detected, a word embedding vector extraction module, a shared feature extraction module and a true and false classification module.
In this example, the to-be-detected content acquisition module is used for acquiring a news text to be detected and the domain label to which the news text belongs; the word embedding vector extraction module is used for inputting the news text into the BERT model and extracting the word embedding vector of the news text; the shared feature extraction module is used for inputting the word embedding vector and the domain label of the news text into the domain sharing feature generator based on multi-representation learning to obtain a fused domain sharing feature expression; and the true and false classification module is used for inputting the fused domain sharing feature expression into the false news classifier and outputting a probability value result of the true and false news classification.
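How the modules hand data to one another can be sketched as a small pipeline object. Everything below is a hypothetical illustration: the class `FakeNewsDetector`, its field names, and the three toy stand-in functions are not from the source, and the toy encoder in particular is only a placeholder for a real BERT model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class FakeNewsDetector:
    """Wires the modules together; each field is a pluggable callable."""
    encode: Callable[[str], Vector]                    # word embedding extraction
    generate_shared: Callable[[Vector, int], Vector]   # shared feature extraction
    classify: Callable[[Vector], Tuple[float, float]]  # true/false classification

    def detect(self, text: str, domain_id: int) -> Tuple[float, float]:
        # The (text, domain_id) pair is what the acquisition module supplies.
        x = self.encode(text)
        f_shared = self.generate_shared(x, domain_id)
        return self.classify(f_shared)


# Toy stand-ins so the sketch runs without any trained model.
def toy_encode(text: str) -> Vector:
    return [len(text) / 100.0, text.count("!") / 10.0]


def toy_generate(x: Vector, domain_id: int) -> Vector:
    scale = 1.0 + 0.1 * domain_id  # pretend the domain changes the fusion
    return [scale * v for v in x]


def toy_classify(f: Vector) -> Tuple[float, float]:
    p_false = 1.0 / (1.0 + math.exp(-sum(f)))
    return (1.0 - p_false, p_false)  # assumed ordering: (true, false)


detector = FakeNewsDetector(toy_encode, toy_generate, toy_classify)
p_true, p_false = detector.detect("Breaking!!! Shocking news", domain_id=2)
```

Keeping each module behind a plain callable means a real encoder, feature generator, or classifier can replace its toy counterpart without changing `detect`.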
The present embodiment also provides a storage medium having stored thereon a computer program executable by a processor, the computer program when executed implementing the steps of the cross-domain false news detection method based on multi-representation learning in this example.
The embodiment also provides a cross-domain false news detection device based on multi-representation learning, which comprises a memory and a processor, wherein the memory stores a computer program executable by the processor, and the computer program when executed implements the steps of the cross-domain false news detection method based on multi-representation learning in this embodiment.
Claims (10)
1. A cross-domain false news detection method based on multi-representation learning is characterized in that:
acquiring a news text to be detected and a domain label to which the news text belongs;
inputting the news text into a BERT model, and extracting word embedding vectors of the news text;
inputting the word embedding vectors and the domain label of the news text into a domain sharing feature generator based on multi-representation learning to obtain a fused domain sharing feature expression;
and inputting the fused domain sharing feature expression into a false news classifier, and outputting a probability value result of true and false news classification.
2. The multi-representation learning-based cross-domain false news detection method according to claim 1, characterized in that: the method for inputting the word embedding vector and the domain label of the news text into the domain sharing feature generator based on multi-representation learning to obtain the fused domain sharing feature expression comprises the following steps:
the word embedding vectors of the news text are input into a plurality of domain sharing experts to generate a plurality of different domain sharing features, and each domain sharing feature focuses on one aspect of the domain sharing knowledge;
inputting the domain label into the trained domain gate model to obtain the weight of the shared features of each domain;
and carrying out weighted summation on a plurality of domain sharing characteristics generated based on a plurality of domain sharing experts and corresponding domain sharing characteristic weights obtained by the domain gate model to obtain a fused domain sharing characteristic expression.
3. The multi-representation learning-based cross-domain false news detection method according to claim 1, characterized in that: and using a multilayer perceptron as the false news classifier, wherein the multilayer perceptron is composed of multilayer fully-connected neural networks, the last layer of the classifier is normalized by using a softmax activation function, and two floating point numbers with the sum of 1 are output and respectively represent a probability value for judging whether the news is true and a probability value for judging whether the news is false.
4. The multi-representation learning-based cross-domain false news detection method of claim 3, characterized in that: and using binary cross entropy loss as a loss function of the false news classifier for false news detection tasks, and minimizing the difference between a predicted value and a true value of false news detection.
5. A cross-domain false news detection device based on multi-representation learning is characterized by comprising:
the to-be-detected content acquisition module is used for acquiring a news text to be detected and a domain label to which the news text belongs;
the word embedded vector extraction module is used for inputting the news text into the BERT model and extracting a word embedded vector of the news text;
the shared feature extraction module is used for inputting the word embedding vector and the domain label of the news text into the domain shared feature generator based on multi-representation learning to obtain a fused domain shared feature expression;
and the true and false classification module is used for inputting the fused domain sharing feature expression into a false news classifier and outputting a probability value result of true and false news classification.
6. The device of claim 5, wherein the device comprises: the method for inputting the word embedding vector and the domain label of the news text into the domain sharing feature generator based on multi-representation learning to obtain the fused domain sharing feature expression comprises the following steps:
the word embedding vectors of the news text are input into a plurality of domain sharing experts to generate a plurality of different domain sharing features, and each domain sharing feature focuses on one aspect of the domain sharing knowledge;
inputting the domain label into the trained domain gate model to obtain the weight of the shared features of each domain;
and carrying out weighted summation on a plurality of domain sharing characteristics generated based on a plurality of domain sharing experts and corresponding domain sharing characteristic weights obtained by the domain gate model to obtain a fused domain sharing characteristic expression.
7. The device of claim 5, wherein the device comprises: and using a multilayer perceptron as the false news classifier, wherein the multilayer perceptron is composed of multilayer fully-connected neural networks, the last layer of the classifier is normalized by using a softmax activation function, and two floating point numbers with the sum of 1 are output and respectively represent a probability value for judging whether the news is true and a probability value for judging whether the news is false.
8. The device of claim 7, wherein the device comprises: and using binary cross entropy loss as a loss function of the false news classifier for false news detection tasks, and minimizing the difference between a predicted value and a true value of false news detection.
9. A storage medium having stored thereon a computer program executable by a processor, characterized in that: the computer program when executed implements the steps of the multi-representation learning based cross-domain false news detection method of any one of claims 1-4.
10. A cross-domain false news detection device based on multi-representation learning, having a memory and a processor, the memory having stored thereon a computer program executable by the processor, characterized in that: the computer program when executed implements the steps of the multi-representation learning based cross-domain false news detection method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111124543.0A CN113901810A (en) | 2021-09-24 | 2021-09-24 | Cross-domain false news detection method based on multi-representation learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111124543.0A CN113901810A (en) | 2021-09-24 | 2021-09-24 | Cross-domain false news detection method based on multi-representation learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901810A true CN113901810A (en) | 2022-01-07 |
Family
ID=79029374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111124543.0A Pending CN113901810A (en) | 2021-09-24 | 2021-09-24 | Cross-domain false news detection method based on multi-representation learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901810A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114840771A (en) * | 2022-03-04 | 2022-08-02 | 北京中科睿鉴科技有限公司 | False news detection method based on news environment information modeling |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114840771A (en) * | 2022-03-04 | 2022-08-02 | 北京中科睿鉴科技有限公司 | False news detection method based on news environment information modeling |
CN114840771B (en) * | 2022-03-04 | 2023-04-28 | 北京中科睿鉴科技有限公司 | False news detection method based on news environment information modeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | A multiple-layer representation learning model for network-based attack detection | |
Zhang et al. | Patch strategy for deep face recognition | |
CN112861945B (en) | Multi-mode fusion lie detection method | |
Mohan et al. | Spoof net: syntactic patterns for identification of ominous online factors | |
Ra et al. | DeepAnti-PhishNet: Applying deep neural networks for phishing email detection | |
Ji et al. | Few-shot human-object interaction recognition with semantic-guided attentive prototypes network | |
Cui et al. | WEDL-NIDS: Improving network intrusion detection using word embedding-based deep learning method | |
Uçar et al. | A Deep learning approach for detection of malicious URLs | |
Shehu et al. | Lateralized approach for robustness against attacks in emotion categorization from images | |
Gong et al. | Model uncertainty based annotation error fixing for web attack detection | |
CN113901810A (en) | Cross-domain false news detection method based on multi-representation learning | |
Surekha et al. | Digital misinformation and fake news detection using WoT integration with Asian social networks fusion based feature extraction with text and image classification by machine learning architectures | |
Pal et al. | To transfer or not to transfer: Misclassification attacks against transfer learned text classifiers | |
CN117614644A (en) | Malicious website identification method, electronic equipment and storage medium | |
Elnagar et al. | A cognitive framework for detecting phishing websites | |
CN112948578B (en) | DGA domain name open set classification method, device, electronic equipment and medium | |
CN114844682A (en) | DGA domain name detection method and system | |
Sun et al. | Image steganalysis based on convolutional neural network and feature selection | |
Sivanantham et al. | Web Hazard Identification and Detection Using Deep Learning-A Comparative Study | |
CN113312479A (en) | Cross-domain false news detection method | |
Xu et al. | Text adversarial examples generation and defense based on reinforcement learning | |
Nirupama et al. | Development of Novel Classifying System to Identify the Right Sense of Image Sharing in Social Networks Using Deep Convolution Neural Network | |
Desamsetti et al. | Artificial Intelligence Based Fake News Detection Techniques | |
Nivaashini et al. | Deep stacked autoencoder based feature repsentation for phishing URLs detection | |
Liu et al. | Visual sentiment analysis for review images with item-oriented and user-oriented CNN by introducing CBAM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Cao Juan; Wang Yanyan; Xie Tian
Inventor before: Cao Juan; Wang Yanyan; Xu Chaoxi; Xie Tian; Li Jintao