CN110032733A - Rumour detection method and system for long news texts - Google Patents
Rumour detection method and system for long news texts
- Publication number
- CN110032733A
- Authority
- CN
- China
- Prior art keywords
- rumour
- paragraph
- collection
- data
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a rumour detection method and system for long news texts. The method comprises: obtaining, as long texts, texts on a specified news platform whose length exceeds a preset number of characters; extracting keywords from the paragraphs of each long text; retrieving social data from a social platform with the keywords; and obtaining the related data of each paragraph using a text-relevance algorithm. A labelled data set containing multiple items of social data annotated with rumour information is obtained and used to train multiple classification models; the trained models are combined into a fusion model, which computes a confidence score for the related data, representing the probability that the paragraph is not a rumour. By using a cross-source detection method, the present invention solves the problem that long articles are difficult to classify directly.
Description
Technical field
The present invention relates to the field of rumour detection in big-data analysis, and in particular to a rumour detection method and system for long news texts.
Background art
Internet news media platforms are closely bound up with our daily lives and have become one of the main sources from which people obtain news. However, these platforms carry a large amount of false information. In particular, some news media platforms have introduced self-media accounts to broaden their information sources, so the quality of the long articles published by these accounts varies widely and they easily become sources of rumours. Such information seriously affects the normal functioning of society and people's daily lives, which makes rumour detection on media platforms particularly important. In this patent, long-text data refers to texts on news media platforms whose length exceeds 140 characters.
Traditional manual identification of rumours consumes considerable manpower and material resources and can hardly meet real-time requirements; moreover, the semantic information of long texts is more dispersed, which further increases the cost of manual annotation. Existing work on rumour detection with machine learning mainly studies short texts from platforms such as Weibo and Twitter, while long-text data on news media platforms such as Tiantian Kuaibao has received little attention. Short texts on Weibo, Twitter and similar platforms provide learning algorithms with many features, such as content features, user features, propagation features and temporal features; combined with popular machine learning or deep learning algorithms, rumour detection on short texts already achieves high accuracy. By contrast, because the public cannot take part in producing the content of news media platforms, long-text data on these platforms lacks the rich features of social media data, and usually only the text content is available to common detection algorithms. Observation further shows that long texts are usually only weakly discriminative in terms of semantics, sentiment, punctuation and similar features, which makes it difficult for classification algorithms to achieve high accuracy. The present invention therefore proposes a new rumour detection method for long-text data on news media platforms.
Content-based rumour detection methods mainly use explicit syntactic features and implicit semantic features. For explicit features, prior art has proposed using word features, symbol features and simple sentiment features of the text content; other prior art uses features such as string length, number of words, presence of punctuation marks and posting time. For implicit features, prior art learns hidden-layer representations of messages with recurrent neural networks, improving experimental results; other prior art takes word vectors as input and obtains semantic representations of the text with convolutional neural networks. Because texts on platforms such as Weibo are short, their information is concentrated and their styles vary, so content features work well for rumour detection. Texts on news media platforms, however, are longer, with dispersed semantics and plain sentence structure and grammar, so good classification results are hard to achieve with content features alone.
Little research currently addresses rumour detection in long texts. Prior art performs rumour identification on long texts in two domains, "food health" and "medical health", and, based on the observation that rumours exhibit abnormal sentiment features, proposes detecting rumours through sentiment analysis. However, this method is not universal and is effective only for particular types of rumours.
To address these problems, the present invention proposes a rumour detection method for long-text data on news media platforms. Observation shows that a rumour in long-text data is usually confined to a few paragraphs. Drawing on the more mature rumour detection methods for Weibo short texts, this method first treats the long text paragraph by paragraph, extracts the corresponding keywords from each paragraph, and searches Weibo with them to obtain microblog data. On the premise that the microblog data is relevant to the content of the paragraph, a fusion model computes the credibility of the microblog data, from which the confidence score of each paragraph in the long text is obtained.
Summary of the invention
In view of the above problems, the present invention proposes a rumour detection method for long-text data on news media platforms. It mainly solves the problem of assessing long texts by finding similar data on Weibo, while providing a credibility score for each paragraph of the long text.
Specifically, the present invention relates to a rumour detection method for long news texts, comprising:
Step 1: obtaining, as long texts, texts on a specified news platform whose length exceeds a preset number of characters; extracting keywords from the paragraphs of each long text; retrieving social data from a social platform with the keywords; and obtaining the related data of each paragraph using a text-relevance algorithm;
Step 2: obtaining a labelled data set comprising multiple items of social data annotated with rumour information; training multiple classification models with the labelled data set; combining the trained classification models into a fusion model; and using the fusion model to obtain a confidence score for the related data, the score representing the probability that the paragraph is not a rumour.
In the rumour detection method for long news texts, step 1 comprises: for each paragraph, extracting the keywords of the paragraph using the TF-IDF method.
In the rumour detection method for long news texts, step 1 comprises: computing the similarity between the social data retrieved with the keywords and the paragraph, and collecting the social data whose similarity exceeds a threshold as the related data.
In the rumour detection method for long news texts, the multiple classification models in step 2 comprise: a support vector machine, a random forest, an extremely randomized trees model, a gradient boosting decision tree, an extreme gradient boosting (XGBoost) model and a logistic regression model.
In the rumour detection method for long news texts, training the multiple classification models in step 2 specifically comprises: dividing the labelled data set into a training set and a test set, and dividing the training set into 5 folds of equal size; for each of the support vector machine, the random forest, the extremely randomized trees model, the gradient boosting decision tree and the extreme gradient boosting model, training on 4 folds of the training set and predicting on the remaining fold, and combining the prediction results of these five models into a first intermediate training set; at each training round, also predicting on the test set, the test-set prediction result of round i being $b_i$, and averaging, for each of these five models, its test-set prediction results to obtain a second intermediate data set; and training the logistic regression model on the first intermediate training set and testing it on the second intermediate data set, to obtain the final fusion model.
The invention also discloses a rumour detection system for long news texts, comprising:
Module 1, configured to obtain, as long texts, texts on a specified news platform whose length exceeds a preset number of characters, extract keywords from the paragraphs of each long text, retrieve social data from a social platform with the keywords, and obtain the related data of each paragraph using a text-relevance algorithm;
Module 2, configured to obtain a labelled data set comprising multiple items of social data annotated with rumour information, train multiple classification models with the labelled data set, combine the trained classification models into a fusion model, and use the fusion model to obtain a confidence score for the related data, the score representing the probability that the paragraph is not a rumour.
In the rumour detection system for long news texts, module 1 is configured to: for each paragraph, extract the keywords of the paragraph using the TF-IDF method.
In the rumour detection system for long news texts, module 1 is configured to: compute the similarity between the social data retrieved with the keywords and the paragraph, and collect the social data whose similarity exceeds a threshold as the related data.
In the rumour detection system for long news texts, the multiple classification models in module 2 comprise: a support vector machine, a random forest, an extremely randomized trees model, a gradient boosting decision tree, an extreme gradient boosting (XGBoost) model and a logistic regression model.
In the rumour detection system for long news texts, training the multiple classification models in module 2 specifically comprises: dividing the labelled data set into a training set and a test set, and dividing the training set into 5 folds of equal size; for each of the support vector machine, the random forest, the extremely randomized trees model, the gradient boosting decision tree and the extreme gradient boosting model, training on 4 folds of the training set and predicting on the remaining fold, and combining the prediction results of these five models into a first intermediate training set; at each training round, also predicting on the test set, the test-set prediction result of round i being $b_i$, and averaging, for each of these five models, its test-set prediction results to obtain a second intermediate data set; and training the logistic regression model on the first intermediate training set and testing it on the second intermediate data set, to obtain the final fusion model.
The technical effects of the present invention include: the cross-source detection method solves the problem that long texts are difficult to classify directly; the paragraph-wise detection method can locate the specific paragraphs of a long text that contain rumours; and the method is generally applicable to rumour detection for long-text data on news media platforms.
Detailed description of the invention
Fig. 1 is a block diagram of the method for obtaining long-text paragraphs;
Fig. 2 is a schematic diagram of the stacking method for model fusion;
Fig. 3 is an overall block diagram of the method of the present invention.
Detailed description of embodiments
Key points of the present invention include:
1. Paragraph-wise detection. Unlike traditional short texts, which are rumours or non-rumours as a whole, long-text data on news media platforms usually contains rumours in only a few paragraphs. Exploiting this property, the present invention performs rumour detection on long texts paragraph by paragraph. The paragraph-wise method provides a confidence score for each paragraph and locates the specific paragraphs where rumours occur, making the result more interpretable.
2. Cross-source detection. Long-text data offers few usable features and is hard to assess directly with machine learning methods. The present invention performs rumour detection using microblog data obtained by search from a different source and, on the premise that the contents are similar, obtains the confidence score of the corresponding long-text paragraph. Cross-source detection enriches the data features and improves detection accuracy.
3. Feature construction and confidence scoring of microblog data with a fusion model. The present invention crawled officially confirmed rumour data from the Sina Weibo platform and obtained non-rumour data by manually annotating normal microblog posts. It then extracts 22 data features from the microblog data, such as the numbers of comments, likes and reposts. Six models are used in total: base models such as a support vector machine, a random forest and a gradient boosting decision tree are first trained on the data, the stacking method of model fusion is then used to construct new training and test data sets, and finally a logistic regression model is trained on the newly constructed data sets. The fusion model makes the rumour detection results on microblog data more accurate.
To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The method of the present invention can provide a confidence score for each paragraph of a long text and is generally applicable to rumour detection for long-text data on news media platforms. The invention is described below through specific embodiments with reference to the accompanying drawings.
Step 1: split the long text into paragraphs and obtain related microblog data.
Long-text data on news media platforms has weakly discriminative features, so it is difficult to classify a long text as rumour or non-rumour from its text alone. The present invention therefore proposes performing rumour detection with the rich data features of related microblog data, compensating for the weak discriminability of long-text features and thereby obtaining an assessment score for the long text. The flow for obtaining related microblog data is shown in Fig. 1.
Statistics show that rumours in long-text data on news media platforms exist only in some paragraphs of the text, while the other paragraphs remain normal text. The present invention therefore first merges paragraphs with few characters (for example, fewer than 25 characters) and then performs detection paragraph by paragraph. For each paragraph, keywords are extracted with the TF-IDF (term frequency-inverse document frequency) method, where TF is the normalized frequency of a word in a document: the more often a word appears, the larger its TF value. It is calculated as:
$$\mathrm{TF} = \frac{\text{number of occurrences of the word in the document}}{\text{total number of words in the document}}$$
IDF is the inverse document frequency of a word in the document collection: the fewer documents contain the word, the larger its IDF value. It is calculated as:
$$\mathrm{IDF} = \log\frac{\text{total number of documents}}{\text{number of documents containing the word} + 1}$$
TF-IDF, the product of TF and IDF, represents the importance of a word in a document. The keywords extracted by the present invention are the 4 words with the highest TF-IDF scores.
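The paragraph merging and keyword extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text is assumed to be already split into paragraphs and each paragraph already segmented into words (for Chinese, for example with a segmenter such as jieba); short paragraphs are assumed to be merged forward into the next one, since the description does not say which neighbour absorbs them; and the document collection used for IDF is simply passed in as a list of tokenized paragraphs. The function names are hypothetical.

```python
import math
from collections import Counter

LONG_TEXT_MIN_CHARS = 140   # long texts exceed 140 characters (from the description)
SHORT_PARAGRAPH_CHARS = 25  # example merge threshold given in the description
TOP_K = 4                   # number of keywords kept per paragraph


def is_long_text(text):
    """True if the text is long enough to be treated as a long text."""
    return len(text) > LONG_TEXT_MIN_CHARS


def merge_short_paragraphs(paragraphs, min_chars=SHORT_PARAGRAPH_CHARS):
    """Merge short paragraphs forward into the next one (merge direction is an assumption)."""
    merged, buffer = [], ""
    for para in paragraphs:
        buffer += para
        if len(buffer) >= min_chars:
            merged.append(buffer)
            buffer = ""
    if buffer:  # a short trailing fragment joins the previous unit, if any
        if merged:
            merged[-1] += buffer
        else:
            merged.append(buffer)
    return merged


def idf(word, corpus):
    """IDF = log(total documents / (documents containing the word + 1))."""
    containing = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / (containing + 1))


def top_keywords(doc_tokens, corpus, k=TOP_K):
    """The k words of doc_tokens with the highest TF * IDF (ties broken alphabetically)."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    scores = {w: (c / total) * idf(w, corpus) for w, c in counts.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy usage with pre-tokenized English "paragraphs".
corpus = [
    ["vaccine", "causes", "illness", "vaccine"],
    ["weather", "sunny", "today"],
    ["market", "prices", "today"],
    ["sports", "team", "wins"],
]
print(top_keywords(corpus[0], corpus, k=2))  # ['vaccine', 'causes']
```

Note that with this IDF formula a word present in every document gets a slightly negative score, so it naturally drops to the bottom of the ranking.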
Weibo provides a search interface through which a user can obtain microblog data by entering keywords. Using this interface, the present invention develops a data acquisition program that uses the extracted keywords to crawl the first page of the returned search results. Further, to ensure content relevance between the microblog data and the long-text paragraph, the present invention uses the word-embedding method word2vec to obtain vector representations of the words in the microblog data and in the long-text paragraph, and averages the word vectors weighted by their TF-IDF weights to obtain a vector representation of each text. The relevance between the two is measured with cosine similarity. Denoting the vector representations of the microblog data and the long-text paragraph by $\mathbf{w}$ and $\mathbf{p}$ respectively, the text relevance is calculated as:

$$\mathrm{sim}(\mathbf{w}, \mathbf{p}) = \frac{\mathbf{w} \cdot \mathbf{p}}{\lVert \mathbf{w} \rVert \, \lVert \mathbf{p} \rVert}$$
A relevance threshold is set so that only microblog data highly relevant to the long text is retained. Using the above method, the microblog data corresponding to each paragraph of the long text (the related data) is obtained.
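A sketch of the relevance filter, assuming the word vectors are already trained and the IDF values come from the keyword-extraction step. `word_vectors` can be any mapping from word to vector (for example, possibly the `wv` attribute of a trained gensim `Word2Vec` model), and `idf_weights` maps each word to its IDF. The threshold of 0.5 is a placeholder, because the description does not state a value; clipping negative TF-IDF weights to zero is also an assumption. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def text_vector(tokens, word_vectors, idf_weights):
    """TF-IDF-weighted average of the vectors of the in-vocabulary words of a text."""
    counts = Counter(tokens)
    total = len(tokens)
    vecs, ws = [], []
    for tok, c in counts.items():
        if tok in word_vectors:
            vecs.append(np.asarray(word_vectors[tok], dtype=float))
            # TF within this text times IDF; negative weights are clipped (assumption)
            ws.append(max((c / total) * idf_weights.get(tok, 0.0), 0.0))
    if not vecs or sum(ws) == 0:
        return None  # no usable words, so the text cannot be embedded
    return np.average(np.array(vecs), axis=0, weights=np.array(ws))


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0


def related_posts(paragraph_tokens, posts_tokens, word_vectors, idf_weights, threshold=0.5):
    """Indices of posts whose similarity to the paragraph exceeds the threshold."""
    p = text_vector(paragraph_tokens, word_vectors, idf_weights)
    if p is None:
        return []
    kept = []
    for idx, post in enumerate(posts_tokens):
        w = text_vector(post, word_vectors, idf_weights)
        if w is not None and cosine(w, p) > threshold:
            kept.append(idx)
    return kept


# Toy usage with hand-made 2-dimensional "word vectors".
wv = {"vaccine": [1.0, 0.0], "illness": [0.8, 0.2], "sunny": [0.0, 1.0]}
idf_w = {"vaccine": 1.0, "illness": 1.0, "sunny": 1.0}
print(related_posts(["vaccine", "illness"], [["vaccine"], ["sunny"], ["unknown"]], wv, idf_w))  # [0]
```

Here the paragraph vector is the average of the "vaccine" and "illness" vectors, so the first post is kept, the off-topic post is dropped, and the post with no known words cannot be embedded at all.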
Step 2: perform credibility analysis on the microblog data.
For the microblog data corresponding to each long-text paragraph, the present invention performs credibility analysis with a fusion of 6 models, including a support vector machine, a random forest and a gradient boosting decision tree.
Microblog credibility analysis can be regarded as a binary classification problem. The rumour data in the training data all come from rumours officially confirmed on Weibo, and the non-rumour data come from manual annotation. For each microblog post, 22 social features are extracted, such as the numbers of likes, comments and reposts. The stacking method first uses multiple upper-layer (first-level) models to construct new training and test data sets, and then trains a lower-layer (second-level) model on them. The method is shown in Fig. 2:
In the upper layer, the present invention uses 5 models in total: a support vector machine (SVM), a random forest (RF), extremely randomized trees (ET), a gradient boosting decision tree (GBDT) and extreme gradient boosting (XGBoost). The data set is first divided into a training set and a test set, and the training set is divided into 5 folds of equal size. For each model, 4 folds of the training set are used for training and the remaining fold for prediction (so that a different fold is predicted in each round). Denoting the prediction of round i by $a_i$, the 5 predictions are combined into a matrix A, which becomes the new training data set. At each training round, the test-set data is also predicted; denoting the test-set prediction of round i by $b_i$, the 5 predictions are averaged to obtain a matrix B, which becomes the new test data set. Finally, a logistic regression model is trained on the new training data set A and tested on the new test data set B, giving the final evaluation model.
The fusion model reduces the bias that a single model exhibits during classification and achieves better results in rumour detection. For each long-text paragraph, the confidence score of the related microblog data is obtained with the model fusion method, representing the probability that the paragraph is not a rumour.
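The description does not state how the scores of several related microblog posts are combined into a single paragraph score. The sketch below assumes a simple mean of the fusion model's non-rumour probabilities, returns `None` for a paragraph with no related posts, and flags paragraphs below a hypothetical threshold of 0.5 as candidate rumour paragraphs. `paragraph_scores`, `suspicious_paragraphs` and `toy_model` are illustrative names; in practice the prediction callable would wrap a trained fusion model.

```python
from statistics import mean


def paragraph_scores(related_features, predict_non_rumour):
    """Average the non-rumour probability over each paragraph's related posts (assumed rule).

    related_features: {paragraph index: list of feature vectors of its related posts}
    predict_non_rumour: callable returning one probability per feature vector
    """
    scores = {}
    for idx, feats in related_features.items():
        scores[idx] = mean(predict_non_rumour(feats)) if feats else None
    return scores


def suspicious_paragraphs(scores, threshold=0.5):
    """Indices of scored paragraphs whose score falls below the (hypothetical) threshold."""
    return sorted(i for i, s in scores.items() if s is not None and s < threshold)


def toy_model(feats):
    """Stand-in for the fusion model: reads the first feature as the probability."""
    return [f[0] for f in feats]


scores = paragraph_scores({0: [[0.9], [0.7]], 1: [[0.2], [0.4]], 2: []}, toy_model)
print(suspicious_paragraphs(scores))  # [1]
```

Returning `None` rather than a default score keeps paragraphs without any related Weibo data distinguishable from paragraphs that were actually assessed.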
The following is a system embodiment corresponding to the above method embodiment; the two embodiments can be implemented in cooperation. The relevant technical details mentioned in the method embodiment remain valid in this embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the method embodiment.
The invention also discloses a rumour detection system for long news texts, comprising:
Module 1, configured to obtain, as long texts, texts on a specified news platform whose length exceeds a preset number of characters, extract keywords from the paragraphs of each long text, retrieve social data from a social platform with the keywords, and obtain the related data of each paragraph using a text-relevance algorithm;
Module 2, configured to obtain a labelled data set comprising multiple items of social data annotated with rumour information, train multiple classification models with the labelled data set, combine the trained classification models into a fusion model, and use the fusion model to obtain a confidence score for the related data, the score representing the probability that the paragraph is not a rumour.
In the rumour detection system for long news texts, module 1 is configured to: for each paragraph, extract the keywords of the paragraph using the TF-IDF method.
In the rumour detection system for long news texts, module 1 is configured to: compute the similarity between the social data retrieved with the keywords and the paragraph, and collect the social data whose similarity exceeds a threshold as the related data.
In the rumour detection system for long news texts, the multiple classification models in module 2 comprise: a support vector machine, a random forest, an extremely randomized trees model, a gradient boosting decision tree, an extreme gradient boosting (XGBoost) model and a logistic regression model.
In the rumour detection system for long news texts, training the multiple classification models in module 2 specifically comprises: dividing the labelled data set into a training set and a test set, and dividing the training set into 5 folds of equal size; for each of the support vector machine, the random forest, the extremely randomized trees model, the gradient boosting decision tree and the extreme gradient boosting model, training on 4 folds of the training set and predicting on the remaining fold, and combining the prediction results of these five models into a first intermediate training set; at each training round, also predicting on the test set, the test-set prediction result of round i being $b_i$, and averaging, for each of these five models, its test-set prediction results to obtain a second intermediate data set; and training the logistic regression model on the first intermediate training set and testing it on the second intermediate data set, to obtain the final fusion model.
Claims (10)
1. A rumour detection method for long news texts, characterized by comprising:
Step 1: obtaining, as long texts, texts on a specified news platform whose length exceeds a preset number of characters; extracting keywords from the paragraphs of each long text; retrieving social data from a social platform with the keywords; and obtaining the related data of each paragraph using a text-relevance algorithm;
Step 2: obtaining a labelled data set comprising multiple items of social data annotated with rumour information; training multiple classification models with the labelled data set; combining the trained classification models into a fusion model; and using the fusion model to obtain a confidence score for the related data, the score representing the probability that the paragraph is not a rumour.
2. The rumour detection method for long news texts according to claim 1, characterized in that step 1 comprises: for each paragraph, extracting the keywords of the paragraph using the TF-IDF method.
3. The rumour detection method for long news texts according to claim 1, characterized in that step 1 comprises: computing the similarity between the social data retrieved with the keywords and the paragraph, and collecting the social data whose similarity exceeds a threshold as the related data.
4. The rumour detection method for long news texts according to claim 1, characterized in that the multiple classification models in step 2 comprise: a support vector machine, a random forest, an extremely randomized trees model, a gradient boosting decision tree, an extreme gradient boosting model and a logistic regression model.
5. The rumour detection method for long news texts according to claim 1, characterized in that training the multiple classification models in step 2 specifically comprises:
dividing the labelled data set into a training set and a test set, and dividing the training set into 5 folds of equal size; for each of the support vector machine, the random forest, the extremely randomized trees model, the gradient boosting decision tree and the extreme gradient boosting model, training on 4 folds of the training set and predicting on the remaining fold, and combining the prediction results of these five models into a first intermediate training set; at each training round, also predicting on the test set, the test-set prediction result of round i being $b_i$, and averaging, for each of these five models, its test-set prediction results to obtain a second intermediate data set; and training the logistic regression model on the first intermediate training set and testing it on the second intermediate data set, to obtain the final fusion model.
6. A rumour detection system for long news texts, characterized by comprising:
Module 1, configured to obtain, as long texts, texts on a specified news platform whose length exceeds a preset number of characters, extract keywords from the paragraphs of each long text, retrieve social data from a social platform with the keywords, and obtain the related data of each paragraph using a text-relevance algorithm;
Module 2, configured to obtain a labelled data set comprising multiple items of social data annotated with rumour information, train multiple classification models with the labelled data set, combine the trained classification models into a fusion model, and use the fusion model to obtain a confidence score for the related data, the score representing the probability that the paragraph is not a rumour.
7. The rumour detection system for long news texts according to claim 6, characterized in that module 1 is configured to: for each paragraph, extract the keywords of the paragraph using the TF-IDF method.
8. The rumour detection system for long news texts according to claim 6, characterized in that module 1 is configured to: compute the similarity between the social data retrieved with the keywords and the paragraph, and collect the social data whose similarity exceeds a threshold as the related data.
9. The rumour detection system for long news texts according to claim 6, characterized in that the multiple classification models in module 2 comprise: a support vector machine, a random forest, an extremely randomized trees model, a gradient boosting decision tree, an extreme gradient boosting model and a logistic regression model.
10. The rumour detection system for news long text according to claim 6, characterized in that training the multiple classification models in module 2 specifically comprises:
dividing the labeled data set into a training set and a test set, and dividing the training set into 5 folds of equal size; for each of the support vector machine, the random forest, the extremely randomized trees, the gradient boosting decision tree and the extreme gradient boosting model, training on 4 folds of the training set in turn and predicting the remaining fold; combining the respective prediction results of these five models into a first intermediate training set; in each training round, also predicting the test set, the prediction result on the test set in round i being denoted b_i; averaging the test-set predictions b_i of each of the five models to obtain a second intermediate training set; and training and testing the logistic regression model on the first intermediate training set and the second intermediate training set, respectively, to obtain the final fusion model.
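The two-level training in claim 10 is a form of stacking. The dependency-free sketch below mirrors its data flow under stated assumptions: the five base learners are replaced by a hypothetical toy `CentroidScorer` (instantiated on different feature subsets) so that the example runs without scikit-learn or XGBoost, folds are formed by interleaving indices, label 1 is taken to mean non-rumour so that the meta-model's output reads as the non-rumour probability of claim 6, and the logistic-regression meta-learner is a plain gradient-descent implementation. Column m of `first` holds model m's out-of-fold predictions (the first intermediate training set); column m of `second` holds the average of that model's per-fold test-set predictions b_i (the second intermediate set).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class CentroidScorer:
    """Hypothetical toy base learner (stands in for SVM, RF, ET, GBDT, XGBoost).

    Scores a sample by how much closer it lies to the class-1 centroid than to
    the class-0 centroid, optionally using only a subset of feature indices.
    """

    def __init__(self, features=None):
        self.features = features

    def _select(self, x):
        return x if self.features is None else [x[i] for i in self.features]

    def fit(self, X, y):
        def centroid(label):
            rows = [self._select(x) for x, t in zip(X, y) if t == label]
            return [sum(col) / len(rows) for col in zip(*rows)]

        self.c0, self.c1 = centroid(0), centroid(1)
        return self

    def predict(self, X):
        return [sigmoid(math.dist(self._select(x), self.c0)
                        - math.dist(self._select(x), self.c1)) for x in X]


class LogisticRegression:
    """Meta-learner: logistic regression trained by batch gradient descent."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        d, n = len(X[0]), len(X)
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for x, t in zip(X, y):
                err = self._prob(x) - t
                for i in range(d):
                    grad_w[i] += err * x[i]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def _prob(self, x):
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)

    def predict(self, X):
        return [self._prob(x) for x in X]


def stack(base_factories, X_train, y_train, X_test, n_folds=5):
    """Build the first (out-of-fold) and second (averaged test) intermediate sets."""
    n = len(X_train)
    # Interleaved folds; sizes are equal when n is divisible by n_folds.
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    first = [[0.0] * len(base_factories) for _ in range(n)]
    second = [[0.0] * len(base_factories) for _ in X_test]
    for m, make_model in enumerate(base_factories):
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(n) if i not in held]
            model = make_model().fit([X_train[i] for i in train_idx],
                                     [y_train[i] for i in train_idx])
            # Predict the held-out fold: column m of the first intermediate set.
            for i, p in zip(held_out, model.predict([X_train[i] for i in held_out])):
                first[i][m] = p
            # b_i: this round's test-set prediction, averaged over all rounds.
            for j, p in enumerate(model.predict(X_test)):
                second[j][m] += p / n_folds
    return first, second
```

To finish the fusion step, fit `LogisticRegression` on `first` with the training labels and evaluate it on `second` against the test labels. Training the meta-learner only on out-of-fold predictions keeps it from learning to trust base models that have memorized their own training rows. In practice the toy scorer would be replaced by real SVM, random-forest, extra-trees, GBDT and XGBoost models wrapped to expose a `fit` method and a `predict` method returning a score between 0 and 1.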
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184862.7A CN110032733A (en) | 2019-03-12 | 2019-03-12 | A kind of rumour detection method and system for news long text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110032733A (en) | 2019-07-19 |
Family ID: 67235919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910184862.7A Pending CN110032733A (en) | 2019-03-12 | 2019-03-12 | A kind of rumour detection method and system for news long text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032733A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108038240A (en) * | 2017-12-26 | 2018-05-15 | 武汉大学 | Based on content, the social networks rumour detection method of user's multiplicity |
CN108228853A (en) * | 2018-01-11 | 2018-06-29 | 北京信息科技大学 | A kind of microblogging rumour recognition methods and system |
CN108614855A (en) * | 2018-03-19 | 2018-10-02 | 众安信息技术服务有限公司 | A kind of rumour recognition methods |
Non-Patent Citations (1)
Title |
---|
Juan Cao et al.: "Automatic Rumor Detection on Microblogs: A Survey", arXiv:1807.03505v1 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188284B (en) * | 2019-04-25 | 2022-01-28 | 中国科学院计算技术研究所 | Rumor detection method and system based on retrieval assistance |
CN110188284A (en) * | 2019-04-25 | 2019-08-30 | 中国科学院计算技术研究所 | A kind of rumour detection method and system based on retrieval auxiliary |
CN110532563A (en) * | 2019-09-02 | 2019-12-03 | 苏州美能华智能科技有限公司 | The detection method and device of crucial paragraph in text |
CN110532563B (en) * | 2019-09-02 | 2023-06-20 | 苏州美能华智能科技有限公司 | Method and device for detecting key paragraphs in text |
CN111143551A (en) * | 2019-12-04 | 2020-05-12 | 支付宝(杭州)信息技术有限公司 | Text preprocessing method, classification method, device and equipment |
CN111489065A (en) * | 2020-03-27 | 2020-08-04 | 北京理工大学 | Node risk assessment integrating ICT supply chain network topology and product business information |
CN111475648A (en) * | 2020-03-30 | 2020-07-31 | 东软集团股份有限公司 | Text classification model generation method, text classification method, device and equipment |
CN111475648B (en) * | 2020-03-30 | 2023-11-14 | 东软集团股份有限公司 | Text classification model generation method, text classification device and equipment |
CN111694955A (en) * | 2020-05-08 | 2020-09-22 | 中国科学院计算技术研究所 | Early dispute message detection method and system for social platform |
CN111694955B (en) * | 2020-05-08 | 2023-09-12 | 中国科学院计算技术研究所 | Early dispute message detection method and system for social platform |
CN111831790A (en) * | 2020-06-23 | 2020-10-27 | 广东工业大学 | False news identification method based on low threshold integration and text content matching |
CN111831790B (en) * | 2020-06-23 | 2023-07-14 | 广东工业大学 | False news identification method based on low threshold integration and text content matching |
CN111611981A (en) * | 2020-06-28 | 2020-09-01 | 腾讯科技(深圳)有限公司 | Information identification method and device and information identification neural network training method and device |
CN111506710B (en) * | 2020-07-01 | 2020-11-06 | 平安国际智慧城市科技股份有限公司 | Information sending method and device based on rumor prediction model and computer equipment |
CN111506710A (en) * | 2020-07-01 | 2020-08-07 | 平安国际智慧城市科技股份有限公司 | Information sending method and device based on rumor prediction model and computer equipment |
CN114897270A (en) * | 2022-06-15 | 2022-08-12 | 青岛文达通科技股份有限公司 | Semantic information fused public opinion propagation quantity prediction method and system |
Similar Documents
Publication | Title |
---|---|
CN110032733A (en) | A kind of rumour detection method and system for news long text |
CN109829166B (en) | People and host customer opinion mining method based on character-level convolutional neural network |
KR101284788B1 (en) | Apparatus for question answering based on answer trustworthiness and method thereof |
CN103544255B (en) | Text semantic relativity based network public opinion information analysis method |
US7761447B2 (en) | Systems and methods that rank search results |
CN103853738B (en) | A kind of recognition methods of info web correlation region |
CN103823859B (en) | Name recognition algorithm based on combination of decision-making tree rules and multiple statistic models |
CN106528599A (en) | A rapid fuzzy matching algorithm for strings in mass audio data |
CN101599071A (en) | The extraction method of conversation text topic |
CN107239439A (en) | Public sentiment sentiment classification method based on word2vec |
CN110888991B (en) | Sectional type semantic annotation method under weak annotation environment |
Cai et al. | Large-scale question classification in cqa by leveraging wikipedia semantic knowledge |
CN102637192A (en) | Method for answering with natural language |
CN106933800A (en) | A kind of event sentence abstracting method of financial field |
CN102750316A (en) | Concept relation label drawing method based on semantic co-occurrence model |
CN113312474A (en) | Similar case intelligent retrieval system of legal documents based on deep learning |
CN108520038B (en) | Biomedical literature retrieval method based on sequencing learning algorithm |
CN111221968B (en) | Author disambiguation method and device based on subject tree clustering |
CN1687924A (en) | Method for producing internet personage information search engine |
CN110134799B (en) | BM25 algorithm-based text corpus construction and optimization method |
CN106126502A (en) | A kind of emotional semantic classification system and method based on support vector machine |
CN104794209B (en) | Chinese microblogging mood sorting technique based on Markov logical network and system |
CN112084312B (en) | Intelligent customer service system constructed based on knowledge graph |
CN106960003A (en) | Plagiarize the query generation method of the retrieval of the source based on machine learning in detection |
CN115329085A (en) | Social robot classification method and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190719 |