CN113515624B - Text classification method for emergency news - Google Patents


Info

Publication number
CN113515624B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110467773.0A
Other languages
Chinese (zh)
Other versions
CN113515624A (en)
Inventor
孙锐
谢红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshan Normal University
Original Assignee
Leshan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshan Normal University filed Critical Leshan Normal University
Priority to CN202110467773.0A priority Critical patent/CN113515624B/en
Publication of CN113515624A publication Critical patent/CN113515624A/en
Application granted granted Critical
Publication of CN113515624B publication Critical patent/CN113515624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a text classification method for emergency news, which belongs to the field of natural language processing and comprises the following steps: collecting news documents, completing data cleaning, and performing preprocessing operations such as word segmentation, dependency parsing and coreference resolution on the documents to obtain a news data set D; adding the news data set D to a background corpus and training Word2Vec on it to learn distributed representations of words; performing event extraction on each news document d in the news data set D and constructing an event dictionary; clustering all events in the event dictionary with the non-parametric Chinese Whispers clustering method to obtain event clusters; calculating the occurrence frequency and the inverse document frequency of each event cluster obtained after clustering to extract feature events; constructing a feature vector for each news document from the feature events; and classifying the news documents with a support vector machine classification algorithm. The method has strong semantic representation capability and good class discrimination.

Description

Text classification method for emergency news
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a text classification method for emergency news.
Background
Emergencies are natural disasters, accidents, public health events and social security events that occur suddenly, cause or may cause serious social harm, and require emergency response measures. After such an event, related news reports spread rapidly over the network and draw close attention from government departments and the public. Classifying this news by topic with text classification technology helps people understand and analyze the causes, course and subsequent effects of the event, and helps the relevant departments control, mitigate and eliminate the serious social harm caused by the emergency and make supporting decisions.
Many sub-events often accompany or derive from an emergency as it occurs or evolves. For example, the event "Typhoon Rammasun strikes" is generally accompanied by events such as "the meteorological observatory issues a warning", "people are injured", "communications are interrupted" and "people are evacuated", while the event "Yunnan earthquake" is generally accompanied by events such as "an earthquake occurs in Yunnan", "people die", "houses collapse" and "the civil affairs department issues a report". By analyzing events with salient features, news can readily be classified by emergency topic.
In the field of natural language processing, an event generally refers to the occurrence of an action or a change of state, and consists of a trigger word and one or more arguments. An event itself contains the semantic relations among its words, so it has stronger semantic representation capability than the traditional bag-of-words model and therefore better class discrimination. Text classification that uses events as features should therefore be simpler and more effective for emergency news.
With the widespread application of information technology, a large number of related news reports appear on the network after an emergency. The prior art mainly uses classification models based on bag-of-words, i.e. documents are represented by lexical features. This approach ignores the semantic relations among words and has weak semantic representation capability.
Therefore, the application provides a text classification method for emergency news.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a text classification method aiming at emergency news.
In order to achieve the above object, the present invention provides the following technical solutions:
A text classification method for emergency news comprises the following steps:
collecting news documents from the internet, completing data cleaning, and performing the preprocessing operations of word segmentation, dependency parsing and coreference resolution on each of the news documents with a natural language processing tool to obtain a news data set D;
adding the preprocessed news data set D to a background corpus, and training Word2Vec on it to learn distributed representations of words;
extracting events from each news document d in the news data set D, and constructing an event dictionary;
clustering all events in the event dictionary with the non-parametric Chinese Whispers clustering method to obtain event clusters;
calculating the occurrence frequency and the inverse document frequency of each event cluster obtained after clustering to extract feature events;
constructing a feature vector for each news document from the feature events;
classifying the news documents with a support vector machine classification algorithm.
Preferably, the data cleaning of the news documents is completed with an existing natural language processing toolkit.
Preferably, the specific steps of extracting events from each news document d in the news data set D and constructing an event dictionary include:
scanning the dependency relations of types "nsubj" and "dobj" in the dependency parsing result of each news document d to obtain a binary dependency relation set ea, wherein the binary relations represent event-argument relations;
scanning the binary dependency relation set ea in turn, and merging two event-argument relations into one candidate event if their predicates are the same;
representing each of the remaining unmerged binary dependency relations in the set ea as a candidate event of its own;
obtaining the event set de of each news document from all candidate events, i.e. each document consists of a plurality of events;
repeating the above four steps until event extraction has been completed for all documents in the news data set D, thereby obtaining the overall event set DE of the news data set D;
scanning the event set DE and constructing an event dictionary ED = {event_1, event_2, …, event_m}, where event_i denotes the i-th event and m denotes the dictionary size, i.e. the number of event categories, all events with the same arguments belonging to the same category.
Preferably, the specific steps of clustering all events in the event dictionary with the non-parametric Chinese Whispers clustering method to obtain event clusters include:
calculating the distributed representation of each event by semantic composition [formula not reproduced in this text], wherein subj, pred and obj denote the subject, predicate and object of the event respectively, ⊗ denotes the Kronecker product operation, and · denotes the dot product operation;
calculating the similarity sim(event_i, event_j) between each pair of events using cosine similarity;
clustering all events of the event dictionary ED with the Chinese Whispers algorithm to obtain different event clusters;
after the clustering is completed, obtaining the event clusters EC = {ec_1, ec_2, …, ec_x}, wherein each cluster ec_i contains events with high semantic similarity and i is the cluster number.
Preferably, the specific steps of clustering all events of the event dictionary ED with the Chinese Whispers algorithm to obtain different event clusters include:
constructing an event graph G = (Vertex, Edge), wherein Vertex denotes the vertex set of the graph and Edge denotes the edge set of the graph; each event is initially a node forming a cluster of its own, i.e. Vertex = ED = {event_1, event_2, …, event_m} and Edge = {}, so that no edges exist in the graph;
scanning each event node event_i in turn, finding for each event node the unconnected event node event_j with the highest similarity, and gathering the two into one cluster; if several nodes share the highest similarity, one of them is selected at random;
repeating the scanning step until the convergence condition is met, wherein the convergence condition is set according to an event similarity threshold.
Preferably, the specific steps of calculating the occurrence frequency and the inverse document frequency of each event cluster obtained after clustering to extract feature events include:
scanning the overall event set DE of the news data set D and counting the occurrence frequency ecf of each event cluster ec_i;
scanning the event set de of each news document and calculating the inverse document frequency idf of each event cluster ec_i;
calculating the value ecf_i*idf_i of each event cluster ec_i, which represents the feature salience of that event cluster;
sorting the event clusters by feature salience in descending order, extracting the K clusters with the largest values, and constructing a feature event dictionary FED = {fed_1, fed_2, …, fed_K}, where fed_i is the i-th most salient event cluster, i = 1, 2, …, K.
Preferably, the specific steps of constructing a feature vector for each news document from the feature events include:
scanning each event cluster fed_i in the feature event dictionary FED in turn, and counting the occurrence frequency edf_i of that event cluster in each news document d;
scanning each event cluster fed_i in the feature event dictionary FED in turn, and calculating the feature value of the document in each feature dimension as fd_i = ecf_i*idf_i*edf_i, i.e. the product of the event cluster salience ecf_i*idf_i and the document feature edf_i of the event cluster;
after the feature event dictionary has been scanned, obtaining the document feature vector fd = [fd_1, fd_2, …, fd_K].
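Read together, the two groups of steps above amount to a TF-IDF-style weighting over event clusters. The text does not state how idf is computed, so the logarithmic form below is an assumption; N (the number of news documents), df_i (the number of documents containing cluster ec_i) and s_i (the salience) are symbols introduced here for illustration only.

```latex
\begin{aligned}
idf_i &= \log\frac{N}{df_i} && \text{(assumed form, not stated in the text)} \\
s_i   &= ecf_i \cdot idf_i && \text{(feature salience of cluster } ec_i\text{)} \\
fd_i  &= ecf_i \cdot idf_i \cdot edf_i && \text{(value of document } d \text{ in dimension } i\text{)}
\end{aligned}
```

For instance, with the purely illustrative values N = 4, df_i = 2, ecf_i = 4 and edf_i = 2, and using the natural logarithm, idf_i ≈ 0.693, the salience is about 2.77, and fd_i ≈ 5.55.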
The text classification method for the emergency news has the following beneficial effects:
1) The invention uses atomic events as basic features, which have stronger semantic representation capability and class discrimination than traditional words;
2) The invention introduces the compositional semantics of word vectors to represent atomic events and generates event clusters with a non-parametric clustering algorithm, which avoids the sparsity problem caused by events that are semantically similar but expressed in different forms;
3) The invention improves on the traditional TF-IDF algorithm by introducing the corpus occurrence frequency of an event, its inverse document frequency and its document occurrence frequency, so as to generate more discriminative document vectors.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some of the embodiments of the present invention and other drawings may be made by those skilled in the art without the exercise of inventive faculty.
Fig. 1 is a flowchart of a text classification method for emergency news according to embodiment 1 of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and the embodiments, so that those skilled in the art can better understand the technical scheme of the present invention and can implement the same. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
The invention provides a text classification method for emergency news. In this embodiment, special-topic documents were collected from the Sina news portal (including 92 documents on Typhoon Rammasun, typhoon No. 9 of that year, 102 on the forced landing and fire of a Taiwanese passenger plane, 54 on the Hangzhou bus arson case, 117 on the Yunnan earthquake, and so on) and used as the training and test corpus for verifying the effectiveness of the proposed method. The results on this data set show that the method is simple and classifies accurately, and that using atomic events as basic features gives stronger class discrimination for emergency news. As shown in fig. 1, the method specifically comprises the following steps:
S1, collecting news document data from the Sina news portal, cleaning the data, and then performing preprocessing operations such as word segmentation, dependency parsing and coreference resolution on each document in the news corpus with a natural language processing tool; the news document collection is denoted as the news data set D = {d_1, d_2, …, d_n}, where d_i denotes the i-th news document and n denotes the total number of news documents in the set; this example uses Stanford CoreNLP, a natural language processing toolkit released by Stanford University;
the specific steps of S1 include: completing data cleaning on the crawled special-topic documents, such as full-width to half-width conversion and removal of redundant non-Chinese symbols such as URLs (uniform resource locators), and then preprocessing each document with the existing natural language processing toolkit Stanford CoreNLP, including word segmentation, dependency parsing and coreference resolution, to obtain the news data set D.
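The cleaning part of S1 can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes Unicode NFKC normalization as the full-width to half-width conversion and a simple regular expression for URLs, and the function name `clean_news_text` is hypothetical. Word segmentation, dependency parsing and coreference resolution (done with Stanford CoreNLP in the embodiment) are not covered.

```python
import re
import unicodedata

# Assumed URL pattern: "http(s)://" followed by anything that is neither
# whitespace nor a CJK ideograph, so trailing Chinese text is preserved.
URL_PATTERN = re.compile(r"https?://[^\s\u4e00-\u9fff]+")


def clean_news_text(raw: str) -> str:
    """Normalize full-width characters and strip URLs from one news document."""
    # NFKC maps full-width letters, digits and ASCII punctuation to half-width.
    text = unicodedata.normalize("NFKC", raw)
    # Remove URLs after normalization so full-width URL characters are caught too.
    text = URL_PATTERN.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

Removing other non-Chinese symbols, as the step mentions, would need additional rules that the text does not specify.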
S2, adding the preprocessed news data set D to a background corpus, such as the People's Daily corpus, and training a word embedding algorithm on it to learn distributed representations of words; common word embedding algorithms include Word2Vec and GloVe, and Word2Vec is selected as the word embedding algorithm in this example.
S3, extracting events from each news document d in the news data set D, wherein each event is represented as a triple atomic event with a subject-predicate-object structure, and constructing an event dictionary, with the following specific steps:
S31, scanning the dependency relations of types "nsubj" and "dobj" in the dependency parsing result of each news document d to obtain a binary dependency relation set ea, wherein the binary relations can be used to represent event-argument relations;
S32, scanning the binary dependency relation set ea in turn, and merging two event-argument relations into one candidate event if their predicates are the same; for example, given the sentence "The meteorological observatory issued a typhoon warning on the 6th", the event "meteorological observatory, issue, warning" is obtained from the two dependency relations "nsubj(issue, meteorological observatory)" and "dobj(issue, warning)";
S33, representing each of the remaining unmerged binary dependency relations in the event-argument relation set ea as a candidate event of its own;
S34, obtaining the event set de of each news document from all candidate events, i.e. each document consists of a plurality of events;
S35, repeating the four steps S31, S32, S33 and S34 until event extraction has been completed for all documents in the news data set D, thereby obtaining the overall event set DE of the data set D;
S36, scanning the event set DE and constructing an event dictionary ED = {event_1, event_2, …, event_m}, where event_i denotes the i-th event and m denotes the dictionary size, i.e. the number of event categories, all events with the same arguments belonging to the same category.
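Steps S31 to S36 can be sketched as below. This is an illustrative reading under stated assumptions, not the patent's code: dependencies are assumed to arrive as `(relation, predicate, argument)` tuples from some parser, the i-th subject of a predicate is paired with its i-th object, and a missing argument is filled with the placeholder "nil" in line with the example events given later in the description. The function names are hypothetical.

```python
from collections import defaultdict

NIL = "nil"  # placeholder for a missing subject or object


def extract_events(dependencies):
    """Build (subject, predicate, object) triples for one news document.

    dependencies: iterable of (relation, predicate, argument) tuples.
    """
    subjects, objects, order = defaultdict(list), defaultdict(list), []
    for rel, pred, arg in dependencies:
        if rel not in ("nsubj", "dobj"):
            continue  # S31: keep only subject and direct-object relations
        if pred not in order:
            order.append(pred)  # remember predicates in first-seen order
        (subjects if rel == "nsubj" else objects)[pred].append(arg)

    events = []
    for pred in order:
        subs, objs = subjects[pred], objects[pred]
        paired = min(len(subs), len(objs))
        # S32: relations sharing a predicate are merged into one event
        events.extend((s, pred, o) for s, o in zip(subs, objs))
        # S33: leftover relations become events with a missing argument
        events.extend((s, pred, NIL) for s in subs[paired:])
        events.extend((NIL, pred, o) for o in objs[paired:])
    return events


def build_event_dictionary(event_sets):
    """S36: collect the distinct events of all documents in first-seen order."""
    return list(dict.fromkeys(e for events in event_sets for e in events))
```

Treating identical triples as one dictionary entry is how this sketch reads "all events with the same arguments belong to the same category".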
S4, clustering all events in the event dictionary ED with a non-parametric clustering method. Common non-parametric clustering methods include Chinese Whispers, DBSCAN and hierarchical clustering; this example uses Chinese Whispers. The implementation steps are as follows:
S41, the distributed representation of each event depends on each argument of the event, so it is calculated by semantic composition. Common composition operations include concatenation, addition and multiplication; this example uses multiplication. Specifically, the distributed representation of an event is calculated as follows [formula not reproduced in this text], wherein subj, pred and obj denote the subject, predicate and object of the event respectively, ⊗ denotes the Kronecker product operation, and · denotes the dot product operation;
S42, calculating the similarity sim(event_i, event_j) between each pair of events using cosine similarity;
S43, clustering all events of the event dictionary ED with the Chinese Whispers algorithm to obtain different event clusters, with the following specific steps:
S431, constructing an event graph G = (Vertex, Edge), wherein each event is initially a node forming a cluster of its own, i.e. Vertex = ED = {event_1, event_2, …, event_m} and Edge = {}, so that no edges exist in the graph;
S432, scanning each event node event_i in turn, finding for each event node the unconnected event node event_j with the highest similarity, and gathering the two into one cluster (i.e. adding an edge); if several nodes share the highest similarity, one of them is selected at random;
S433, repeating S432 until the convergence condition is met, wherein the convergence condition is set according to an event similarity threshold (the threshold selected in this example is sim(event_i, event_j) > 0.6).
S44, after the clustering is completed, obtaining the event clusters EC = {ec_1, ec_2, …, ec_x}, wherein each cluster ec_i contains events with high semantic similarity and i is the cluster number. For example, events such as "person, injury, nil", "person, severe injury, nil" and "nil, injury, person" are clustered together.
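The whole of S4 can be sketched as follows, under several assumptions that should be read as such. The composition formula is not available in the text, so `compose_event` uses element-wise multiplication as a stand-in. `cluster_events` follows the procedure as worded in S431 to S433 (link each node to its most similar node outside its cluster while the similarity exceeds the threshold), which is not the label-propagation form in which Chinese Whispers is usually presented. "Unconnected" is read as "not yet in the same cluster", and ties keep the first candidate instead of a random one so that the result is reproducible. All function names are hypothetical.

```python
import math


def compose_event(subj_vec, pred_vec, obj_vec):
    """Combine three word vectors into one event vector.

    Assumption: element-wise multiplication stands in for the
    composition formula, which is not reproduced in the text.
    """
    return [s * p * o for s, p, o in zip(subj_vec, pred_vec, obj_vec)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_events(event_vectors, threshold=0.6):
    """Return clusters as sorted lists of event indices."""
    n = len(event_vectors)
    parent = list(range(n))  # S431: every event starts in its own cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    changed = True
    while changed:  # S433: repeat the scan until no new edge is added
        changed = False
        for i in range(n):  # S432: scan each event node in turn
            best_j, best_sim = None, threshold
            for j in range(n):
                if find(i) == find(j):
                    continue  # already in the same cluster
                sim = cosine_similarity(event_vectors[i], event_vectors[j])
                if sim > best_sim:  # ties keep the first candidate (assumption)
                    best_j, best_sim = j, sim
            if best_j is not None:
                parent[find(best_j)] = find(i)  # adding the edge merges clusters
                changed = True

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Each merge reduces the number of clusters by one, so the loop always terminates; the pairwise scan is quadratic per pass, which is acceptable for a sketch but would need indexing for a large event dictionary.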
S5, calculating the occurrence frequency and the inverse document frequency of each event cluster ec_i obtained after clustering to extract feature events, with the following implementation steps:
S51, scanning the overall event set DE of the data set D and counting the occurrence frequency ecf of each event cluster ec_i;
S52, scanning the event set de of each news document and calculating the inverse document frequency idf of each event cluster ec_i;
S53, calculating the value ecf_i*idf_i of each event cluster ec_i, which represents the feature salience of that event cluster;
S54, sorting the event clusters by feature salience in descending order, extracting the K clusters with the largest values (the number of features K can be set differently in different embodiments; K is set to 20 in this example), and constructing a feature event dictionary FED = {fed_1, fed_2, …, fed_K}, where fed_i is the i-th most salient event cluster, i = 1, 2, …, K. In specific embodiments, event clusters that appear in multiple news documents with a high frequency are extracted as feature events, such as "person, injury, nil", "Yunnan, occur, earthquake" and "aircraft, forced landing, nil".
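Steps S51 to S54 can be sketched as below. The text does not give the idf formula, so the common form log(N / df) is assumed here; each document is assumed to be already mapped to a list of cluster identifiers, one per extracted event, and equal salience values are ordered by identifier to keep the result deterministic. The function name is hypothetical.

```python
import math
from collections import Counter


def select_feature_clusters(doc_clusters, k=20):
    """Rank event clusters by ecf * idf and keep the top k.

    doc_clusters: list of documents, each a list of cluster ids.
    Returns (top-k cluster ids, salience per cluster id).
    """
    n_docs = len(doc_clusters)
    # S51: occurrence frequency of each cluster over the whole corpus
    ecf = Counter(c for doc in doc_clusters for c in doc)
    # S52: document frequency, then idf = log(N / df) (assumed form)
    df = Counter(c for doc in doc_clusters for c in set(doc))
    # S53: salience of each cluster
    salience = {c: ecf[c] * math.log(n_docs / df[c]) for c in ecf}
    # S54: sort by salience in descending order, ties broken by cluster id
    ranked = sorted(salience, key=lambda c: (-salience[c], c))
    return ranked[:k], salience
```

With this assumed idf, a cluster that occurs in every document gets salience 0, which is the usual TF-IDF behaviour for non-discriminative features.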
S6, constructing the feature vector fd of each news document d, with the following specific steps:
S61, scanning each event cluster fed_i in the feature event dictionary FED in turn, and counting the occurrence frequency edf_i of that event cluster in each news document d;
S62, scanning each event cluster fed_i in the feature event dictionary FED in turn, and calculating the feature value of the document in each feature dimension as fd_i = ecf_i*idf_i*edf_i, i.e. the product of the event cluster salience ecf_i*idf_i and the document feature edf_i of the event cluster;
S63, after the feature event dictionary has been scanned, obtaining the document feature vector fd = [fd_1, fd_2, …, fd_K].
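The vector construction in S61 to S63 reduces to a few lines. The sketch below assumes that the corpus-level values ecf and idf are supplied as dictionaries keyed by cluster identifier and that the document is a list of cluster identifiers; the function name and argument layout are illustrative only.

```python
from collections import Counter


def document_feature_vector(doc, feature_clusters, ecf, idf):
    """Compute fd = [fd_1, ..., fd_K] with fd_i = ecf_i * idf_i * edf_i.

    doc: list of cluster ids for the events found in one news document.
    feature_clusters: the K cluster ids of the feature event dictionary.
    """
    edf = Counter(doc)  # S61: occurrences of each cluster in this document
    # S62/S63: one weighted value per feature cluster; absent clusters give 0
    return [ecf[c] * idf[c] * edf[c] for c in feature_clusters]
```

The resulting vectors have a fixed length K and can be passed to any vector-based classifier, such as the support vector machine used in the next step.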
S7, classifying the news documents with a support vector machine (SVM) classification algorithm. Ten-fold cross-validation was performed on the news data set of this embodiment; the conventional method that uses words as features has an Accuracy value of 0.83.
For the text classification of emergency news, the invention uses atomic events as basic features, extracts salient feature events through clustering and statistical analysis of the atomic events, and represents news document vectors with these feature events. The compositional semantics of word vectors are introduced to represent atomic events, and event clusters are generated with a non-parametric clustering algorithm, which avoids the sparsity problem caused by events that are semantically similar but expressed in different forms. The method improves on the traditional TF-IDF algorithm: the feature event dictionary is constructed by introducing the corpus occurrence frequency of an event, its inverse document frequency and its document occurrence frequency, so as to generate more discriminative document vectors. Atomic events contain semantic information among words and have stronger semantic representation capability than traditional words, which addresses the low accuracy caused by the poor class discrimination of traditional classification methods based on lexical features.
The above embodiments are merely preferred embodiments of the present invention, the protection scope of the present invention is not limited thereto, and any simple changes or equivalent substitutions of technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention disclosed in the present invention belong to the protection scope of the present invention.

Claims (4)

1. A text classification method for emergency news, characterized by comprising the following steps:
collecting news documents from the internet, completing data cleaning, and performing the preprocessing operations of word segmentation, dependency parsing and coreference resolution on each of the news documents with a natural language processing tool to obtain a news data set D;
adding the preprocessed news data set D to a background corpus, and training Word2Vec on it to learn distributed representations of words;
extracting events from each news document d in the news data set D, and constructing an event dictionary;
clustering all events in the event dictionary with the non-parametric Chinese Whispers clustering method to obtain event clusters;
calculating the occurrence frequency and the inverse document frequency of each event cluster obtained after clustering to extract feature events;
constructing a feature vector for each news document from the feature events;
classifying the news documents with a support vector machine classification algorithm;
the specific steps of extracting events from each news document d in the news data set D and constructing an event dictionary include:
scanning the dependency relations of types "nsubj" and "dobj" in the dependency parsing result of each news document d to obtain a binary dependency relation set ea, wherein the binary relations represent event-argument relations;
scanning the binary dependency relation set ea in turn, and merging two event-argument relations into one candidate event if their predicates are the same;
representing each of the remaining unmerged binary dependency relations in the set ea as a candidate event of its own;
obtaining the event set de of each news document from all candidate events, i.e. each document consists of a plurality of events;
repeating the above four steps until event extraction has been completed for all documents in the news data set D, thereby obtaining the overall event set DE of the news data set D;
scanning the event set DE and constructing an event dictionary ED = {event_1, event_2, …, event_m}, where event_i denotes the i-th event and m denotes the dictionary size, i.e. the number of event categories, all events with the same arguments belonging to the same category;
the specific steps of clustering all events in the event dictionary with the non-parametric Chinese Whispers clustering method to obtain event clusters include:
calculating the distributed representation of each event by semantic composition [formula not reproduced in this text], wherein subj, pred and obj denote the subject, predicate and object of the event respectively, ⊗ denotes the Kronecker product operation, and · denotes the dot product operation;
calculating the similarity sim(event_i, event_j) between each pair of events using cosine similarity;
clustering all events of the event dictionary ED with the Chinese Whispers algorithm to obtain different event clusters;
after the clustering is completed, obtaining the event clusters EC = {ec_1, ec_2, …, ec_x}, wherein each cluster ec_i contains events with high semantic similarity and i is the cluster number;
the specific steps of clustering all events of the event dictionary ED with the Chinese Whispers algorithm to obtain different event clusters include:
constructing an event graph G = (Vertex, Edge), wherein Vertex denotes the vertex set of the graph and Edge denotes the edge set of the graph; each event is initially a node forming a cluster of its own, i.e. Vertex = ED = {event_1, event_2, …, event_m} and Edge = {}, so that no edges exist in the graph;
scanning each event node event_i in turn, finding for each event node the unconnected event node event_j with the highest similarity, and gathering the two into one cluster; if several nodes share the highest similarity, one of them is selected at random;
repeating the scanning step until the convergence condition is met, wherein the convergence condition is set according to an event similarity threshold.
2. The text classification method for emergency news according to claim 1, wherein the data cleaning of the news documents is completed with an existing natural language processing toolkit.
3. The text classification method for emergency news according to claim 1, wherein the specific steps of calculating the occurrence frequency and the inverse document frequency of each event cluster obtained after clustering to extract feature events comprise:
scanning the overall event set DE of the news data set D and counting the occurrence frequency ecf of each event cluster ec_i;
scanning the event set de of each news document and calculating the inverse document frequency idf of each event cluster ec_i;
calculating the value ecf_i*idf_i of each event cluster ec_i, which represents the feature salience of that event cluster;
sorting the event clusters by feature salience in descending order, extracting the K clusters with the largest values, and constructing a feature event dictionary FED = {fed_1, fed_2, …, fed_K}, where fed_i is the i-th most salient event cluster, i = 1, 2, …, K.
4. The text classification method for emergency news according to claim 3, wherein the specific step of constructing a feature vector for each news document according to the feature events comprises:
scanning each event cluster fed_i in the feature event dictionary FED in turn and counting the occurrence frequency edf_i of the event cluster in each news document d;
scanning each event cluster fed_i in the feature event dictionary FED in turn and calculating the feature value of the document in each feature dimension, fd_i = ecf_i × idf_i × edf_i, i.e. the product of the event cluster's feature significance ecf_i × idf_i and the document feature edf_i of the event cluster;
after the feature event dictionary has been scanned, the document feature vector fd = [fd_1, fd_2, …, fd_K] is obtained.
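The formula fd_i = ecf_i × idf_i × edf_i can be shown with a small sketch. It assumes that the ecf and idf values for the dictionary entries have already been computed and that a news document is given as a list of event-cluster identifiers; the function name `document_feature_vector` and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def document_feature_vector(
    doc: Sequence[str],
    fed: Sequence[str],
    ecf: Dict[str, float],
    idf: Dict[str, float],
) -> List[float]:
    """Build fd = [fd_1, ..., fd_K] with fd_i = ecf_i * idf_i * edf_i."""
    # edf: occurrence count of each event cluster inside this one document.
    edf = Counter(doc)
    # Clusters missing from the document get edf = 0, hence a zero feature.
    return [ecf[c] * idf[c] * edf[c] for c in fed]


# Illustrative values only, not taken from the patent.
fed = ["flood", "quake"]
ecf = {"flood": 4, "quake": 3}
idf = {"flood": 0.5, "quake": 0.25}
vector = document_feature_vector(["quake", "quake", "rescue"], fed, ecf, idf)
print(vector)  # [0.0, 1.5]
```

The vector has one dimension per dictionary entry, so every document maps to the same length K regardless of how many events it contains.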
CN202110467773.0A 2021-04-28 2021-04-28 Text classification method for emergency news Active CN113515624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110467773.0A CN113515624B (en) 2021-04-28 2021-04-28 Text classification method for emergency news

Publications (2)

Publication Number Publication Date
CN113515624A CN113515624A (en) 2021-10-19
CN113515624B true CN113515624B (en) 2023-07-21

Family

ID=78063717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110467773.0A Active CN113515624B (en) 2021-04-28 2021-04-28 Text classification method for emergency news

Country Status (1)

Country Link
CN (1) CN113515624B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722194B (en) * 2022-03-15 2023-05-09 电子科技大学 Automatic construction method for emergency time sequence based on abstract generation algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932311A (en) * 2018-06-20 2018-12-04 天津大学 The method of incident detection and prediction
CN110134757A (en) * 2019-04-19 2019-08-16 杭州电子科技大学 Event argument role extraction method based on multi-head attention mechanism
CN110232149A (en) * 2019-05-09 2019-09-13 北京邮电大学 A kind of focus incident detection method and system
CN111274790A (en) * 2020-02-13 2020-06-12 东南大学 Chapter-level event embedding method and device based on syntactic dependency graph

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239436B (en) * 2014-08-27 2018-01-02 南京邮电大学 It is a kind of that method is found based on the network hotspot event of text classification and cluster analysis
CN107145568A (en) * 2017-05-04 2017-09-08 成都华栖云科技有限公司 A kind of quick media event clustering system and method
CN108197112A (en) * 2018-01-19 2018-06-22 成都睿码科技有限责任公司 A kind of method that event is extracted from news
CN110399478A (en) * 2018-04-19 2019-11-01 清华大学 Event finds method and apparatus
CN109033200B (en) * 2018-06-29 2021-03-02 北京百度网讯科技有限公司 Event extraction method, device, equipment and computer readable medium
CN109299266B (en) * 2018-10-16 2019-11-12 中国搜索信息科技股份有限公司 A kind of text classification and abstracting method for Chinese news emergency event
CN112463952B (en) * 2020-12-22 2023-05-05 安徽商信政通信息技术股份有限公司 News text aggregation method and system based on neighbor search

Also Published As

Publication number Publication date
CN113515624A (en) 2021-10-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant