CN112685549B - Method and system for identifying case-related news element entities by incorporating discourse semantics

Publication number
CN112685549B
CN112685549B CN202110023176.9A CN202110023176A CN112685549B CN 112685549 B CN112685549 B CN 112685549B CN 202110023176 A CN202110023176 A CN 202110023176A CN 112685549 B CN112685549 B CN 112685549B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110023176.9A
Other languages
Chinese (zh)
Other versions
CN112685549A (en)
Inventor
线岩团
王佳雯
王剑
余正涛
郭军军
相艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202110023176.9A
Publication of CN112685549A
Application granted
Publication of CN112685549B

Abstract

The invention relates to a method and a system for identifying case-related news element entities by incorporating discourse semantics, and belongs to the technical field of natural language processing. The method crawls corpora from the major and key cases module of the Chinese news network, trims the first paragraph of each news text to obtain a news central sentence, builds a database of news central sentences and their corresponding news body sentences, uses a multi-head attention mechanism to learn a discourse semantic representation from the news central sentence, and fuses that representation with the news body sentence. A Bi-LSTM then captures the context information after the discourse semantics are fused, and a conditional random field finally identifies the element entities in the sentence. Because anaphora (component reference) and ellipsis (component omission) are ubiquitous in the body sentences of case-related news, the invention incorporates discourse semantics into element entity identification, which effectively addresses the loss of context and semantics and provides strong support for subsequent public opinion analysis of case-related news.

Description

Method and system for identifying case-related news element entities by incorporating discourse semantics
Technical Field
The invention relates to a method and a system for identifying case-related news element entities by incorporating discourse semantics, belonging to the technical field of natural language processing.
Background
In public opinion analysis of case-related news, high-quality case-related news element entities are the foundation of follow-up work and can be applied widely, for example to extracting relations between case-related news elements, building a knowledge graph of case-related news, and tracking sensitive words in case-related news. Because anaphora (component reference) and ellipsis (component omission) are common in the body sentences of case-related news, missing semantics is the key difficulty in identifying case-related news element entities and directly affects identification accuracy, as shown in fig. 3. The invention therefore studies element entity identification under missing semantics and provides a method and a system for identifying case-related news element entities by incorporating discourse semantics.
Disclosure of Invention
The invention provides a method and a system for identifying case-related news element entities by incorporating discourse semantics. They alleviate the problem of missing semantics and learn multi-level, multi-angle semantic understanding, thereby improving the identification effect.
The technical scheme of the invention is as follows: in a first aspect, the invention provides a method for identifying case-related news element entities by incorporating discourse semantics, the method comprising the following specific steps:
Step1, first trimming the first paragraph of the case-related news to obtain a news central sentence, then splitting the news body sentences and the news central sentence into characters and labeling them, and finally constructing a dictionary in which each labeled news body sentence corresponds one-to-one with its news central sentence;
Step2, converting the news central sentence and the news body sentence into character vectors by using a Skip-gram model;
Step3, constructing a case-related news element entity identification model that incorporates discourse semantics, thereby effectively extracting case-related news element entities.
As a further scheme of the present invention, Step1 specifically comprises the following steps:
Step1.1, crawling case-related news corpora from the major and key cases module of the Chinese news network by using a web crawler program;
Step1.2, filtering and denoising the crawled case-related news corpora to construct a text-level case-related news corpus, and storing the text-level corpus in a database;
Step1.3, retrieving the text-level case-related news corpus from the database of Step1.2, splitting it into sentences to form a sentence-level corpus of case-related news body text, manually trimming the first paragraph of each case-related news text to obtain a news central sentence, matching the news central sentence one-to-one with each news body sentence, splitting the news central sentence and the news body sentences into characters to form a sentence-level case-related news corpus, and storing the sentence-level corpus in the database;
Step1.4, retrieving the sentence-level case-related news corpus from the database of Step1.3, manually labeling it with BIEOS tags, and classifying the case-related news element entity categories, to form a labeled case-related news corpus that contains the news central sentences.
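The sentence segmentation and pairing described in Step1.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the choice of sentence delimiters (。！？) and the record layout are all assumptions made for the example.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Split Chinese text into sentences, keeping the closing punctuation.

    The delimiter set (。！？) is an assumption; the patent does not list one.
    """
    return [s.strip() for s in re.findall(r"[^。！？]+[。！？]?", text) if s.strip()]


def pair_with_central(central: str, body: str) -> List[Dict[str, List[str]]]:
    """Attach the same central sentence to every body sentence, split into characters."""
    return [
        {"central": list(central), "body": list(sentence)}
        for sentence in split_sentences(body)
    ]


# Hypothetical example text, not taken from the corpus.
pairs = pair_with_central("甲被捕。", "警方通报。他已认罪！")
```

Each record keeps the central sentence next to one body sentence, which is the one-to-one correspondence the later model inputs rely on.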
As a further scheme of the present invention, Step2 specifically comprises the following steps:
Step2.1, first training a Skip-gram model on the case-related news corpus to obtain character vectors and form a character vector table, and then converting each character in the news body sentence and the news central sentence into a character vector sequence by looking it up in the character vector table.
As a further aspect of the present invention, Step3 specifically includes:
Step3.1, the case-related news element entity identification model that incorporates discourse semantics has two inputs: a news body sentence and a news central sentence; Multi-Head Attention is used to learn a discourse semantic representation and to fuse the news central sentence into the news body sentence from different dimensions, yielding multi-level semantic features that incorporate discourse semantics;
Step3.2, after the multi-level semantic features incorporating discourse semantics are obtained, a Bi-LSTM is used to extract context semantic features that incorporate discourse semantics;
Step3.3, a conditional random field performs constrained decoding on the Bi-LSTM output that incorporates the discourse semantic features and identifies the element entities in the sentence, thereby constructing the case-related news element entity identification model that incorporates discourse semantics.
In a second aspect, an embodiment of the present invention further provides a system for identifying case-related news element entities by incorporating discourse semantics, the system comprising modules for performing the method of the first aspect.
The beneficial effects of the invention are:
The invention provides a Multi-Head Attention-Bi-LSTM-CRF method that incorporates discourse semantics for the task of case-related news element entity identification. To address the anaphora and ellipsis found in the body sentences of case-related news, the method fuses a news central sentence, which carries the discourse semantics, into the news body sentence, thereby alleviating the loss of context semantics. The model learns a discourse semantic representation from the news central sentence, fuses it with the news body sentence, uses a Bi-LSTM to capture context information that incorporates the discourse semantics, and uses a conditional random field to identify the element entities in the sentence. The model can therefore learn multi-level, multi-angle semantic understanding, which improves the identification effect and provides strong support for subsequent public opinion analysis of case-related news.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a general model architecture diagram of the present invention;
FIG. 3 is a sample illustration;
FIG. 4 is a comparison of experimental results for each class.
Detailed Description
Example 1: as shown in figs. 1-4, the method for identifying case-related news element entities by incorporating discourse semantics includes the following steps:
Step1, first trimming the first paragraph of the case-related news to obtain a news central sentence, then splitting the news body sentences and the news central sentence into characters and labeling them, and finally constructing a dictionary in which each labeled news body sentence corresponds one-to-one with its news central sentence;
Step2, converting the news central sentence and the news body sentence into character vectors by using a Skip-gram model;
Step3, constructing a case-related news element entity identification model that incorporates discourse semantics, thereby effectively extracting case-related news element entities.
As a further scheme of the present invention, Step1 specifically comprises the following steps:
Step1.1, crawling case-related news corpora from the major and key cases module of the Chinese news network by using a web crawler program;
Step1.2, filtering and denoising the crawled case-related news corpora to construct a text-level case-related news corpus, and storing the text-level corpus in a database;
Step1.3, retrieving the text-level case-related news corpus from the database of Step1.2, splitting it into sentences to form a sentence-level corpus of case-related news body text, manually trimming the first paragraph of each case-related news text to obtain a news central sentence, matching the news central sentence one-to-one with each news body sentence, splitting the news central sentence and the news body sentences into characters to form a sentence-level case-related news corpus, and storing the sentence-level corpus in the database;
Step1.4, retrieving the sentence-level case-related news corpus from the database of Step1.3, manually labeling it with BIEOS tags, and dividing the case-related news element entity categories into 6 classes: victims, criminal suspects, places of crime, police investigating the case, courts hearing the case, and a non-element entity class. This forms a labeled case-related news corpus that contains the news central sentences.
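The character-level BIEOS labeling of Step1.4 can be illustrated with a short sketch. The label names (`SUSPECT`, `PLACE`), the `S_` prefix form for single-character entities, and the span format are hypothetical choices for the example; in the patent the labeling is done manually.

```python
from typing import List, Sequence, Tuple


def bieos_tags(length: int, label: str) -> List[str]:
    """Return BIEOS tags for one entity that spans `length` characters."""
    if length <= 0:
        return []
    if length == 1:
        return [f"S_{label}"]  # single-character entity
    return [f"B_{label}"] + [f"I_{label}"] * (length - 2) + [f"E_{label}"]


def tag_sentence(chars: Sequence[str], spans: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Tag a character sequence; spans are (start, end_exclusive, label)."""
    tags = ["O"] * len(chars)  # everything outside an entity is O
    for start, end, label in spans:
        tags[start:end] = bieos_tags(end - start, label)
    return tags


# Hypothetical sentence and annotation, for illustration only.
chars = list("张某在昆明被捕")
tags = tag_sentence(chars, [(0, 2, "SUSPECT"), (3, 5, "PLACE")])
```

A tool like this only converts annotated spans into tags; deciding which spans are element entities remains the manual step described above.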
As a further scheme of the present invention, Step2 specifically comprises the following steps:
Step2.1, first training a Skip-gram model on the case-related news corpus to obtain character vectors and form a character vector table, and then converting each character in the news body sentence and the news central sentence into a character vector sequence by looking it up in the character vector table.
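To make Step2.1 concrete, the sketch below shows the two mechanical pieces around a Skip-gram model: generating (center, context) character pairs for training, and looking characters up in a finished vector table. The window size and the zero vector for unknown characters are assumptions; the patent states neither, and the training itself is omitted here.

```python
from typing import Dict, List, Sequence, Tuple


def skipgram_pairs(chars: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs from a symmetric window around each character."""
    pairs = []
    for i, center in enumerate(chars):
        lo, hi = max(0, i - window), min(len(chars), i + window + 1)
        pairs.extend((center, chars[j]) for j in range(lo, hi) if j != i)
    return pairs


def lookup(chars: Sequence[str], table: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Convert characters to vectors; unseen characters get a zero vector (assumption)."""
    return [table.get(ch, [0.0] * dim) for ch in chars]


pairs = skipgram_pairs(list("abc"), window=1)
vectors = lookup(["a", "z"], {"a": [1.0, 2.0]}, dim=2)
```

In the experiments described later the vector dimension is 128, so `dim` would be 128 and the table would hold one trained vector per character.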
As a further aspect of the present invention, Step3 specifically includes:
Step3.1, the case-related news element entity identification model that incorporates discourse semantics has two inputs: a news body sentence and a news central sentence; Multi-Head Attention is used to learn a discourse semantic representation and to fuse the news central sentence into the news body sentence from different dimensions, yielding multi-level semantic features that incorporate discourse semantics;
Step3.2, after the multi-level semantic features incorporating discourse semantics are obtained, a Bi-LSTM is used to extract context semantic features that incorporate discourse semantics;
Step3.3, a conditional random field performs constrained decoding on the Bi-LSTM output that incorporates the discourse semantic features and identifies the element entities in the sentence, thereby constructing the case-related news element entity identification model that incorporates discourse semantics.
The computation that fuses the news central sentence into the news body sentence can be divided into the following three parts:
(1) First, the news body sentence is taken as key and value and the news central sentence as query, and both are projected into four different representation subspaces through linear transformations.
(2) Then, scaled dot-product attention between the news central sentence and the news body sentence is computed in each representation subspace: the dot product of the news central sentence query and the news body sentence key gives a mapping score from the news central sentence to the news body sentence, and softmax compresses the score to between 0 and 1. The mapping score is then multiplied by the news body sentence value, fusing the news central sentence of the case-related news into the news body sentence and yielding, in that representation subspace, news body sentence features that incorporate the discourse semantic features.
(3) Finally, the feature results obtained from the 4 different representation subspaces are concatenated to obtain the multi-level semantic feature E that incorporates discourse semantics.
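The three parts above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented model: the weights are random rather than trained, the per-head dimension is taken as d / 4, no output projection is applied after concatenation, and the query/key-value roles follow part (1) as written. The names `multi_head_fusion` and `params` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; each slice along `axis` sums to 1."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_fusion(central, body, params, num_heads=4):
    """central: (Lc, d) used as query; body: (Lb, d) used as key and value."""
    d_head = central.shape[1] // num_heads
    heads = []
    for h in range(num_heads):
        w_q, w_k, w_v = params[h]
        q = central @ w_q  # (Lc, d_head): projection into subspace h
        k = body @ w_k     # (Lb, d_head)
        v = body @ w_v     # (Lb, d_head)
        scores = softmax(q @ k.T / np.sqrt(d_head))  # (Lc, Lb), values in (0, 1)
        heads.append(scores @ v)                     # weighted sum of values
    return np.concatenate(heads, axis=1)             # splice the 4 subspaces


rng = np.random.default_rng(0)
d, length, n_heads = 128, 120, 4  # dimensions taken from the experiment settings
params = [
    tuple(rng.normal(scale=0.1, size=(d, d // n_heads)) for _ in range(3))
    for _ in range(n_heads)
]
central = rng.normal(size=(length, d))
body = rng.normal(size=(length, d))
E = multi_head_fusion(central, body, params, n_heads)
```

Because both sentences are padded to the same length, the fused feature E has one row per character position and can be passed directly to the Bi-LSTM.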
After the multi-level semantic feature E incorporating discourse semantics is obtained, a Bi-LSTM is used to extract context semantic features that incorporate discourse semantics. The forward and backward LSTM hidden states are concatenated to obtain multi-level, more comprehensive semantic features.
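The concatenation of forward and backward hidden states can be sketched with a hand-written LSTM in NumPy. This is illustrative only: the weights are random, the gate ordering (input, forget, output, candidate) is an assumption, and the patent's experiments use TensorFlow rather than this code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(X, W, U, b, hidden):
    """Run one LSTM direction over X of shape (T, d); return states of shape (T, hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in X:
        z = x @ W + h @ U + b       # pre-activations for the four gates
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        states.append(h)
    return np.stack(states)


def bi_lstm(X, fwd, bwd, hidden):
    """Concatenate forward states with time-realigned backward states."""
    h_forward = lstm_pass(X, *fwd, hidden=hidden)
    h_backward = lstm_pass(X[::-1], *bwd, hidden=hidden)[::-1]
    return np.concatenate([h_forward, h_backward], axis=1)


rng = np.random.default_rng(0)
T, d, hidden = 120, 128, 128  # sizes taken from the experiment settings


def init_weights():
    return (
        rng.normal(scale=0.1, size=(d, 4 * hidden)),
        rng.normal(scale=0.1, size=(hidden, 4 * hidden)),
        np.zeros(4 * hidden),
    )


E = rng.normal(size=(T, d))  # stands in for the fused feature E
H = bi_lstm(E, init_weights(), init_weights(), hidden)
```

Reversing the backward output before concatenation keeps row t of both halves aligned with the same character.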
A conditional random field then performs constrained decoding on the Bi-LSTM output that incorporates the discourse semantic features, which completes the case-related news element entity identification model that incorporates discourse semantics.
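Constrained decoding can be illustrated with Viterbi search over BIEOS tags. In a trained CRF the transition scores are learned; the sketch below instead assumes zero transition scores plus hard BIEOS validity constraints, purely to show how decoding rules out invalid tag sequences. The tag names and the `allowed` helper are hypothetical.

```python
import numpy as np

NEG = -1e9  # score used to forbid a transition


def allowed(prev, nxt):
    """BIEOS rule: I_x / E_x must follow B_x / I_x of the same type, and vice versa."""
    prev_open = prev[0] in ("B", "I")  # an entity has started and is not closed
    nxt_cont = nxt[0] in ("I", "E")    # the next tag continues an open entity
    if prev_open or nxt_cont:
        return prev_open and nxt_cont and prev[2:] == nxt[2:]
    return True


def viterbi(emissions, tags, transitions=None):
    """emissions: (T, K) per-character tag scores. Returns the best valid tag path."""
    n_steps, n_tags = emissions.shape
    trans = np.zeros((n_tags, n_tags)) if transitions is None else transitions.copy()
    for i, p in enumerate(tags):
        for j, n in enumerate(tags):
            if not allowed(p, n):
                trans[i, j] = NEG
    start = np.array([NEG if t[0] in ("I", "E") else 0.0 for t in tags])
    end = np.array([NEG if t[0] in ("B", "I") else 0.0 for t in tags])
    score = emissions[0] + start
    back = np.zeros((n_steps, n_tags), dtype=int)
    for step in range(1, n_steps):
        cand = score[:, None] + trans  # rows: previous tag, columns: next tag
        back[step] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[step]
    best = int((score + end).argmax())
    path = [best]
    for step in range(n_steps - 1, 0, -1):
        best = int(back[step, best])
        path.append(best)
    return [tags[k] for k in reversed(path)]


tags = ["O", "B_SUS", "I_SUS", "E_SUS", "S_SUS"]
# Hypothetical scores: picking the top tag per character would give B, O, E (invalid).
emissions = np.array([[0.0, 2.0, 0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.9, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 2.0, 0.0]])
path = viterbi(emissions, tags)
```

The point of the example is that the middle character is pulled from O to I_SUS because only that choice yields a well-formed entity with the highest total score.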
The experiments of the invention use the TensorFlow 1.13.2 framework. The news central sentence and the news body sentence are set to the same length of 120 characters. Training uses the Adam optimization algorithm with a learning rate of 0.004; the character vector dimension is 128; a single LSTM layer has 128 neurons; the number of training iterations and the batch size are 31 and 16, respectively.
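For reference, the settings above can be gathered into a small configuration object. The field names and the padding helper are hypothetical; reading "iterations and batches" as training epochs and batch size is an interpretation of the translated text, and the padding id of 0 is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    max_len: int = 120            # length of both central and body sentences
    learning_rate: float = 0.004  # Adam learning rate
    embedding_dim: int = 128      # character vector dimension
    lstm_units: int = 128         # neurons in a single LSTM layer
    epochs: int = 31              # interpreted from "number of iterations"
    batch_size: int = 16
    num_heads: int = 4            # representation subspaces in the attention step


def pad_or_truncate(ids: List[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Force an id sequence to exactly max_len entries (pad_id is an assumption)."""
    return (ids + [pad_id] * max_len)[:max_len]


config = TrainConfig()
padded = pad_or_truncate([5, 6, 7], config.max_len)
```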
Precision P, recall R, and the F1 value are used as evaluation indexes for the element entity identification results; the 3 evaluation indexes are calculated as follows:
$$P = \frac{CE}{IE}$$
$$R = \frac{CE}{SE}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
where CE is the number of correctly identified element entities, IE is the number of identified element entities, and SE is the number of sample element entities.
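Using the definitions of CE, IE and SE, the three indexes can be computed as in the sketch below. The function name and the zero-division handling are assumptions, and the counts in the example are made up for illustration rather than taken from the experiments.

```python
from typing import Tuple


def prf1(ce: int, ie: int, se: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from entity counts.

    ce: correctly identified entities, ie: identified entities,
    se: entities in the sample (gold standard).
    """
    p = ce / ie if ie else 0.0
    r = ce / se if se else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Hypothetical counts for illustration only.
p, r, f1 = prf1(ce=8, ie=10, se=16)
```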
Unlike common named entities, the element entity categories are: criminal suspects, victims, places of crime, police investigating the case, and courts hearing the case. The corpus for these 5 element entity categories was obtained by crawling the major cases module of the Chinese news network. The whole corpus covers 97 cases and 2000 pieces of data and is divided into a training set, a validation set and a test set in the ratio 7:1:2. The distribution of sentences and element entities in the corpus is shown in table 1.
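A 7:1:2 split of the 2000 pieces of data can be sketched as below. Shuffling with a fixed seed is an assumption; the text does not say whether the data were shuffled or split by case.

```python
import random
from typing import List, Sequence, Tuple


def split_7_1_2(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle and split into training, validation and test sets at 7:1:2."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_7_1_2(range(2000))
```

With 2000 items this gives 1400 training, 200 validation and 400 test items.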
TABLE 1 case-related News corpus statistics
Figure BDA0002889415270000054
The news central sentence of the case-related news, which constitutes the external knowledge, is obtained by trimming the first paragraph of the news article. After the original corpus is obtained, a dictionary is first built that maps each news body sentence one-to-one to its news central sentence; the sentences are then split into characters, and finally the character-split corpus is labeled. Labeling samples are shown in table 2.
TABLE 2 sample of news corpus annotation relating to case
Figure BDA0002889415270000061
In the experiments, BIEOS tags are used to label each character of the news central sentence and the news body sentence: the B_ prefix marks the first character of an element entity, the I_ prefix marks a middle character, the E_ prefix marks the last character, S marks a single-character element entity, and O marks the non-element entity class; a labeling sample is shown in table 2. Finally, a Skip-gram model is pre-trained on the labeled corpus to generate character vectors with a dimension of 128. After pre-training, each character is numbered to generate a unique id, and the character-id correspondences are stored in the word2id and id2word dictionaries.
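The word2id and id2word dictionaries can be built as in the following sketch. The special tokens `<PAD>` and `<UNK>` and the first-seen numbering order are assumptions added for the example; the text only states that each character receives a unique id.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def build_vocab(
    sentences: Iterable[str], specials: Sequence[str] = ("<PAD>", "<UNK>")
) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Assign each distinct character a unique id in order of first appearance."""
    word2id = {token: i for i, token in enumerate(specials)}
    for sentence in sentences:
        for ch in sentence:
            if ch not in word2id:
                word2id[ch] = len(word2id)
    id2word = {i: w for w, i in word2id.items()}
    return word2id, id2word


def encode(sentence: str, word2id: Dict[str, int]) -> List[int]:
    """Map characters to ids; unseen characters map to <UNK>."""
    unk = word2id["<UNK>"]
    return [word2id.get(ch, unk) for ch in sentence]


word2id, id2word = build_vocab(["ab", "bc"])
ids = encode("az", word2id)
```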
The experiments are divided into the following three parts: a comparison experiment, an ablation experiment, and an output test.
The baseline models used in the comparison experiment are:
Bi-LSTM-CRF: a Bi-LSTM network captures the context information of the news body text, and a CRF layer produces its label information.
Bi-LSTM-Self-Attention-CRF: a Self-Attention mechanism gives each character in the sentence global semantic information; the Bi-LSTM captures the context semantics of the news body sentence, Self-Attention then captures global semantics, and a CRF finally decodes.
Multi-Head Attention-Bi-LSTM-CRF: Multi-Head Attention lets the model understand the input sequence from different angles. In the experiment, 4 attention heads are used to obtain multi-angle semantic information from the news body sentence, a Bi-LSTM then captures global semantics, and a CRF finally decodes. The results of the comparison experiments are shown in table 3.
TABLE 3 comparison of extraction methods of news key elements
Figure BDA0002889415270000071
The experimental results show that the Bi-LSTM-CRF model scores lowest on all three indexes. The Bi-LSTM-Self-Attention-CRF model scores higher than the Bi-LSTM-CRF model on every index, with the P, R and F1 values improved by 4%, 14% and 10%, respectively. The Multi-Head Attention-Bi-LSTM-CRF model in turn improves substantially on the Bi-LSTM-Self-Attention-CRF model. Building on the ability of Multi-Head Attention to capture important multi-dimensional semantic features of a sentence from different semantic spaces, the invention uses Multi-Head Attention to fuse in the discourse semantics, so that the discourse semantics supplement the semantics missing from the sentence in different semantic spaces, and then uses a Bi-LSTM to capture global semantic information that incorporates the discourse semantics. This improves the model on all three indexes: compared with the Multi-Head Attention-Bi-LSTM-CRF model without discourse semantics, the three index values improve by 1%, 4% and 3%, respectively.
The above results show that the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics provided by the invention can supplement the semantic information missing from a sentence by fusing in discourse semantics, thereby improving the performance of element entity identification.
The case-related news corpus contains 5 case element categories. The experimental results for each category under the different models are shown in fig. 4.
As can be seen from fig. 4, across the 4 models the best-identified category is "criminal suspect" and the worst is "place of crime". For "criminal suspect", the Multi-Head Attention-Bi-LSTM-CRF model performs best at 84%, and the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics is second at 83%. For "victim", the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics performs best at 78%, and the Multi-Head Attention-Bi-LSTM-CRF model is second at 77%. For "place of crime", the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics performs best at 42%, and the Bi-LSTM-CRF model is second at 38%. For "police investigating the case", the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics achieves the highest F value, 58%. For "court hearing the case", the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics performs best at 79%.
In summary, compared with the other models, the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics does not show a clear advantage on "criminal suspect" and "victim", but it is far superior to the other three models on the three categories "place of crime", "police investigating the case" and "court hearing the case".
Overall, the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics achieves the best identification effect.
To further verify the effectiveness of the Bi-LSTM-CRF model that uses Multi-Head Attention to incorporate discourse semantics, each component is removed in turn and the results are compared, to analyze whether each component contributes to element entity extraction.
Table 4 ablation experimental results
Figure BDA0002889415270000081
As can be seen from Table 4, incorporating discourse semantics plays a real role in the element entity identification task. The R and F1 values of the Multi-Head Attention-CRF model are low; after the discourse semantics are incorporated, the P value drops by 5%, while the R and F1 values improve by 3% and 2%, respectively. The reason is that Multi-Head Attention fuses the discourse semantics into the news body sentence from different dimensions, which alleviates the problem of missing semantics, but the semantic information carried by the news body text itself is neglected, so the P value is lower than that of the Multi-Head Attention-CRF model. In the Multi-Head Attention-Bi-LSTM-CRF model incorporating discourse semantics, adding the Bi-LSTM improves precision, recall and F1 by 7%, 29% and 23%, respectively. Thus, after the Bi-LSTM is added, the model obtains the discourse semantics through Multi-Head Attention and the Bi-LSTM captures context semantic information that incorporates the discourse semantics, achieving multi-level, multi-angle semantic understanding.
The following is an embodiment of the system of the present invention. The embodiment further provides a system for identifying case-related news element entities by incorporating discourse semantics, the system comprising modules for executing the method of the first aspect:
a dictionary construction module: used for trimming the first paragraph of the case-related news to obtain a news central sentence, splitting the news body sentences and the news central sentence into characters and labeling them, and finally constructing a dictionary in which each labeled news body sentence corresponds one-to-one with its news central sentence;
a character vector conversion module: used for converting the news central sentence and the news body sentence into character vectors by using a Skip-gram model;
a model construction and entity extraction module: used for constructing the case-related news element entity identification model that incorporates discourse semantics, thereby effectively extracting case-related news element entities.
In a possible implementation, the dictionary construction module is specifically configured for:
crawling case-related news corpora from the major and key cases module of the Chinese news network by using a web crawler program;
filtering and denoising the crawled case-related news corpora to construct a text-level case-related news corpus, and storing the text-level corpus in a database;
retrieving the text-level case-related news corpus from the database, splitting it into sentences to form a sentence-level corpus of case-related news body text, manually trimming the first paragraph of each case-related news text to obtain a news central sentence, matching the news central sentence one-to-one with each news body sentence, splitting the news central sentence and the news body sentences into characters to form a sentence-level case-related news corpus, and storing the sentence-level corpus in the database;
retrieving the sentence-level case-related news corpus from the database, manually labeling it with BIEOS tags, and classifying the case-related news element entity categories, to form a labeled case-related news corpus that contains the news central sentences.
In a possible implementation, the character vector conversion module is specifically configured for:
first training a Skip-gram model on the case-related news corpus to obtain character vectors and form a character vector table, and then converting each character in the news body sentence and the news central sentence into a character vector sequence by looking it up in the character vector table.
In a possible implementation, the model construction and entity extraction module is specifically configured as follows:
the case-related news element entity identification model that incorporates discourse semantics has two inputs: a news body sentence and a news central sentence; Multi-Head Attention is used to learn a discourse semantic representation and to fuse the news central sentence into the news body sentence from different dimensions, yielding multi-level semantic features that incorporate discourse semantics;
after the multi-level semantic features incorporating discourse semantics are obtained, a Bi-LSTM is used to extract context semantic features that incorporate discourse semantics;
a conditional random field performs constrained decoding on the Bi-LSTM output that incorporates the discourse semantic features and identifies the element entities in the sentence, thereby constructing the case-related news element entity identification model that incorporates discourse semantics.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (4)

1. A method for identifying case-related news element entities by incorporating discourse semantics, characterized in that the method comprises the following specific steps:
Step1, first trimming the first paragraph of the case-related news to obtain a news central sentence, then splitting the news body sentences and the news central sentence into characters and labeling them, and finally constructing a dictionary in which each labeled news body sentence corresponds one-to-one with its news central sentence;
Step2, converting the news central sentence and the news body sentence into character vectors by using a Skip-gram model;
Step3, constructing a case-related news element entity identification model that incorporates discourse semantics, thereby effectively extracting case-related news element entities;
the specific steps of Step3 include:
Step3.1, the case-related news element entity identification model that incorporates discourse semantics has two inputs: a news body sentence and a news central sentence; Multi-Head Attention is used to learn a discourse semantic representation and to fuse the news central sentence into the news body sentence from different dimensions, yielding multi-level semantic features that incorporate discourse semantics;
Step3.2, after the multi-level semantic features incorporating discourse semantics are obtained, a Bi-LSTM is used to extract context semantic features that incorporate discourse semantics;
Step3.3, a conditional random field performs constrained decoding on the Bi-LSTM output that incorporates the discourse semantic features and identifies the element entities in the sentence, thereby constructing the case-related news element entity identification model that incorporates discourse semantics.
2. The method for identifying case-related news element entities that integrates discourse semantics according to claim 1, wherein the specific steps of Step1 are as follows:
Step1.1, crawling case-related news corpora from the major and key cases module of the Chinese news network by using a web crawler program;
Step1.2, filtering and denoising the crawled case-related news corpus to construct a document-level case-related news corpus, and storing the document-level corpus in a database;
Step1.3, retrieving the document-level case-related news corpus from the database of Step1.2 and splitting it into sentences to form a sentence-level corpus of case-related news text; manually deleting the first paragraph of the case-related news text to obtain the news central sentence; segmenting the news central sentence and the news text sentences to form a sentence-level corpus of case-related news text; and storing the sentence-level corpus of case-related news text sentences in the database;
Step1.4, retrieving the sentence-level case-related news corpus from the database of Step1.3, manually labeling it with BIEOS tags and classifying the entity categories of the case-related news elements, so as to form a labeled case-related news corpus that includes the news central sentences, with the news central sentences in one-to-one correspondence with the news text sentences.
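The BIEOS labeling of Step1.4 can be sketched as follows. The claim describes manual annotation, so this only illustrates how annotated entity spans would map to character-level tags. The function name `bieos_tags`, the entity types `PER` and `LOC`, the record layout, and both example sentences are assumptions made up for illustration; the claims do not list the entity categories.

```python
def bieos_tags(chars, spans):
    """Turn annotated entity spans into character-level BIEOS tags.

    spans is a list of (start, end_exclusive, entity_type) tuples.
    B = begin, I = inside, E = end, O = outside, S = single character.
    """
    tags = ["O"] * len(chars)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = "S-" + etype
        else:
            tags[start] = "B-" + etype
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + etype
            tags[end - 1] = "E-" + etype
    return tags


# Invented example: a news text sentence paired with its central sentence.
text_sentence = list("张某在昆明被捕")
record = {
    "central_sentence": list("昆明警方通报一起案件"),  # invented
    "text_sentence": text_sentence,
    "tags": bieos_tags(text_sentence, [(0, 2, "PER"), (3, 5, "LOC")]),
}
```

Keeping the central sentence in the same record as the text sentence is one simple way to maintain the correspondence the claim requires.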
3. The method for identifying case-related news element entities that integrates discourse semantics according to claim 1, wherein the specific steps of Step2 are as follows:
Step2.1, converting the case-related news corpus into character vectors by using a Skip-gram model to form a character vector table, and converting each character in the news text sentence and the news central sentence into a character vector sequence by looking up the character vector table.
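Two parts of Step2.1 can be illustrated without training a model: how Skip-gram forms (center, context) training pairs at character level, and how a sentence becomes a vector sequence by table lookup. This is a minimal sketch under assumptions: the window size, the unknown-character fallback vector, and the function names `skipgram_pairs` and `to_vector_sequence` are hypothetical, and the table values shown are placeholders rather than trained vectors.

```python
def skipgram_pairs(chars, window=2):
    """List the (center, context) character pairs a Skip-gram model
    would be trained to predict, using a symmetric context window."""
    pairs = []
    for i, center in enumerate(chars):
        lo = max(0, i - window)
        hi = min(len(chars), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, chars[j]))
    return pairs


def to_vector_sequence(sentence, table, unk):
    """Look up each character in the character vector table.
    Assumption: characters missing from the table map to `unk`."""
    return [table.get(ch, unk) for ch in sentence]


# Placeholder table; real vectors would come from Skip-gram training.
char_table = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
vectors = to_vector_sequence("ab?", char_table, unk=[0.0, 0.0])
```

The same table would be applied to both the news text sentence and the news central sentence, so that the two inputs of the recognition model share one character vector space.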
4. A system for identifying case-related news element entities that integrates discourse semantics, comprising means for performing the method of any one of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110023176.9A CN112685549B (en) 2021-01-08 2021-01-08 Document-related news element entity identification method and system integrating discourse semantics

Publications (2)

Publication Number Publication Date
CN112685549A (en) 2021-04-20
CN112685549B (en) 2022-07-29

Family

ID=75456526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110023176.9A Active CN112685549B (en) 2021-01-08 2021-01-08 Document-related news element entity identification method and system integrating discourse semantics

Country Status (1)

Country Link
CN (1) CN112685549B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580422B (en) * 2022-03-14 2022-12-13 昆明理工大学 Named entity identification method combining two-stage classification of neighbor analysis

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016057810A (en) * 2014-09-09 2016-04-21 日本電信電話株式会社 Predicate argument structure extraction device, method, program, and computer readable storage medium
KR20160067469A (en) * 2014-12-04 2016-06-14 강원대학교산학협력단 Apparatus and method for extracting social relation between entity
CN106055658A (en) * 2016-06-02 2016-10-26 中国人民解放军国防科学技术大学 Extraction method aiming at Twitter text event
CN109472026A (en) * 2018-10-31 2019-03-15 北京国信云服科技有限公司 Method for accurately extracting sentiment information for multiple named entities simultaneously
CN110032739A (en) * 2019-04-18 2019-07-19 清华大学 Chinese electronic medical record named entity extraction method and system
CN110147551A (en) * 2019-05-14 2019-08-20 腾讯科技(深圳)有限公司 Multi-class entity recognition model training, entity recognition method, server and terminal
CN110852106A (en) * 2019-11-06 2020-02-28 腾讯科技(深圳)有限公司 Named entity processing method and device based on artificial intelligence and electronic equipment
CN111126039A (en) * 2019-12-25 2020-05-08 贵州大学 Relation extraction-oriented sentence structure information acquisition method
CN111178074A (en) * 2019-12-12 2020-05-19 天津大学 Deep learning-based Chinese named entity recognition method
CN111401061A (en) * 2020-03-19 2020-07-10 昆明理工大学 Method for identifying news opinion involved in case based on BERT and BiLSTM-Attention
CN111444719A (en) * 2020-03-17 2020-07-24 车智互联(北京)科技有限公司 Entity identification method and device and computing equipment
CN111783459A (en) * 2020-05-08 2020-10-16 昆明理工大学 Lao named entity recognition method based on improved Transformer + CRF
CN111832295A (en) * 2020-07-08 2020-10-27 昆明理工大学 Criminal case element identification method based on BERT pre-training model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472047B (en) * 2019-07-15 2022-12-13 昆明理工大学 Multi-feature fusion Chinese-Vietnamese news viewpoint sentence extraction method
CN111581943A (en) * 2020-04-02 2020-08-25 昆明理工大学 Chinese-Vietnamese bilingual multi-document news viewpoint sentence identification method based on sentence association graph
CN111814477B (en) * 2020-07-06 2022-06-21 重庆邮电大学 Dispute focus discovery method and device based on dispute focus entity and terminal

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A parallel computing-based Deep Attention model for named entity recognition;Xiaojun Liu et al;《The Journal of Supercomputing》;20190907;第76卷;814–830 *
Entity Candidate Network for Whole-Aware Named Entity Recognition;Wendong He et al;《https://arxiv.53yu.com/abs/2004.14145》;20200429;1-10 *
Joint entity recognition and relation extraction as a multi-head selection problem;Giannis Bekoulis et al;《Expert Systems with Applications》;20181230;第114卷;34-45 *
Named Entity Recognition for Social Media Texts with Semantic Augmentation;Yuyang Nie et al;《https://arxiv.53yu.com/abs/2010.15458》;20201029;1-9 *
新闻事件识别系统的研究与实现;李昕;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180315(第3期);I138-667 *
融入背景知识的篇章语义分析方法研究;张牧宇;《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》;20170215(第2期);I138-218 *
融入语言模型和注意力机制的临床电子病历命名实体识别;唐国强 等;《计算机科学》;20191122;第47卷(第3期);211-216 *

Also Published As

Publication number Publication date
CN112685549A (en) 2021-04-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant