CN113591483B - A document-level event argument extraction method based on sequence labeling - Google Patents
- Publication number
- CN113591483B (application CN202110460585.5A)
- Authority
- CN
- China
- Prior art keywords
- word
- span
- context
- representation
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention claims a document-level event argument extraction method based on sequence labeling, comprising the following steps: obtaining Wikipedia prior knowledge related to the entities in the corpus and generating word-span entity semantic-enhanced embedding representations; concatenating these embeddings with the contextual representations produced by a pre-trained language model to form the word-vector input of the embedding layer; feeding the word representations into a multi-span bidirectional recurrent neural network to obtain a multi-span contextual feature representation of each word; feeding the multi-span contextual feature representations into a context attention mechanism and a gated attention mechanism module to obtain a context-semantics fusion feature representation of each word; and finally performing event argument extraction on the output feature representations by sequence labeling, then using the best model obtained in training to extract event arguments from unseen documents. By incorporating prior knowledge and multi-span contextual semantic feature representations, the invention effectively improves the performance of document-level event argument extraction.
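The final step of the abstract, turning per-token labels into event arguments, can be illustrated with a minimal sketch. It assumes a BIO tagging scheme (`B-Role`, `I-Role`, `O`), which is a common choice for sequence labeling but is not stated in the abstract; the function name `decode_bio` and the role names `Perpetrator` and `Target` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (role, text) argument spans from a BIO-tagged token sequence.

    Assumption: tags look like "B-Role", "I-Role" or "O". An I- tag that does
    not continue the currently open span is treated like "O" and dropped.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    role: Optional[str] = None   # role of the span currently being built
    words: List[str] = []        # tokens collected for that span

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            # A new span starts; close the previous one if it is still open.
            if role is not None:
                spans.append((role, " ".join(words)))
            role, words = label, [token]
        elif prefix == "I" and role is not None and label == role:
            # Continuation of the open span.
            words.append(token)
        else:
            # "O" or an inconsistent tag: close any open span.
            if role is not None:
                spans.append((role, " ".join(words)))
            role, words = None, []

    if role is not None:
        spans.append((role, " ".join(words)))
    return spans


# Hypothetical example sentence and tags, for illustration only.
example_tokens = ["Two", "gunmen", "attacked", "the", "embassy", "in", "Lima"]
example_tags = ["B-Perpetrator", "I-Perpetrator", "O", "B-Target", "I-Target", "O", "O"]
print(decode_bio(example_tokens, example_tags))
# [('Perpetrator', 'Two gunmen'), ('Target', 'the embassy')]
```

In a full system the tags would come from the trained labeling model rather than being written by hand; the decoding step itself is independent of how the tags are predicted.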
Description
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110460585.5A CN113591483B (en) | 2021-04-27 | 2021-04-27 | A document-level event argument extraction method based on sequence labeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113591483A (en) | 2021-11-02 |
CN113591483B (en) | 2024-12-13 |
Family
ID=78243063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110460585.5A Active CN113591483B (en) | 2021-04-27 | 2021-04-27 | A document-level event argument extraction method based on sequence labeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113591483B (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114185868B (en) * | 2021-10-30 | 2023-05-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Intelligent construction method for Chinese hot event library |
CN113779227B (en) * | 2021-11-12 | 2022-01-25 | 成都数之联科技有限公司 | Case fact extraction method, system, device and medium |
CN114168727B (en) * | 2021-12-06 | 2024-07-12 | 哈尔滨工业大学 | Method, storage medium and equipment for extracting document-level event main body pairs facing financial field |
CN114741516B (en) * | 2021-12-08 | 2024-12-03 | 商汤国际私人有限公司 | Event extraction method and device, electronic device and storage medium |
CN114492377B (en) * | 2021-12-30 | 2024-04-16 | 永中软件股份有限公司 | Event role labeling method, computer equipment and computer readable storage medium |
CN114298053B (en) * | 2022-03-10 | 2022-05-24 | 中国科学院自动化研究所 | A joint event extraction system based on the fusion of feature and attention mechanism |
CN114625842B (en) * | 2022-03-25 | 2024-11-29 | 电子科技大学长三角研究院(衢州) | False comment recognition device based on structural attention enhancement mechanism |
CN114648016B (en) * | 2022-03-29 | 2025-02-11 | 河海大学 | An event argument extraction method based on event element interaction and label semantic enhancement |
CN114742016B (en) * | 2022-04-01 | 2024-07-09 | 山西大学 | Chapter-level event extraction method and device based on multi-granularity entity different composition |
CN114936563B (en) * | 2022-04-27 | 2023-07-25 | 苏州大学 | Event extraction method, device and storage medium |
CN114880431B (en) * | 2022-05-10 | 2024-11-19 | 中国人民解放军国防科技大学 | A prompt-based event argument extraction method and system |
CN115017879A (en) * | 2022-05-27 | 2022-09-06 | 深圳证券信息有限公司 | Text comparison method, computer equipment and computer storage medium |
CN114818721B (en) * | 2022-06-30 | 2022-11-01 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN114881038B (en) * | 2022-07-12 | 2022-11-11 | 之江实验室 | Chinese entity and relation extraction method and device based on span and attention mechanism |
CN115238685B (en) * | 2022-09-23 | 2023-03-21 | 华南理工大学 | Combined extraction method for building engineering change events based on position perception |
CN116151241B (en) * | 2023-04-19 | 2023-07-07 | 湖南马栏山视频先进技术研究院有限公司 | Entity identification method and device |
CN116739000A (en) * | 2023-06-08 | 2023-09-12 | 北京智源人工智能研究院 | Method and device for training speaker extraction model for complex context and electronic equipment |
CN117909505B (en) * | 2024-03-13 | 2024-06-07 | 北京邮电大学 | Event argument extraction method and related equipment |
CN118133823B (en) * | 2024-03-14 | 2024-07-30 | 北京语言大学 | Candidate argument filtering method and device for extracting document-level event |
CN118780269B (en) * | 2024-09-12 | 2025-02-18 | 山东浪潮科学研究院有限公司 | High-efficiency cross-document information extraction system and method based on large language model |
CN119003794B (en) * | 2024-10-22 | 2025-02-21 | 北京市科学技术研究院 | Urban site knowledge graph construction method integrating event extraction technology, cultural relic data management system and readable storage medium |
CN119067121B (en) * | 2024-11-05 | 2025-01-28 | 浙江大学 | A general information extraction method |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107797993A (en) * | 2017-11-13 | 2018-03-13 | 成都蓝景信息技术有限公司 | A kind of event extraction method based on sequence labelling |
CN108829801B (en) * | 2018-06-06 | 2020-11-20 | 大连理工大学 | An event-triggered word extraction method based on document-level attention mechanism |
CN109582949B (en) * | 2018-09-14 | 2022-11-22 | 创新先进技术有限公司 | Event element extraction method and device, computing equipment and storage medium |
CN110032641B (en) * | 2019-02-14 | 2024-02-13 | 创新先进技术有限公司 | Method and device for extracting event by using neural network and executed by computer |
CN110135457B (en) * | 2019-04-11 | 2021-04-06 | 中国科学院计算技术研究所 | Method and system for event-triggered word extraction based on autoencoder fusion of document information |
CN112183095A (en) * | 2019-07-02 | 2021-01-05 | 普天信息技术有限公司 | Event extraction method and device |
CN110705289B (en) * | 2019-09-29 | 2023-03-28 | 重庆邮电大学 | Chinese word segmentation method, system and medium based on neural network and fuzzy inference |
CN111191031A (en) * | 2019-12-24 | 2020-05-22 | 上海大学 | An Entity Relationship Classification Method for Unstructured Text Based on WordNet and IDF |
CN111581345A (en) * | 2020-04-26 | 2020-08-25 | 上海明略人工智能(集团)有限公司 | Document level event extraction method and device |
CN111783394B (en) * | 2020-08-11 | 2024-03-08 | 深圳市北科瑞声科技股份有限公司 | Training method of event extraction model, event extraction method, system and equipment |
CN112231447B (en) * | 2020-11-21 | 2023-04-07 | 杭州投知信息技术有限公司 | Method and system for extracting Chinese document events |
CN112528676B (en) * | 2020-12-18 | 2022-07-08 | 南开大学 | Document-level event argument extraction method |
Non-Patent Citations (1)
Title |
---|
Document-level Event Argument Extraction via Multispan Semantic Reinforcement; Dong Qiu et al.; ISKE2016; 2022-04-18; pp. 223-230 * |
Also Published As
Publication number | Publication date |
---|---|
CN113591483A (en) | 2021-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113591483B (en) | A document-level event argument extraction method based on sequence labeling | |
Wang et al. | Application of convolutional neural network in natural language processing | |
CN109062893B (en) | Commodity name identification method based on full-text attention mechanism | |
CN111401077B (en) | Language model processing method and device and computer equipment | |
CN113743119B (en) | Chinese named entity recognition module, method and device and electronic equipment | |
CN114757182B (en) | A BERT short text sentiment analysis method with improved training method | |
CN113255320A (en) | Entity relation extraction method and device based on syntax tree and graph attention machine mechanism | |
CN111126069A (en) | Social media short text named entity identification method based on visual object guidance | |
JP5710581B2 (en) | Question answering apparatus, method, and program | |
CN111738003A (en) | Named entity recognition model training method, named entity recognition method and medium | |
CN114298055B (en) | Retrieval method and device based on multilevel semantic matching, computer equipment and storage medium | |
CN113569050A (en) | Method and device for automatic construction of knowledge graph in government affairs field based on deep learning | |
CN111159405B (en) | Irony detection method based on background knowledge | |
CN114153971B (en) | Error correction recognition and classification equipment for Chinese text containing errors | |
CN116151256A (en) | A Few-Shot Named Entity Recognition Method Based on Multi-task and Hint Learning | |
CN114417851B (en) | Emotion analysis method based on keyword weighted information | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN115238697A (en) | Judicial named entity recognition method based on natural language processing | |
GB2572320A (en) | Hate speech detection system for online media content | |
Ajees et al. | A named entity recognition system for Malayalam using neural networks | |
CN112685594B (en) | Attention-based weak supervision voice retrieval method and system | |
CN114330350A (en) | Named entity identification method and device, electronic equipment and storage medium | |
CN115526176A (en) | Text recognition method and device, electronic equipment and storage medium | |
CN118585641A (en) | A text summary generation method based on pre-training model | |
Padia et al. | UMBC at SemEval-2018 Task 8: Understanding text about malware |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2024-05-29. Applicant after: Shenzhen Hongyue Information Technology Co., Ltd.; address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province; country or region after: China. Applicant before: Chongqing University of Posts and Telecommunications; address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing; country or region before: China. |
TA01 | Transfer of patent application right | Effective date of registration: 2024-11-17. Applicant after: STATE GRID ZHEJIANG ELECTRIC POWER CO., LTD. TAIZHOU POWER SUPPLY Co.; address after: 318000 No. 809, Central Avenue, Jiaojiang District, Taizhou City, Zhejiang Province; country or region after: China. Applicant before: Shenzhen Hongyue Information Technology Co., Ltd.; address before: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province; country or region before: China. |
GR01 | Patent grant | |