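CN109145286A - Noun phrase recognition method based on a BiLSTM-CRF neural network model fused with Vietnamese linguistic features - Google Patents
Noun phrase recognition method based on a BiLSTM-CRF neural network model fused with Vietnamese linguistic features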
- Publication number
- CN109145286A
- Authority
- CN
- China
- Prior art keywords
- corpus
- noun
- noun phrase
- vietnamese
- bilstm
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
Description
Test corpus | P (%) | R (%) | F (%) |
---|---|---|---|
Corpus group 1 | 87.89 | 88.74 | 88.31 |
Corpus group 2 | 88.43 | 88.56 | 88.49 |
Corpus group 3 | 88.26 | 89.01 | 88.63 |
Corpus group 4 | 87.68 | 88.87 | 88.27 |
Corpus group 5 | 88.14 | 88.46 | 88.30 |
Average | 88.08 | 88.73 | 88.40 |
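The page does not define P, R or F. They are conventionally precision, recall and F-measure. The following minimal Python sketch assumes F is the balanced F-measure, F = 2PR/(P+R). It recomputes F for each of the five per-group rows of the test-corpus table above (copied in table order) and averages each column. The names `ROWS`, `f_measure` and `column_averages` are illustrative and do not come from the patent.

```python
from statistics import mean

# (P, R, F) per corpus group, in percent, copied in table order from the
# test-corpus table above. Assumption: F is the balanced F-measure (F1).
ROWS = [
    (87.89, 88.74, 88.31),
    (88.43, 88.56, 88.49),
    (88.26, 89.01, 88.63),
    (87.68, 88.87, 88.27),
    (88.14, 88.46, 88.30),
]


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


def column_averages(rows):
    """Return the mean of each metric column (P, R, F) across all groups."""
    ps, rs, fs = zip(*rows)
    return mean(ps), mean(rs), mean(fs)


if __name__ == "__main__":
    for i, (p, r, f) in enumerate(ROWS, start=1):
        # Compare the reported F with the value recomputed from P and R.
        print(f"group {i}: reported F = {f:.2f}, 2PR/(P+R) = {f_measure(p, r):.2f}")
    avg_p, avg_r, avg_f = column_averages(ROWS)
    print(f"average: P = {avg_p:.2f}, R = {avg_r:.2f}, F = {avg_f:.2f}")
```

When run as a script, each recomputed F agrees with the reported value to within 0.01. The column means come out at 88.08, 88.73 and 88.40, matching the average row, which is consistent with reading F as F1.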
Model | P (%) | R (%) | F (%) |
---|---|---|---|
ME (maximum entropy) | 79.88 | 80.07 | 79.97 |
CRF | 82.72 | 82.62 | 82.67 |
BiLSTM-CRF | 86.34 | 87.11 | 86.72 |
Fused method of this invention | 88.12 | 88.74 | 88.43 |
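The following sketch compares the fused method's F-measure with the three baselines in the model-comparison table above. It reports the gain in two ways: as an absolute difference in percentage points, and as the share of the baseline's remaining error (100 - F) that is removed. The error-reduction figure is an illustrative derived metric and is not reported in the patent. `SCORES`, `f_gain`, `error_reduction` and the `"fused"` label are hypothetical names.

```python
# Scores (P, R, F) in percent from the model-comparison table above.
# "fused" is a hypothetical short label for the method of this invention.
SCORES = {
    "ME": (79.88, 80.07, 79.97),
    "CRF": (82.72, 82.62, 82.67),
    "BiLSTM-CRF": (86.34, 87.11, 86.72),
    "fused": (88.12, 88.74, 88.43),
}


def f_gain(new_f: float, base_f: float) -> float:
    """Absolute F-measure improvement, in percentage points."""
    return new_f - base_f


def error_reduction(new_f: float, base_f: float) -> float:
    """Fraction of the baseline's remaining F-measure error (100 - F) removed."""
    return (new_f - base_f) / (100.0 - base_f)


if __name__ == "__main__":
    fused_f = SCORES["fused"][2]
    for name, (_, _, base_f) in SCORES.items():
        if name == "fused":
            continue  # do not compare the method with itself
        print(f"vs {name}: +{f_gain(fused_f, base_f):.2f} points, "
              f"{error_reduction(fused_f, base_f):.1%} error reduction")
```

Against BiLSTM-CRF, for example, the absolute gain is 88.43 - 86.72 = 1.71 points.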
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810707821.7A CN109145286A (zh) | 2018-07-02 | 2018-07-02 | Noun phrase recognition method based on a BiLSTM-CRF neural network model fused with Vietnamese linguistic features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810707821.7A CN109145286A (zh) | 2018-07-02 | 2018-07-02 | Noun phrase recognition method based on a BiLSTM-CRF neural network model fused with Vietnamese linguistic features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145286A (zh) | 2019-01-04 |
Family
ID=64802653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810707821.7A (Pending) CN109145286A (zh) | Noun phrase recognition method based on a BiLSTM-CRF neural network model fused with Vietnamese linguistic features | 2018-07-02 | 2018-07-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145286A (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
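CN110096713A (zh) * | 2019-03-21 | 2019-08-06 | Kunming University of Science and Technology | Lao organization name recognition method based on SVM-BiLSTM-CRF |
CN110334213A (zh) * | 2019-07-09 | 2019-10-15 | Kunming University of Science and Technology | Method for recognizing temporal relations between Chinese-Vietnamese news events based on a bidirectional cross-attention mechanism |
CN112084783A (zh) * | 2020-09-24 | 2020-12-15 | Civil Aviation University of China | Entity recognition method and system for unruly civil-aviation passengers |
CN112651241A (zh) * | 2021-01-08 | 2021-04-13 | Kunming University of Science and Technology | Automatic recognition method for Chinese coordinate structures based on semi-supervised learning |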
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
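CN101013421A (zh) * | 2007-02-02 | 2007-08-08 | Tsinghua University | Rule-based automatic analysis method for Chinese base chunks |
CN106569998A (zh) * | 2016-10-27 | 2017-04-19 | Zhejiang University | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106933809A (zh) * | 2017-03-27 | 2017-07-07 | Sanjiaoshou (Beijing) Technology Co., Ltd. | Information processing apparatus and information processing method |
CN107622050A (zh) * | 2017-09-14 | 2018-01-23 | Wuhan FiberHome Putian Information Technology Co., Ltd. | Text sequence labeling system and method based on Bi-LSTM and CRF |
CN107797994A (zh) * | 2017-09-26 | 2018-03-13 | Kunming University of Science and Technology | Vietnamese noun chunk recognition method based on constrained conditional random fields |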
- 2018-07-02: CN201810707821.7A filed in CN (publication CN109145286A; active, Pending)
Non-Patent Citations (2)
Title |
---|
THAI-HOANG PHAM ET AL: "End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level", arXiv preprint * |
Xiong Mingming et al.: "Vietnamese word segmentation based on CRFs and an ambiguity model", Journal of Data Acquisition and Processing * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
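CN110096713A (zh) * | 2019-03-21 | 2019-08-06 | Kunming University of Science and Technology | Lao organization name recognition method based on SVM-BiLSTM-CRF |
CN110334213A (zh) * | 2019-07-09 | 2019-10-15 | Kunming University of Science and Technology | Method for recognizing temporal relations between Chinese-Vietnamese news events based on a bidirectional cross-attention mechanism |
CN110334213B (zh) * | 2019-07-09 | 2021-05-11 | Kunming University of Science and Technology | Method for recognizing temporal relations between Chinese-Vietnamese news events based on a bidirectional cross-attention mechanism |
CN112084783A (zh) * | 2020-09-24 | 2020-12-15 | Civil Aviation University of China | Entity recognition method and system for unruly civil-aviation passengers |
CN112084783B (zh) * | 2020-09-24 | 2022-04-12 | Civil Aviation University of China | Entity recognition method and system for unruly civil-aviation passengers |
CN112651241A (zh) * | 2021-01-08 | 2021-04-13 | Kunming University of Science and Technology | Automatic recognition method for Chinese coordinate structures based on semi-supervised learning |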
Similar Documents
Publication | Title |
---|---|
CN109145286A (zh) | Noun phrase recognition method based on a BiLSTM-CRF neural network model fused with Vietnamese linguistic features |
Hancke et al. | Readability classification for German using lexical, syntactic, and morphological features | |
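CN106844658A (zh) | Method and system for automatically constructing a knowledge graph from Chinese text |
CN109344236A (zh) | Question similarity calculation method based on multiple features |
CN105975458B (zh) | Method for calculating the similarity of long Chinese sentences based on fine-grained dependency relations |
CN101201820B (zh) | Bilingual corpus filtering method and system |
DE112013005742T5 (de) | Intention estimation device and intention estimation method |
CN108509409A (zh) | Method for automatically generating semantically similar sentence samples |
CN101251862A (zh) | Content-based automatic question classification method and system |
CN107908712A (zh) | Cross-language information matching method based on term extraction |
CN111930895B (zh) | MRC-based document data retrieval method, apparatus, device and storage medium |
CN110298036A (zh) | Symptom recognition method for online medical text based on incremental part-of-speech iteration |
CN106547924A (zh) | Sentiment analysis method and apparatus for text information |
CN107797994A (zh) | Vietnamese noun chunk recognition method based on constrained conditional random fields |
CN109033166A (zh) | Method for constructing a training dataset for person attribute extraction |
CN113157860B (zh) | Method for constructing a power equipment maintenance knowledge graph from small-scale data |
CN106202039A (zh) | Vietnamese compound word disambiguation method based on conditional random fields |
CN106547741A (zh) | Collocation-based automatic proofreading method for Chinese text |
CN103336803B (zh) | Computer generation method for name-embedded Spring Festival couplets |
CN107894977A (zh) | Vietnamese part-of-speech tagging method combining a part-of-speech disambiguation model for multi-category words with a dictionary |
CN106126501B (zh) | Noun word sense disambiguation method and apparatus based on dependency constraints and knowledge |
CN110019556A (zh) | Topic news acquisition method, apparatus and device |
Schottmüller et al. | Issues in translating verb-particle constructions from German to English |
CN109783648B (zh) | Method for improving an ASR language model using ASR recognition results |
Taji et al. | The Columbia University - New York University Abu Dhabi SIGMORPHON 2016 morphological reinflection shared task submission |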
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Yu Zhengtao, Zhao Chen, Guo Jianyi, Mao Cunli, Chen Wei; inventors before: Guo Jianyi, Zhao Chen, Yu Zhengtao, Mao Cunli, Chen Wei |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |