CN107229611B - A word segmentation method for historical classics based on word alignment - Google Patents
A word segmentation method for historical classics based on word alignment
- Publication number
- CN107229611B
- Authority
- CN
- China
- Prior art keywords
- word
- chinese
- alignment
- ancient
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Concepts
ID | Concept | Type | Query match | Sections | Count |
---|---|---|---|---|---|
230000011218 | segmentation | Effects | 0.000 | title, claims, abstract, description | 54 |
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 30 |
238000007781 | pre-processing | Methods | 0.000 | claims, abstract, description | 10 |
238000013519 | translation | Methods | 0.000 | claims, description | 15 |
238000007689 | inspection | Methods | 0.000 | claims | 1 |
238000003058 | natural language processing | Methods | 0.000 | abstract, description | 3 |
238000012549 | training | Methods | 0.000 | abstract, description | 3 |
230000000694 | effects | Effects | 0.000 | description | 3 |
238000011161 | development | Methods | 0.000 | description | 2 |
230000018109 | developmental process | Effects | 0.000 | description | 2 |
238000012545 | processing | Methods | 0.000 | description | 2 |
230000009286 | beneficial effect | Effects | 0.000 | description | 1 |
238000012217 | deletion | Methods | 0.000 | description | 1 |
230000037430 | deletion | Effects | 0.000 | description | 1 |
238000010586 | diagram | Methods | 0.000 | description | 1 |
238000002372 | labelling | Methods | 0.000 | description | 1 |
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710351463.6A CN107229611B (zh) | 2017-05-18 | 2017-05-18 | A word segmentation method for historical classics based on word alignment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710351463.6A CN107229611B (zh) | 2017-05-18 | 2017-05-18 | A word segmentation method for historical classics based on word alignment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107229611A (zh) | 2017-10-03 |
CN107229611B (zh) | 2020-06-30 |
Family
ID=59934537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710351463.6A Expired - Fee Related CN107229611B (zh) | A word segmentation method for historical classics based on word alignment | 2017-05-18 | 2017-05-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107229611B (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
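CN109684648B (zh) * | 2019-01-14 | 2020-09-01 | Zhejiang University | A multi-feature-fusion method for automatic translation between ancient and modern Chinese |
CN109829159B (zh) * | 2019-01-29 | 2020-02-18 | Nanjing Normal University | An integrated automatic lexical analysis method and system for ancient Chinese texts |
CN116070643B (zh) * | 2023-04-03 | 2023-08-15 | Wuchang University of Technology | A fixed-style translation method and system from ancient Chinese to English |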
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1335301A2 (en) * | 2002-02-07 | 2003-08-13 | Matsushita Electric Industrial Co., Ltd. | Context-aware linear time tokenizer |
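CN1567297A (zh) * | 2003-07-03 | 2005-01-19 | Institute of Acoustics, Chinese Academy of Sciences | A method for automatically extracting multi-word translation equivalent units from a bilingual corpus |
CN102693222A (zh) * | 2012-05-25 | 2012-09-26 | Xiong Jing | Example-based machine translation method for oracle bone inscription interpretations |
CN105446962A (zh) * | 2015-12-30 | 2016-03-30 | Wuhan Transn Information Technology Co., Ltd. | Method and device for aligning source text and translation |
CN106649289A (zh) * | 2016-12-16 | 2017-05-10 | Institute of Automation, Chinese Academy of Sciences | Method and system for simultaneously identifying bilingual terms and word alignment |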
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8463593B2 (en) * | 2007-08-31 | 2013-06-11 | Microsoft Corporation | Natural language hypernym weighting for word sense disambiguation |
- 2017-05-18: CN application CN201710351463.6A, patent CN107229611B (zh), not active (Expired - Fee Related)
Non-Patent Citations (1)
Title |
---|
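"Research on term alignment based on a bilingual parallel corpus of historical classics"; Li Xiuying; China Doctoral Dissertations Full-text Database; 20110615; F085-9 * |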
Also Published As
Publication number | Publication date |
---|---|
CN107229611A (zh) | 2017-10-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8660834B2 (en) | | User input classification |
CN107193921B (zh) | | Method and system for correcting mixed Chinese-English queries for search engines |
US8131539B2 (en) | | Search-based word segmentation method and device for language without word boundary tag |
CN105404621B (zh) | | A method and system for blind people to read Chinese characters |
US20050216253A1 (en) | | System and method for reverse transliteration using statistical alignment |
KR20110083623A (ko) | | Machine learning for transliteration |
Saloot et al. | | An architecture for Malay Tweet normalization |
CN107229611B (zh) | | A word segmentation method for historical classics based on word alignment |
CN115587590A (zh) | | Training corpus construction method, translation model training method, and translation method |
Tawfik et al. | | Morphology-aware word-segmentation in dialectal Arabic adaptation of neural machine translation |
CN112231451A (zh) | | Pronoun restoration method, device, dialogue robot, and storage medium |
Yulianti et al. | | Normalisation of Indonesian-English code-mixed text and its effect on emotion classification |
Rasooli et al. | | Unsupervised morphology-based vocabulary expansion |
CN110929518A (zh) | | A text sequence labeling algorithm using overlapping split rules |
WO2014189400A1 (en) | | A method for diacritisation of texts written in latin- or cyrillic-derived alphabets |
Yang et al. | | Spell Checking for Chinese. |
Alsayadi et al. | | Integrating semantic features for enhancing Arabic named entity recognition |
KR101663038B1 (ko) | | Apparatus and method for recognizing entity boundaries in text based on learning usage examples of entity surface strings |
CN112765977A (zh) | | A word segmentation method and device based on cross-lingual data augmentation |
Sen et al. | | Bangla natural language processing: A comprehensive review of classical machine learning and deep learning based methods |
Tongtep et al. | | Multi-stage automatic NE and POS annotation using pattern-based and statistical-based techniques for Thai corpus construction |
Qafmolla | | Automatic language identification |
Takahasi et al. | | Keyboard logs as natural annotations for word segmentation |
Celikkaya et al. | | A mobile assistant for Turkish |
Ji et al. | | Phonetic name matching for cross-lingual spoken sentence retrieval |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | Inventor after: Che Chao, Wu Xiaoting; Inventor before: Che Chao, Wu Xiaoting |
TR01 | Transfer of patent right | Effective date of registration: 20230315; Patentee after: DALIAN TONGDIAN TECHNOLOGY CO.,LTD.; Address after: No. 17, Huixian Street, Qixianling, Lingshui Town, Ganjingzi District, Dalian City, Liaoning Province, 116024; Patentee before: DALIAN University; Address before: No.10 Xuefu street, Dalian Development Zone, Liaoning Province, 116622 |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200630 |