CN102187335A - Named entity transliteration using comparable corpora - Google Patents
Named entity transliteration using comparable corpora
- Publication number
- CN102187335A
- Authority
- CN
- China
- Prior art keywords
- named entity
- document
- language
- similarity score
- attached document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G06F40/129—Handling non-Latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/45—Example-based machine translation; Alignment
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/255,372 | 2008-10-21 | | |
| US12/255,372 US8560298B2 (en) | 2008-10-21 | 2008-10-21 | Named entity transliteration using comparable corpora |
| PCT/US2009/061352 WO2010048204A2 (en) | 2008-10-21 | 2009-10-20 | Named entity transliteration using comparable corpora |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN102187335A (zh) | 2011-09-14 |
Family
ID=42118347
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2009801425260A (Pending), published as CN102187335A (zh) | 2008-10-21 | 2009-10-20 | Named entity transliteration using comparable corpora |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US8560298B2 (en) |
| EP (1) | EP2359264A4 (en) |
| JP (1) | JP5497048B2 (en) |
| CN (1) | CN102187335A (en) |
| WO (1) | WO2010048204A2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107193809A (zh) * | 2017-05-18 | 2017-09-22 | 广东小天才科技有限公司 | Teaching material script generation method and apparatus, and user equipment |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8332205B2 (en) * | 2009-01-09 | 2012-12-11 | Microsoft Corporation | Mining transliterations for out-of-vocabulary query terms |
| CN102682763B (zh) * | 2011-03-10 | 2014-07-16 | 北京三星通信技术研究有限公司 | Method, apparatus, and terminal for correcting named entity vocabulary in speech input text |
| WO2012145782A1 (en) * | 2011-04-27 | 2012-11-01 | Digital Sonata Pty Ltd | Generic system for linguistic analysis and transformation |
| US9176936B2 (en) * | 2012-09-28 | 2015-11-03 | International Business Machines Corporation | Transliteration pair matching |
| US9146919B2 (en) | 2013-01-16 | 2015-09-29 | Google Inc. | Bootstrapping named entity canonicalizers from English using alignment models |
| WO2016048350A1 (en) * | 2014-09-26 | 2016-03-31 | Nuance Communications, Inc. | Improving automatic speech recognition of multilingual named entities |
| US10467346B2 (en) * | 2017-05-18 | 2019-11-05 | Wipro Limited | Method and system for generating named entities |
| US11417322B2 (en) * | 2018-12-12 | 2022-08-16 | Google Llc | Transliteration for speech recognition training and scoring |
| US11062621B2 (en) * | 2018-12-26 | 2021-07-13 | Paypal, Inc. | Determining phonetic similarity using machine learning |
| JP7419961B2 (ja) * | 2020-05-12 | 2024-01-23 | 富士通株式会社 | Document extraction program, document extraction device, and document extraction method |
| US12147422B2 (en) | 2021-10-27 | 2024-11-19 | Bank Of America Corporation | System and method for transpilation of machine interpretable languages |
| US11977852B2 (en) | 2022-01-12 | 2024-05-07 | Bank Of America Corporation | Anaphoric reference resolution using natural language processing and machine learning |
| US12360990B2 (en) | 2022-11-03 | 2025-07-15 | Bank Of America Corporation | Transliteration of machine interpretable languages for enhanced compaction |
Family Cites Families (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6104989A (en) * | 1998-07-29 | 2000-08-15 | International Business Machines Corporation | Real time detection of topical changes and topic identification via likelihood based methods |
| JP3317341B2 (ja) * | 1998-11-19 | 2002-08-26 | 日本電気株式会社 | Similarity calculation method and apparatus, and similar document retrieval method and apparatus |
| JP3055545B1 (ja) * | 1999-01-19 | 2000-06-26 | 富士ゼロックス株式会社 | Related sentence retrieval device |
| US20030191625A1 (en) * | 1999-11-05 | 2003-10-09 | Gorin Allen Louis | Method and system for creating a named entity language model |
| JP3643516B2 (ja) * | 2000-03-23 | 2005-04-27 | 日本電信電話株式会社 | Document evaluation method and apparatus, and recording medium storing a document evaluation program |
| US7191115B2 (en) * | 2001-06-20 | 2007-03-13 | Microsoft Corporation | Statistical method and apparatus for learning translation relationships among words |
| JP2003141109A (ja) * | 2001-11-07 | 2003-05-16 | Fuji Xerox Co Ltd | Multilingual document processing apparatus and method |
| JP3918531B2 (ja) * | 2001-11-29 | 2007-05-23 | 株式会社日立製作所 | Similar document retrieval method and system |
| EP1485825A4 (en) * | 2002-02-04 | 2008-03-19 | Cataphora Inc | DETAILED EXPLORATION TECHNIQUE OF SOCIOLOGICAL DATA AND CORRESPONDING APPARATUS |
| AU2003218097A1 (en) * | 2002-03-11 | 2003-09-29 | University Of Southern California | Named entity translation |
| US7212963B2 (en) * | 2002-06-11 | 2007-05-01 | Fuji Xerox Co., Ltd. | System for distinguishing names in Asian writing systems |
| US7194455B2 (en) * | 2002-09-19 | 2007-03-20 | Microsoft Corporation | Method and system for retrieving confirming sentences |
| US7475010B2 (en) * | 2003-09-03 | 2009-01-06 | Lingospot, Inc. | Adaptive and scalable method for resolving natural language ambiguities |
| GB0322600D0 (en) * | 2003-09-26 | 2003-10-29 | Univ Ulster | Thematic retrieval in heterogeneous data repositories |
| US7478033B2 (en) * | 2004-03-16 | 2009-01-13 | Google Inc. | Systems and methods for translating Chinese pinyin to Chinese characters |
| WO2006018041A1 (de) * | 2004-08-13 | 2006-02-23 | Swiss Reinsurance Company | Speech and text analysis device and corresponding method |
| US7457808B2 (en) * | 2004-12-17 | 2008-11-25 | Xerox Corporation | Method and apparatus for explaining categorization decisions |
| JP4622589B2 (ja) * | 2005-03-08 | 2011-02-02 | ソニー株式会社 | Information processing apparatus and method, program, and recording medium |
| US20070011132A1 (en) * | 2005-06-17 | 2007-01-11 | Microsoft Corporation | Named entity translation |
| US20070022134A1 (en) * | 2005-07-22 | 2007-01-25 | Microsoft Corporation | Cross-language related keyword suggestion |
| US7672833B2 (en) * | 2005-09-22 | 2010-03-02 | Fair Isaac Corporation | Method and apparatus for automatic entity disambiguation |
| US8249855B2 (en) * | 2006-08-07 | 2012-08-21 | Microsoft Corporation | Identifying parallel bilingual data over a network |
| US7983903B2 (en) * | 2007-09-07 | 2011-07-19 | Microsoft Corporation | Mining bilingual dictionaries from monolingual web pages |
| US8706474B2 (en) * | 2008-02-23 | 2014-04-22 | Fair Isaac Corporation | Translation of entity names based on source document publication date, and frequency and co-occurrence of the entity names |
| US8275608B2 (en) * | 2008-07-03 | 2012-09-25 | Xerox Corporation | Clique based clustering for named entity recognition system |
2008
- 2008-10-21: US application US12/255,372, patent US8560298B2 (en), not active (Expired - Fee Related)
2009
- 2009-10-20: CN application CN2009801425260A, publication CN102187335A (zh), active (Pending)
- 2009-10-20: WO application PCT/US2009/061352, publication WO2010048204A2 (en), not active (Ceased)
- 2009-10-20: JP application JP2011533276A, patent JP5497048B2 (ja), not active (Expired - Fee Related)
- 2009-10-20: EP application EP09822578.2A, publication EP2359264A4 (en), not active (Withdrawn)
Also Published As
| Publication number | Publication date |
|---|---|
| JP5497048B2 (ja) | 2014-05-21 |
| US20100106484A1 (en) | 2010-04-29 |
| EP2359264A2 (en) | 2011-08-24 |
| JP2012506596A (ja) | 2012-03-15 |
| WO2010048204A3 (en) | 2010-08-12 |
| US8560298B2 (en) | 2013-10-15 |
| EP2359264A4 (en) | 2013-07-10 |
| WO2010048204A2 (en) | 2010-04-29 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN102187335A (zh) | Named entity transliteration using comparable corpora | |
| Munteanu et al. | Improving machine translation performance by exploiting non-parallel corpora | |
| JP4945086B2 (ja) | Statistical language model for logical forms | |
| US7050964B2 (en) | Scaleable machine translation system | |
| US20070011132A1 (en) | Named entity translation | |
| US20050216253A1 (en) | System and method for reverse transliteration using statistical alignment | |
| Wu et al. | Inversion transduction grammar constraints for mining parallel sentences from quasi-comparable corpora | |
| CN101714136B (zh) | Method and apparatus for adapting a corpus-based machine translation system to a new domain | |
| JP2008262587A (ja) | Example-based machine translation system | |
| Costa-Jussá et al. | Statistical machine translation enhancements through linguistic levels: A survey | |
| Zhikov et al. | An efficient algorithm for unsupervised word segmentation with branching entropy and MDL | |
| Udupa et al. | “They Are Out There, If You Know Where to Look”: Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval | |
| Gao et al. | Statistical query translation models for cross-language information retrieval | |
| Gao et al. | TREC-9 CLIR Experiments at MSRCN. | |
| Li et al. | Name-aware machine translation | |
| Zhou et al. | A hybrid technique for English-Chinese cross language information retrieval | |
| Zhang et al. | Chinese OOV translation and post-translation query expansion in Chinese-English cross-lingual information retrieval | |
| Le et al. | Using term position similarity and language modeling for bilingual document alignment | |
| Ehsan et al. | A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection. | |
| El Kahki et al. | Improved transliteration mining using graph reinforcement | |
| Tiedemann et al. | Morphological segmentation and OPUS for Finnish-English machine translation | |
| Groves et al. | Hybridity in MT: Experiments on the Europarl corpus | |
| Salloum et al. | Unsupervised Arabic dialect segmentation for machine translation | |
| He et al. | Cross‐Language Information Retrieval | |
| Vulić et al. | A unified framework for monolingual and cross-lingual relevance modeling based on probabilistic topic models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20110914 |