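CN104641367B - Formatting module, system and method for formatting an electronic character sequence - Google Patents
Formatting module, system and method for formatting an electronic character sequence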
- Publication number
- CN104641367B (application CN201380048564.6A)
- Authority
- CN
- China
- Prior art keywords
- rule
- language
- character sequence
- character
- sequence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/163—Handling of whitespace

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/274—Converting codes to words; Guess-ahead of partial word inputs

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/263—Language identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1216640.1A / GB201216640D0 (en) | 2012-09-18 | 2012-09-18 | Formatting module, system and method for formatting an electronic character sequence |
| GB1216640.1 | 2012-09-18 | | |
| PCT/GB2013/052443 WO2014045032A1 (en) | 2012-09-18 | 2013-09-18 | Formatting module, system and method for formatting an electronic character sequence |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN104641367A (zh) | 2015-05-20 |
| CN104641367B (zh) | 2019-01-11 |
Family ID: 47144444
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201380048564.6A (active; CN104641367B, zh) | 2012-09-18 | 2013-09-18 | Formatting module, system and method for formatting an electronic character sequence |
Country Status (6)
| Country | Publication |
|---|---|
| US (2) | US20150248379A1 |
| EP (1) | EP2898426A1 |
| JP (1) | JP6273285B2 |
| CN (1) | CN104641367B |
| GB (1) | GB201216640D0 |
| WO (1) | WO2014045032A1 |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
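| WO2014122733A1 (ja) * | 2013-02-06 | 2014-08-14 | Hitachi, Ltd. | Computer, data access management method, and recording medium |
| CN106909296A (zh) | 2016-06-07 | 2017-06-30 | Alibaba Group Holding Limited | Data extraction method and apparatus, and terminal device |
| JP7566520B2 (ja) * | 2020-07-17 | 2024-10-15 | Canon Inc. | Image processing apparatus, method, and program |
| JP7724676B2 (ja) * | 2021-10-05 | 2025-08-18 | The Japan Research Institute, Limited | Information processing method, program, and information processing apparatus |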
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6374242B1 (en) * | 1999-09-29 | 2002-04-16 | Lockheed Martin Corporation | Natural-language information processor with association searches limited within blocks |
| US7080002B1 (en) * | 1997-03-26 | 2006-07-18 | Samsung Electronics Co., Ltd. | Bi-lingual system and method for automatically converting one language into another language |
| CN1851642A (zh) * | 2005-09-09 | 2006-10-25 | Huawei Technologies Co., Ltd. | Interface data grammar analysis and processing system and analysis and processing method thereof |
| WO2012042217A1 (en) * | 2010-09-29 | 2012-04-05 | Touchtype Ltd. | System and method for inputting text into electronic devices |
Family Cites Families (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4247906A (en) | 1978-11-13 | 1981-01-27 | Wang Laboratories, Inc. | Text editing system having flexible repetitive operation capability |
| US4783803A (en) * | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
| JPH0762841B2 (ja) | 1986-06-27 | 1995-07-05 | Yokogawa-Hewlett-Packard Ltd. | Document fair-copy apparatus |
| US5222225A (en) * | 1988-10-07 | 1993-06-22 | International Business Machines Corporation | Apparatus for processing character string moves in a data processing system |
| US5062143A (en) * | 1990-02-23 | 1991-10-29 | Harris Corporation | Trigram-based method of language identification |
| US5937420A (en) * | 1996-07-23 | 1999-08-10 | Adobe Systems Incorporated | Pointsize-variable character spacing |
| US6513002B1 (en) * | 1998-02-11 | 2003-01-28 | International Business Machines Corporation | Rule-based number formatter |
| US6529864B1 (en) * | 1999-08-11 | 2003-03-04 | Roedy-Black Publishing, Inc. | Interactive connotative dictionary system |
| US20020123994A1 (en) | 2000-04-26 | 2002-09-05 | Yves Schabes | System for fulfilling an information need using extended matching techniques |
| US20040078191A1 (en) * | 2002-10-22 | 2004-04-22 | Nokia Corporation | Scalable neural network-based language identification from written text |
| US7580838B2 (en) * | 2002-11-22 | 2009-08-25 | Scansoft, Inc. | Automatic insertion of non-verbalized punctuation |
| US20060184878A1 (en) * | 2005-02-11 | 2006-08-17 | Microsoft Corporation | Using a description language to provide a user interface presentation |
| US8027832B2 (en) * | 2005-02-11 | 2011-09-27 | Microsoft Corporation | Efficient language identification |
| JP4135950B2 (ja) * | 2005-06-09 | 2008-08-20 | International Business Machines Corporation | Access management apparatus, access management method, and program |
| US7552045B2 (en) * | 2006-12-18 | 2009-06-23 | Nokia Corporation | Method, apparatus and computer program product for providing flexible text based language identification |
| US8527262B2 (en) | 2007-06-22 | 2013-09-03 | International Business Machines Corporation | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
| US8783570B2 (en) * | 2007-08-21 | 2014-07-22 | Symbol Technologies, Inc. | Reader with optical character recognition |
| US8306356B1 (en) * | 2007-09-28 | 2012-11-06 | Language Technologies, Inc. | System, plug-in, and method for improving text composition by modifying character prominence according to assigned character information measures |
| US8706474B2 (en) * | 2008-02-23 | 2014-04-22 | Fair Isaac Corporation | Translation of entity names based on source document publication date, and frequency and co-occurrence of the entity names |
| KR101496885B1 (ko) | 2008-04-07 | 2015-02-27 | Samsung Electronics Co., Ltd. | Sentence word-spacing system and method |
| US8224641B2 (en) * | 2008-11-19 | 2012-07-17 | Stratify, Inc. | Language identification for documents containing multiple languages |
| JP4701292B2 (ja) * | 2009-01-05 | 2011-06-15 | International Business Machines Corporation | Computer system for creating a term dictionary from named entities or technical terms contained in text data, and method and computer program therefor |
| US8879846B2 (en) * | 2009-02-10 | 2014-11-04 | Kofax, Inc. | Systems, methods and computer program products for processing financial documents |
| GB0905457D0 (en) | 2009-03-30 | 2009-05-13 | Touchtype Ltd | System and method for inputting text into electronic devices |
| KR101638594B1 (ko) * | 2010-05-26 | 2016-07-20 | Samsung Electronics Co., Ltd. | Method and apparatus for DNA sequence search |
| WO2012098544A2 (en) | 2011-01-19 | 2012-07-26 | Keyless Systems, Ltd. | Improved data entry systems |
| US20120262461A1 (en) * | 2011-02-17 | 2012-10-18 | Conversive, Inc. | System and Method for the Normalization of Text |
| WO2014042976A2 (en) * | 2012-09-15 | 2014-03-20 | Numbergun Llc, A Utah Limited Liability Company | Flexible high-speed generation and formatting of application-specified strings |
| US20140136967A1 (en) | 2012-11-09 | 2014-05-15 | Research In Motion Limited | Method of providing predictive text |
| US9811517B2 (en) | 2013-01-29 | 2017-11-07 | Tencent Technology (Shenzhen) Company Limited | Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text |
| US8943405B1 (en) | 2013-11-27 | 2015-01-27 | Google Inc. | Assisted punctuation of character strings |
2012
- 2012-09-18: GB application GB1216640.1, publication GB201216640D0 (not active; ceased)

2013
- 2013-09-18: CN application CN201380048564.6A, publication CN104641367B (active)
- 2013-09-18: WO application PCT/GB2013/052443, publication WO2014045032A1 (not active; ceased)
- 2013-09-18: US application US14/428,972, publication US20150248379A1 (not active; abandoned)
- 2013-09-18: EP application EP13771173.5A, publication EP2898426A1 (not active; ceased)
- 2013-09-18: JP application JP2015531650A, publication JP6273285B2 (not active; expired, fee related)

2023
- 2023-04-19: US application US18/136,730, publication US12182496B2 (active)
Non-Patent Citations (3)
| Title |
|---|
| Multilingual Text Entry using Automatic Language Detection; Yo Ehara et al.; Third International Joint Conference on Natural Language Processing (IJCNLP 2008); 2008-01-07; pp. 441-448 * |
| Non Breaking Spaces Before Punctuation In French; Apache OpenOffice Wiki; https://wiki.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_(espaces_ins%C3%A9cables); 2012-08-29; pp. 1-5 * |
| Russian Typographical Traditions in Mathematical Literature; Valentin Zaitsev; 1999-09-20; pp. 1-17 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2015534171A (ja) | 2015-11-26 |
| US12182496B2 (en) | 2024-12-31 |
| US20150248379A1 (en) | 2015-09-03 |
| US20230252222A1 (en) | 2023-08-10 |
| EP2898426A1 (en) | 2015-07-29 |
| GB201216640D0 (en) | 2012-10-31 |
| CN104641367A (zh) | 2015-05-20 |
| JP6273285B2 (ja) | 2018-01-31 |
| WO2014045032A1 (en) | 2014-03-27 |
Similar Documents
| Publication | Title |
|---|---|
| JP5997217B2 (ja) | Method for disambiguating multiple readings in language conversion |
| CN106537370B (zh) | Method and system for robust tagging of named entities in the presence of source and translation errors |
| CN103970798B (zh) | Searching and matching of data |
| JP6799562B2 (ja) | Language feature extraction apparatus, named entity extraction apparatus, extraction method, and program |
| US20140039879A1 (en) | Generic system for linguistic analysis and transformation |
| JP2013117978A (ja) | Method for generating typing candidates to improve typing efficiency |
| CN103678684A (zh) | Chinese word segmentation method based on navigation information retrieval |
| KR102794379B1 (ko) | Method and apparatus for correcting training data using ensemble scores |
| US12182496B2 (en) | Formatting module, system and method for formatting an electronic character sequence |
| CN103049458A (zh) | Method and system for correcting a user lexicon |
| CN104485107A (zh) | Speech recognition method, speech recognition system, and speech recognition device for names |
| Uthayamoorthy et al. | DDSpell: a data-driven spell checker and suggestion generator for the Tamil language |
| JP5323652B2 (ja) | Similar word determination method and system |
| Nunsanga et al. | Part-of-speech tagging for Mizo language using conditional random field |
| CN114661917B (zh) | Text augmentation method and system, computer device, and readable storage medium |
| JP2007334534A (ja) | Character string input device, character string input method, and program |
| Fahad et al. | An Approach towards Implementation of Active and Passive voice using LL (1) Parsing |
| JP4088171B2 (ja) | Text analysis apparatus, method, program, and recording medium storing the program |
| QasemiZadeh et al. | Adaptive language independent spell checking using intelligent traverse on a tree |
| JP3737817B2 (ja) | Expression conversion method and expression conversion apparatus |
| JP4478042B2 (ja) | Method for generating a word set with frequency information, program and program storage medium, and apparatus for generating a word set with frequency information, text index term creation apparatus, full-text search apparatus, and text classification apparatus |
| Eutamene et al. | Ontologies and Bigram-based Approach for Isolated Non-word Errors Correction in OCR System |
| Manohar et al. | Spellchecker for Malayalam using finite state transition models |
| JP5348699B2 (ja) | Data classification system, data classification method, and program |
| TW452711B (en) | Method using word affix for word search |
Legal Events
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| EXSB | Decision made by SIPO to initiate substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 2020-09-01. Patentee before: TOUCHTYPE Ltd. (London, England). Patentee after: MICROSOFT TECHNOLOGY LICENSING, LLC (Microsoft Avenue, Redmond, Washington 98052, USA). |