CN104641367B - Formatting module, system and method for formatting an electronic character sequence - Google Patents

Formatting module, system and method for formatting an electronic character sequence

Info

Publication number
CN104641367B
Authority
CN
China
Prior art keywords
rule
language
character sequence
character
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201380048564.6A
Other languages
English (en)
Chinese (zh)
Other versions
CN104641367A (zh)
Inventor
Benjamin Medlock
David Martinez del Corral
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Touchtype Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Touchtype Ltd
Publication of CN104641367A
Application granted
Publication of CN104641367B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/163 Handling of whitespace
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/274 Converting codes to words; Guess-ahead of partial word inputs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
CN201380048564.6A 2012-09-18 2013-09-18 Formatting module, system and method for formatting an electronic character sequence Active CN104641367B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GBGB1216640.1A GB201216640D0 (en) 2012-09-18 2012-09-18 Formatting module, system and method for formatting an electronic character sequence
GB1216640.1 2012-09-18
PCT/GB2013/052443 WO2014045032A1 (en) 2012-09-18 2013-09-18 Formatting module, system and method for formatting an electronic character sequence

Publications (2)

Publication Number Publication Date
CN104641367A (zh) 2015-05-20
CN104641367B (zh) 2019-01-11

Family

ID=47144444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380048564.6A Active CN104641367B (zh) 2012-09-18 2013-09-18 Formatting module, system and method for formatting an electronic character sequence

Country Status (6)

Country Link
US (2) US20150248379A1 (en)
EP (1) EP2898426A1 (en)
JP (1) JP6273285B2 (en)
CN (1) CN104641367B (en)
GB (1) GB201216640D0 (en)
WO (1) WO2014045032A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
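WO2014122733A1 (ja) * 2013-02-06 2014-08-14 Hitachi, Ltd. Computer, data access management method, and recording medium
CN106909296A (zh) 2016-06-07 2017-06-30 Alibaba Group Holding Limited Data extraction method, apparatus, and terminal device
JP7566520B2 (ja) * 2020-07-17 2024-10-15 Canon Inc. Image processing apparatus, method, and program
JP7724676B2 (ja) * 2021-10-05 2025-08-18 The Japan Research Institute, Limited Information processing method, program, and information processing apparatus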

Citations (4)

Publication number Priority date Publication date Assignee Title
US6374242B1 (en) * 1999-09-29 2002-04-16 Lockheed Martin Corporation Natural-language information processor with association searches limited within blocks
US7080002B1 (en) * 1997-03-26 2006-07-18 Samsung Electronics Co., Ltd. Bi-lingual system and method for automatically converting one language into another language
CN1851642A (zh) * 2005-09-09 2006-10-25 Huawei Technologies Co., Ltd. Interface data grammar analysis and processing system and analysis and processing method thereof
WO2012042217A1 (en) * 2010-09-29 2012-04-05 Touchtype Ltd. System and method for inputting text into electronic devices

Family Cites Families (31)

Publication number Priority date Publication date Assignee Title
US4247906A (en) 1978-11-13 1981-01-27 Wang Laboratories, Inc. Text editing system having flexible repetitive operation capability
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
JPH0762841B2 (ja) 1986-06-27 1995-07-05 Yokogawa-Hewlett-Packard, Ltd. Document fair-copy apparatus
US5222225A (en) * 1988-10-07 1993-06-22 International Business Machines Corporation Apparatus for processing character string moves in a data processing system
US5062143A (en) * 1990-02-23 1991-10-29 Harris Corporation Trigram-based method of language identification
US5937420A (en) * 1996-07-23 1999-08-10 Adobe Systems Incorporated Pointsize-variable character spacing
US6513002B1 (en) * 1998-02-11 2003-01-28 International Business Machines Corporation Rule-based number formatter
US6529864B1 (en) * 1999-08-11 2003-03-04 Roedy-Black Publishing, Inc. Interactive connotative dictionary system
US20020123994A1 (en) 2000-04-26 2002-09-05 Yves Schabes System for fulfilling an information need using extended matching techniques
US20040078191A1 (en) * 2002-10-22 2004-04-22 Nokia Corporation Scalable neural network-based language identification from written text
US7580838B2 (en) * 2002-11-22 2009-08-25 Scansoft, Inc. Automatic insertion of non-verbalized punctuation
US20060184878A1 (en) * 2005-02-11 2006-08-17 Microsoft Corporation Using a description language to provide a user interface presentation
US8027832B2 (en) * 2005-02-11 2011-09-27 Microsoft Corporation Efficient language identification
JP4135950B2 (ja) * 2005-06-09 2008-08-20 International Business Machines Corporation Access management apparatus, access management method, and program
US7552045B2 (en) * 2006-12-18 2009-06-23 Nokia Corporation Method, apparatus and computer program product for providing flexible text based language identification
US8527262B2 (en) 2007-06-22 2013-09-03 International Business Machines Corporation Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications
US8783570B2 (en) * 2007-08-21 2014-07-22 Symbol Technologies, Inc. Reader with optical character recognition
US8306356B1 (en) * 2007-09-28 2012-11-06 Language Technologies, Inc. System, plug-in, and method for improving text composition by modifying character prominence according to assigned character information measures
US8706474B2 (en) * 2008-02-23 2014-04-22 Fair Isaac Corporation Translation of entity names based on source document publication date, and frequency and co-occurrence of the entity names
KR101496885B1 (ko) 2008-04-07 2015-02-27 Samsung Electronics Co., Ltd. System and method for sentence word spacing
US8224641B2 (en) * 2008-11-19 2012-07-17 Stratify, Inc. Language identification for documents containing multiple languages
JP4701292B2 (ja) * 2009-01-05 2011-06-15 International Business Machines Corporation Computer system for creating a term dictionary from named entities or technical terms contained in text data, and method and computer program therefor
US8879846B2 (en) * 2009-02-10 2014-11-04 Kofax, Inc. Systems, methods and computer program products for processing financial documents
GB0905457D0 (en) 2009-03-30 2009-05-13 Touchtype Ltd System and method for inputting text into electronic devices
KR101638594B1 (ko) * 2010-05-26 2016-07-20 Samsung Electronics Co., Ltd. Method and apparatus for searching DNA sequences
WO2012098544A2 (en) 2011-01-19 2012-07-26 Keyless Systems, Ltd. Improved data entry systems
US20120262461A1 (en) * 2011-02-17 2012-10-18 Conversive, Inc. System and Method for the Normalization of Text
WO2014042976A2 (en) * 2012-09-15 2014-03-20 Numbergun Llc, A Utah Limited Liability Company Flexible high-speed generation and formatting of application-specified strings
US20140136967A1 (en) 2012-11-09 2014-05-15 Research In Motion Limited Method of providing predictive text
US9811517B2 (en) 2013-01-29 2017-11-07 Tencent Technology (Shenzhen) Company Limited Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
US8943405B1 (en) 2013-11-27 2015-01-27 Google Inc. Assisted punctuation of character strings

Non-Patent Citations (3)

Title
Multilingual Text Entry using Automatic Language Detection; Yo Ehara et al.; Third International Joint Conference on Natural Language Processing, IJCNLP 2008; 2008-01-07; pp. 441-448 *
Non Breaking Spaces Before Punctuation In French; Apache OpenOffice Wiki; https://wiki.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_(espaces_ins%C3%A9cables); 2012-08-29; pp. 1-5 *
Russian Typographical Traditions in Mathematical Literature; Valentin Zaitsev; 1999-09-20; pp. 1-17 *

Also Published As

Publication number Publication date
JP2015534171A (ja) 2015-11-26
US12182496B2 (en) 2024-12-31
US20150248379A1 (en) 2015-09-03
US20230252222A1 (en) 2023-08-10
EP2898426A1 (en) 2015-07-29
GB201216640D0 (en) 2012-10-31
CN104641367A (zh) 2015-05-20
JP6273285B2 (ja) 2018-01-31
WO2014045032A1 (en) 2014-03-27

Similar Documents

Publication Publication Date Title
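JP5997217B2 (ja) Method for removing ambiguity of multiple readings in language conversion
CN106537370B (zh) Method and system for robust tagging of named entities in the presence of source and translation errors
CN103970798B (zh) Searching and matching of data
JP6799562B2 (ja) Language feature extraction apparatus, named entity extraction apparatus, extraction method, and program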
US20140039879A1 (en) Generic system for linguistic analysis and transformation
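JP2013117978A (ja) Method for generating typing candidates to improve typing efficiency
CN103678684A (zh) Chinese word segmentation method based on navigation information retrieval
KR102794379B1 (ko) Method and apparatus for correcting training data using ensemble scores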
US12182496B2 (en) Formatting module, system and method for formatting an electronic character sequence
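CN103049458A (zh) Method and system for correcting a user lexicon
CN104485107A (zh) Speech recognition method, speech recognition system, and speech recognition device for names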
Uthayamoorthy et al. Ddspell-a data driven spell checker and suggestion generator for the tamil language
JP5323652B2 (ja) Method and system for determining similar words
Nunsanga et al. Part-of-speech tagging for Mizo language using conditional random field
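CN114661917B (zh) Text augmentation method, system, computer device, and readable storage medium
JP2007334534A (ja) Character string input apparatus, character string input method, and program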
Fahad et al. An Approach towards Implementation of Active and Passive voice using LL (1) Parsing
JP4088171B2 (ja) Text analysis apparatus, method, program, and recording medium storing the program
QasemiZadeh et al. Adaptive language independent spell checking using intelligent traverse on a tree
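JP3737817B2 (ja) Expression conversion method and expression conversion apparatus
JP4478042B2 (ja) Method for generating a word set with frequency information, program and program storage medium, and apparatus for generating a word set with frequency information, text index word creation apparatus, full-text search apparatus, and text classification apparatus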
Eutamene et al. Ontologies and Bigram-based Approach for Isolated Non-word Errors Correction in OCR System.
Manohar et al. Spellchecker for Malayalam using finite state transition models
JP5348699B2 (ja) Data classification system, data classification method, and program
TW452711B (en) Method using word affix for word search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200901

Address after: 98052 Microsoft Avenue, Redmond, Washington, USA

Patentee after: MICROSOFT TECHNOLOGY LICENSING, LLC

Address before: London, England

Patentee before: TOUCHTYPE Ltd.