CN104641367B - Formatting module, system and method for formatting an electronic character sequence - Google Patents

Formatting module, system and method for formatting an electronic character sequence

Info

Publication number
CN104641367B
Authority
CN
China
Prior art keywords
sequence
language
rule
character
electronic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201380048564.6A
Other languages
English (en)
Chinese (zh)
Other versions
CN104641367A (zh)
Inventor
Benjamin Medlock
David Martinez del Corral
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Touchtype Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Touchtype Ltd
Publication of CN104641367A
Application granted
Publication of CN104641367B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/163 Handling of whitespace
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/274 Converting codes to words; Guess-ahead of partial word inputs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
CN201380048564.6A 2012-09-18 2013-09-18 Formatting module, system and method for formatting an electronic character sequence Active CN104641367B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB1216640.1 2012-09-18
GBGB1216640.1A GB201216640D0 (en) 2012-09-18 2012-09-18 Formatting module, system and method for formatting an electronic character sequence
PCT/GB2013/052443 WO2014045032A1 (en) 2012-09-18 2013-09-18 Formatting module, system and method for formatting an electronic character sequence

Publications (2)

Publication Number Publication Date
CN104641367A (zh) 2015-05-20
CN104641367B (zh) 2019-01-11

Family

ID=47144444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380048564.6A Active CN104641367B (zh) 2012-09-18 2013-09-18 Formatting module, system and method for formatting an electronic character sequence

Country Status (6)

Country Link
US (2) US20150248379A1
EP (1) EP2898426A1
JP (1) JP6273285B2
CN (1) CN104641367B
GB (1) GB201216640D0
WO (1) WO2014045032A1

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150331916A1 (en) * 2013-02-06 2015-11-19 Hitachi, Ltd. Computer, data access management method and recording medium
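CN106909296A (zh) 2016-06-07 2017-06-30 Alibaba Group Holding Ltd. Data extraction method, apparatus and terminal device
JP7566520B2 (ja) * 2020-07-17 2024-10-15 Canon Inc. Image processing apparatus, method, and program
JP7724676B2 (ja) * 2021-10-05 2025-08-18 The Japan Research Institute, Ltd. Information processing method, program and information processing apparatus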

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6374242B1 (en) * 1999-09-29 2002-04-16 Lockheed Martin Corporation Natural-language information processor with association searches limited within blocks
US7080002B1 (en) * 1997-03-26 2006-07-18 Samsung Electronics Co., Ltd. Bi-lingual system and method for automatically converting one language into another language
CN1851642A (zh) * 2005-09-09 2006-10-25 Huawei Technologies Co., Ltd. Interface data grammar analysis and processing system and analysis and processing method thereof
WO2012042217A1 (en) * 2010-09-29 2012-04-05 Touchtype Ltd. System and method for inputting text into electronic devices

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4247906A (en) 1978-11-13 1981-01-27 Wang Laboratories, Inc. Text editing system having flexible repetitive operation capability
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
JPH0762841B2 (ja) 1986-06-27 1995-07-05 Yokogawa-Hewlett-Packard, Ltd. Document fair-copy apparatus
US5222225A (en) * 1988-10-07 1993-06-22 International Business Machines Corporation Apparatus for processing character string moves in a data processing system
US5062143A (en) * 1990-02-23 1991-10-29 Harris Corporation Trigram-based method of language identification
US5937420A (en) * 1996-07-23 1999-08-10 Adobe Systems Incorporated Pointsize-variable character spacing
US6513002B1 (en) * 1998-02-11 2003-01-28 International Business Machines Corporation Rule-based number formatter
US6529864B1 (en) * 1999-08-11 2003-03-04 Roedy-Black Publishing, Inc. Interactive connotative dictionary system
US20020123994A1 (en) 2000-04-26 2002-09-05 Yves Schabes System for fulfilling an information need using extended matching techniques
US20040078191A1 (en) * 2002-10-22 2004-04-22 Nokia Corporation Scalable neural network-based language identification from written text
US7580838B2 (en) * 2002-11-22 2009-08-25 Scansoft, Inc. Automatic insertion of non-verbalized punctuation
US8027832B2 (en) * 2005-02-11 2011-09-27 Microsoft Corporation Efficient language identification
US20060184878A1 (en) * 2005-02-11 2006-08-17 Microsoft Corporation Using a description language to provide a user interface presentation
JP4135950B2 (ja) * 2005-06-09 2008-08-20 International Business Machines Corporation Access management apparatus, access management method, and program
US7552045B2 (en) * 2006-12-18 2009-06-23 Nokia Corporation Method, apparatus and computer program product for providing flexible text based language identification
US8527262B2 (en) 2007-06-22 2013-09-03 International Business Machines Corporation Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications
US8783570B2 (en) * 2007-08-21 2014-07-22 Symbol Technologies, Inc. Reader with optical character recognition
US8306356B1 (en) * 2007-09-28 2012-11-06 Language Technologies, Inc. System, plug-in, and method for improving text composition by modifying character prominence according to assigned character information measures
US8706474B2 (en) * 2008-02-23 2014-04-22 Fair Isaac Corporation Translation of entity names based on source document publication date, and frequency and co-occurrence of the entity names
KR101496885B1 (ko) 2008-04-07 2015-02-27 Samsung Electronics Co., Ltd. System and method for word spacing in sentences
US8224641B2 (en) * 2008-11-19 2012-07-17 Stratify, Inc. Language identification for documents containing multiple languages
JP4701292B2 (ja) * 2009-01-05 2011-06-15 International Business Machines Corporation Computer system for creating a term dictionary from named entities or technical terms contained in text data, and method and computer program therefor
US8879846B2 (en) * 2009-02-10 2014-11-04 Kofax, Inc. Systems, methods and computer program products for processing financial documents
GB0905457D0 (en) 2009-03-30 2009-05-13 Touchtype Ltd System and method for inputting text into electronic devices
KR101638594B1 (ko) * 2010-05-26 2016-07-20 Samsung Electronics Co., Ltd. Method and apparatus for searching DNA sequences
WO2012098544A2 (en) 2011-01-19 2012-07-26 Keyless Systems, Ltd. Improved data entry systems
US20120262461A1 (en) * 2011-02-17 2012-10-18 Conversive, Inc. System and Method for the Normalization of Text
WO2014042976A2 (en) * 2012-09-15 2014-03-20 Numbergun Llc, A Utah Limited Liability Company Flexible high-speed generation and formatting of application-specified strings
US20140136967A1 (en) 2012-11-09 2014-05-15 Research In Motion Limited Method of providing predictive text
US9811517B2 (en) 2013-01-29 2017-11-07 Tencent Technology (Shenzhen) Company Limited Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
US8943405B1 (en) 2013-11-27 2015-01-27 Google Inc. Assisted punctuation of character strings

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7080002B1 (en) * 1997-03-26 2006-07-18 Samsung Electronics Co., Ltd. Bi-lingual system and method for automatically converting one language into another language
US6374242B1 (en) * 1999-09-29 2002-04-16 Lockheed Martin Corporation Natural-language information processor with association searches limited within blocks
CN1851642A (zh) * 2005-09-09 2006-10-25 Huawei Technologies Co., Ltd. Interface data grammar analysis and processing system and analysis and processing method thereof
WO2012042217A1 (en) * 2010-09-29 2012-04-05 Touchtype Ltd. System and method for inputting text into electronic devices

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Apache OpenOffice Wiki.Non Breaking Spaces Before Punctuation In French.《https://wiki.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_(espaces_ins%C3%A9cables)》.2012,1-5. *
Multilingual Text Entry using Automatic Language Detection;Yo Ehara et al.;《THIRD INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING,IJCNLP 2008》;20080107;441-448 *
Non Breaking Spaces Before Punctuation In French;Apache OpenOffice Wiki;《https://wiki.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_(espaces_ins%C3%A9cables)》;20120829;1-5 *
Russian Typographical Traditions in Mathematical Literature;Valentin Zaitsev;《Russian Typographical Traditions in Mathematical Literature》;19990920;1-17 *

Also Published As

Publication number Publication date
US20230252222A1 (en) 2023-08-10
US12182496B2 (en) 2024-12-31
GB201216640D0 (en) 2012-10-31
EP2898426A1 (en) 2015-07-29
JP6273285B2 (ja) 2018-01-31
CN104641367A (zh) 2015-05-20
JP2015534171A (ja) 2015-11-26
WO2014045032A1 (en) 2014-03-27
US20150248379A1 (en) 2015-09-03

Similar Documents

Publication Publication Date Title
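JP5997217B2 (ja) Method for removing ambiguity of multiple readings in language conversion
CN103970798B (zh) Search and matching of data
CN114036930A (zh) Text error correction method, apparatus, device and computer-readable medium
JP6799562B2 (ja) Language feature extraction device, named entity extraction device, extraction method, and program
CN110321432A (zh) Text event information extraction method, electronic device and non-volatile storage medium
CN103678684A (zh) Chinese word segmentation method based on navigation information retrieval
CN114298010B (zh) Text generation method fusing dual language models and sentence detection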
US12182496B2 (en) Formatting module, system and method for formatting an electronic character sequence
US12437150B2 (en) System and method of performing data training on morpheme processing rules
CN104485107A (zh) Speech recognition method for names, speech recognition system and speech recognition device
Uthayamoorthy et al. Ddspell-a data driven spell checker and suggestion generator for the tamil language
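JP2002117027A (ja) Emotion information extraction method and recording medium for an emotion information extraction program
JP2007087397A (ja) Morphological analysis program, correction program, morphological analysis device, correction device, morphological analysis method and correction method
JP2007334534A (ja) Character string input device, character string input method, and program
JP4088171B2 (ja) Text analysis device, method, program, and recording medium recording the program
JP3825645B2 (ja) Expression conversion method and expression conversion device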
QasemiZadeh et al. Adaptive language independent spell checking using intelligent traverse on a tree
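JP4478042B2 (ja) Method for generating a word set with frequency information, program and program storage medium, and device for generating a word set with frequency information, text index word creation device, full-text search device and text classification device
JP3737817B2 (ja) Expression conversion method and expression conversion device
KR100998291B1 (ko) Method and apparatus for structuring and detecting keyword strings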
TW452711B (en) Method using word affix for word search
JP4765107B2 (ja) Character string input device and program
CN121350092A (zh) Structured data parsing method and apparatus
Song Optimisation of intelligent English grammar error correction based on multi-strategy Pinyin detection and hierarchical enhancement
Zaghal et al. Arabic morphological analyzer with text to voice

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200901

Address after: 98052 Microsoft Avenue, Redmond, Washington, USA

Patentee after: MICROSOFT TECHNOLOGY LICENSING, LLC

Address before: London, England

Patentee before: TOUCHTYPE Ltd.

TR01 Transfer of patent right