WO2010018453A3 - System and method for processing electronically generated text

Publication number
WO2010018453A3
Authority
WO
WIPO (PCT)
Prior art keywords
text string
sequence
category
text
processing means
Prior art date
Application number
PCT/IB2009/006552
Other languages
French (fr)
Other versions
WO2010018453A2 (en)
Inventor
Luis Ramos Dos Santos Lopes
Original Assignee
University Of Cape Town
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2008-08-15
Filing date
2009-08-14
Publication date
2011-04-14
Priority to ZA200807044
Priority to ZA2008/0744
Application filed by University Of Cape Town
Publication of WO2010018453A2
Publication of WO2010018453A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2785 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules

Abstract

A method and system are disclosed for processing electronically generated text emanating from electronic text generating means, such as a speech recognition engine (1) or optical character reader, that output an initial text string (3). First processing means produce an intermediate text string. Second processing means check the intermediate text string, optionally as a sequence with one or more other successive intermediate text strings, against a knowledge base (28, 32), comparing its meaning to items in the knowledge base in an attempt to correct semantic errors and produce an optionally final processed text string. The first processing means is adapted to apply the steps of: categorising each word as belonging to one of a predetermined plurality of categories of parts of speech, including nouns, verbs, and at least one other category (4); creating a category sequence corresponding to the text string; and comparing the category sequence to a plurality of predetermined permissible sequences to thereby check the syntax of the initial text string (5). An initial text string whose category sequence does not correspond to any predetermined permissible sequence is treated in an effort to remedy the cause of non-correspondence (7).
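The syntax-checking step attributed to the first processing means can be illustrated with a minimal sketch. It is not the patented implementation: the lexicon, the category labels (`DET`, `NOUN`, `VERB`, `UNK`), the set of permissible sequences, and the function names `categorise` and `check_syntax` are all hypothetical, chosen only to show how a category sequence might be built and compared against predetermined permissible sequences.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon mapping each known word to a part-of-speech category.
# The abstract requires nouns, verbs, and at least one other category;
# "DET" (determiner) is an assumed choice for that third category.
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "chases": "VERB", "sees": "VERB",
}

# Hypothetical predetermined permissible category sequences.
PERMISSIBLE: Set[Tuple[str, ...]] = {
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
    ("NOUN", "VERB", "NOUN"),
}


def categorise(text: str) -> List[str]:
    """Assign a category to each whitespace-separated word; unknown words get 'UNK'."""
    return [LEXICON.get(word, "UNK") for word in text.lower().split()]


def check_syntax(text: str) -> Tuple[bool, Tuple[str, ...]]:
    """Build the category sequence and report whether it is a permissible one."""
    sequence = tuple(categorise(text))
    return sequence in PERMISSIBLE, sequence


if __name__ == "__main__":
    for candidate in ("The dog chases a ball", "dog the chases"):
        ok, seq = check_syntax(candidate)
        print(candidate, "->", seq, "permissible" if ok else "needs remedial treatment")
```

A string whose sequence is not in the permissible set would then be passed to whatever remedial treatment the system applies; the abstract does not detail that treatment, so the sketch stops at detection.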

Priority Applications (2)

Application Number Priority Date Filing Date Title
ZA200807044 2008-08-15
ZA2008/0744 2008-08-15

Publications (2)

Publication Number Publication Date
WO2010018453A2 (en) 2010-02-18
WO2010018453A3 (en) 2011-04-14

Family

ID=41682291

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
PCT/IB2009/006552 WO2010018453A2 (en) 2008-08-15 2009-08-14 System and method for processing electronically generated text

Country Status (1)

Country Link
WO (1) WO2010018453A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4947438A (en) * 1987-07-11 1990-08-07 U.S. Philips Corporation Process for the recognition of a continuous flow of spoken words
US20020111803A1 (en) * 2000-12-20 2002-08-15 International Business Machines Corporation Method and system for semantic speech recognition
WO2004044888A1 (en) * 2002-11-13 2004-05-27 Schoenebeck Bernd Voice processing system, method for allocating acoustic and/or written character strings to words or lexical entries
US7383172B1 (en) * 2003-08-15 2008-06-03 Patrick William Jamieson Process and system for semantically recognizing, correcting, and suggesting domain specific speech

Also Published As

Publication number Publication date
WO2010018453A2 (en) 2010-02-18

Similar Documents

Publication Publication Date Title
Chen et al. Unknown word extraction for Chinese documents
Reynolds et al. The SuperSID project: Exploiting high-level information for high-accuracy speaker recognition
US20100217581A1 (en) Multi-Mode Input Method Editor
US7930168B2 (en) Natural language processing of disfluent sentences
Frost Orthographic systems and skilled word recognition processes in reading.
Rousseau et al. Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks.
Oh et al. An English-Korean transliteration model using pronunciation and contextual rules
Liu et al. Comparing HMM, maximum entropy, and conditional random fields for disfluency detection
Shaalan et al. Person name entity recognition for Arabic
Kondrak Identifying cognates by phonetic and semantic similarity
US20110078105A1 (en) Method for personalizing chat bots
US20060129380A1 (en) System and method for disambiguating non diacritized arabic words in a text
US8515731B1 (en) Synonym verification
Adriaans et al. The EMILE 4.1 grammar induction toolbox
US20090299724A1 (en) System and method for applying bridging models for robust and efficient speech to speech translation
TW498264B (en) Input correction method and system for Chinese characters
Sarma et al. Context-based speech recognition error detection and correction
US20060031070A1 (en) System and method for implementing a refined dictionary for speech recognition
Vu et al. Multilingual bottle-neck features and its application for under-resourced languages
Rastrow et al. Towards using hybrid word and fragment units for vocabulary independent LVCSR systems
Jin et al. Combining cross-stream and time dimensions in phonetic speaker recognition
Wang et al. A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues
US20110196668A1 (en) Integrated Language Model, Related Systems and Methods
Mrva et al. A PLSA-based language model for conversational telephone speech
Johnson et al. Reranking the Berkeley and Brown parsers

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 09806509

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application not entered into European phase

Ref document number: 09806509

Country of ref document: EP

Kind code of ref document: A2