WO2010018453A3 - System and method for processing electronically generated text - Google Patents

System and method for processing electronically generated text

Info

Publication number
WO2010018453A3
Authority
WO
Grant status
Application
Patent type
Prior art keywords
text string
sequence
category
text
processing means
Prior art date
Application number
PCT/IB2009/006552
Other languages
French (fr)
Other versions
WO2010018453A2 (en)
Inventor
Luis Ramos Dos Santos Lopes
Original Assignee
University Of Cape Town
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2008-08-15
Filing date
2009-08-14
Publication date
2011-04-14

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2785 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules

Abstract

A method and system are disclosed for processing electronically generated text emanating from electronic text generating means, such as a speech recognition engine (1) or optical character reader, that output an initial text string (3). First processing means produce an intermediate text string, and second processing means check the intermediate text string, optionally as a sequence with one or more other successive intermediate text strings, against a knowledge base (28, 32), comparing its meaning to items in the knowledge base in an attempt to correct semantic errors and produce an optionally final processed text string. The first processing means is adapted to categorise each word as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category (4); to create a category sequence corresponding to the text string; and to compare the category sequence to a plurality of predetermined permissible sequences, thereby checking the syntax of the initial text string (5). An initial text string whose category sequence does not correspond to a predetermined permissible sequence is treated in an effort to remedy the cause of non-correspondence (7).
PCT/IB2009/006552 2008-08-15 2009-08-14 System and method for processing electronically generated text WO2010018453A3 (en)
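
The syntax check attributed to the first processing means in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the lexicon, the category labels (DET, NOUN, VERB, ADJ, UNKNOWN), the permissible sequences, and the function names `categorise` and `syntax_ok` are all hypothetical and chosen only for illustration. The sketch flags a non-conforming string but does not attempt the remedial treatment, which the abstract does not detail.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy lexicon mapping each word to a part-of-speech category.
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "chases": "VERB", "sees": "VERB",
    "big": "ADJ",
}

# Hypothetical set of predetermined permissible category sequences.
PERMISSIBLE: Set[Tuple[str, ...]] = {
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
    ("DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"),
    ("NOUN", "VERB", "NOUN"),
}


def categorise(text: str) -> Tuple[str, ...]:
    """Build the category sequence for a text string, one category per word."""
    # Words missing from the lexicon get the placeholder category "UNKNOWN".
    return tuple(LEXICON.get(word, "UNKNOWN") for word in text.lower().split())


def syntax_ok(text: str) -> bool:
    """Return True if the category sequence matches a permissible sequence."""
    return categorise(text) in PERMISSIBLE


if __name__ == "__main__":
    for candidate in ("The dog chases a ball", "dog the chases"):
        print(candidate, "->", categorise(candidate), syntax_ok(candidate))
```

A string whose sequence is not in the permissible set (the second candidate here) would be the one passed on for treatment; a real system would presumably use a far larger lexicon and set of sequences.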

Priority Applications (2)

Application Number Priority Date Filing Date Title
ZA200807044 2008-08-15
ZA2008/0744 2008-08-15

Publications (2)

Publication Number Publication Date
WO2010018453A2 (en) 2010-02-18
WO2010018453A3 (en) 2011-04-14

Family

ID=41682291

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2009/006552 WO2010018453A3 (en) 2008-08-15 2009-08-14 System and method for processing electronically generated text

Country Status (1)

Country Link
WO (1) WO2010018453A3 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4947438A (en) * 1987-07-11 1990-08-07 U.S. Philips Corporation Process for the recognition of a continuous flow of spoken words
US20020111803A1 (en) * 2000-12-20 2002-08-15 International Business Machines Corporation Method and system for semantic speech recognition
WO2004044888A1 (en) * 2002-11-13 2004-05-27 Schoenebeck Bernd Voice processing system, method for allocating acoustic and/or written character strings to words or lexical entries
US7383172B1 (en) * 2003-08-15 2008-06-03 Patrick William Jamieson Process and system for semantically recognizing, correcting, and suggesting domain specific speech

Also Published As

Publication number Publication date Type
WO2010018453A2 (en) 2010-02-18 application

Similar Documents

Publication Publication Date Title
Habash et al. Arabic preprocessing schemes for statistical machine translation
Chen et al. Unknown word extraction for Chinese documents
Ballesteros et al. Improved transition-based parsing by modeling characters instead of words with LSTMs
US7680649B2 (en) System, method, program product, and networking use for recognizing words and their parts of speech in one or more natural languages
Slimane et al. A new arabic printed text image database and evaluation protocols
US20090055168A1 (en) Word Detection
US20050187768A1 (en) Dynamic N-best algorithm to reduce recognition errors
US20040205737A1 (en) Fast linguistic parsing system
Oh et al. An English-Korean transliteration model using pronunciation and contextual rules
US20100250250A1 (en) Systems and methods for generating a hybrid text string from two or more text strings generated by multiple automated speech recognition systems
Dell’Orletta Ensemble system for Part-of-Speech tagging
Metze et al. The spoken web search task at MediaEval 2012
US20110078105A1 (en) Method for personalizing chat bots
US20060229864A1 (en) Method, device, and computer program product for multi-lingual speech recognition
Adriaans et al. The EMILE 4.1 grammar induction toolbox
US20090299724A1 (en) System and method for applying bridging models for robust and efficient speech to speech translation
Meral et al. Syntactic tools for text watermarking
US20060031070A1 (en) System and method for implementing a refined dictionary for speech recognition
El-Desoky et al. Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR
Vu et al. Multilingual bottle-neck features and its application for under-resourced languages
Yaman et al. An Integrative and Discriminative Technique for Spoken Utterance Classification.
Gao et al. An audio CAPTCHA to distinguish humans from computers
Jin et al. Combining cross-stream and time dimensions in phonetic speaker recognition
US20060241936A1 (en) Pronunciation specifying apparatus, pronunciation specifying method and recording medium
Wang et al. A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 09806509

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 09806509

Country of ref document: EP

Kind code of ref document: A2