AU2001262852A1 - Method for segmentation of text

Method for segmentation of text

Info

Publication number
AU2001262852A1
Authority
AU
Australia
Prior art keywords
text elements
predetermined number
initial
consecutive
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2001262852A
Inventor
Eva Ejerhed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hapax Information Systems AB
Original Assignee
Hapax Information Systems AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hapax Information Systems AB
Publication of AU2001262852A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99934 Query formulation, input preparation, or translation
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access
    • Y10S707/99936 Pattern matching access

Abstract

A computerized method, and a corresponding apparatus, for segmenting a stream of text elements comprising analyzed tokens into one or more initial clauses is disclosed. In the method, a predetermined number of consecutive text elements of said stream are scanned, starting from a given position. These consecutive text elements are compared with each pattern of a set of patterns for beginnings of initial clauses, and a beginning of an initial clause is identified in them if they match one pattern of the set. The given position is then moved at least one position forward, and the scanning, comparison and identification are repeated.
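The scan-compare-advance loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tag names, the `*` wildcard, the window size of 2, the two example patterns, and the choice to report the window's start position as the clause beginning are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Set, Tuple

# A pattern is a fixed-length tuple of tags; "*" is a hypothetical wildcard.
Pattern = Tuple[str, ...]

# Hypothetical "predetermined number" of consecutive text elements.
WINDOW = 2

# Hypothetical patterns for beginnings of initial clauses.
PATTERNS: Set[Pattern] = {
    ("<s>", "*"),      # start-of-text marker followed by any element
    ("CONJ", "PRON"),  # conjunction followed by a pronoun
}


def matches(window: Sequence[str], pattern: Pattern) -> bool:
    """Return True if every element of the window fits the pattern."""
    return len(window) == len(pattern) and all(
        p == "*" or p == w for w, p in zip(window, pattern)
    )


def find_clause_beginnings(
    tags: Sequence[str], patterns: Set[Pattern], size: int = WINDOW
) -> List[int]:
    """Slide a fixed-size window over the tag stream, one position at a time,
    and record each start position whose window matches some pattern."""
    beginnings: List[int] = []
    position = 0
    while position + size <= len(tags):
        window = tuple(tags[position:position + size])
        # Compare the window with each pattern of the set.
        if any(matches(window, p) for p in patterns):
            beginnings.append(position)
        # Move the given position one step forward and repeat.
        position += 1
    return beginnings


# Analyzed tokens reduced to tags (an assumed representation).
example = ["<s>", "PRON", "VERB", "CONJ", "PRON", "VERB", "PUNCT"]
print(find_clause_beginnings(example, PATTERNS))  # [0, 3]
```

The sketch advances by exactly one position per step; the abstract only requires moving "at least one position forward", so a larger step after a match would also be consistent with it.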

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
SE0002034 2000-05-31
SE0002034A SE517005C2 (en) 2000-05-31 2000-05-31 Segmentation of text
PCT/SE2001/001204 WO2001093088A1 (en) 2000-05-31 2001-05-30 Method for segmentation of text

Publications (1)

Publication Number Publication Date
AU2001262852A1 (en) 2001-12-11

Family

ID=20279913

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
AU2001262852A Abandoned AU2001262852A1 (en) 2000-05-31 2001-05-30 Method for segmentation of text

Country Status (7)

Country Link
US (1) US6810375B1 (en)
EP (1) EP1305738B1 (en)
AT (1) ATE425500T1 (en)
AU (1) AU2001262852A1 (en)
DE (1) DE60137935D1 (en)
SE (1) SE517005C2 (en)
WO (1) WO2001093088A1 (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8190436B2 (en) * 2001-12-07 2012-05-29 AT&T Intellectual Property II, L.P. System and method of spoken language understanding in human computer dialogs
US7493253B1 (en) 2002-07-12 2009-02-17 Language And Computing, Inc. Conceptual world representation natural language understanding system and method
US8818793B1 (en) 2002-12-24 2014-08-26 AT&T Intellectual Property II, L.P. System and method of extracting clauses for spoken language understanding
US8849648B1 (en) * 2002-12-24 2014-09-30 AT&T Intellectual Property II, L.P. System and method of extracting clauses for spoken language understanding
US7499913B2 (en) 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
US7424467B2 (en) 2004-01-26 2008-09-09 International Business Machines Corporation Architecture for an indexer with fixed width sort and variable width sort
US7293005B2 (en) 2004-01-26 2007-11-06 International Business Machines Corporation Pipelined architecture for global analysis and index building
US8296304B2 (en) 2004-01-26 2012-10-23 International Business Machines Corporation Method, system, and program for handling redirects in a search engine
US7461064B2 (en) 2004-09-24 2008-12-02 International Business Machines Corporation Method for searching documents for ranges of numeric values
US8051096B1 (en) 2004-09-30 2011-11-01 Google Inc. Methods and systems for augmenting a token lexicon
US7996208B2 (en) 2004-09-30 2011-08-09 Google Inc. Methods and systems for selecting a language for text segmentation
US7680648B2 (en) * 2004-09-30 2010-03-16 Google Inc. Methods and systems for improving text segmentation
US8843536B1 (en) 2004-12-31 2014-09-23 Google Inc. Methods and systems for providing relevant advertisements or other content for inactive uniform resource locators using search queries
US8417693B2 (en) 2005-07-14 2013-04-09 International Business Machines Corporation Enforcing native access control to indexed documents
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US8631005B2 (en) * 2006-12-28 2014-01-14 Ebay Inc. Header-token driven automatic text segmentation
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
JP5256654B2 (en) * 2007-06-29 2013-08-07 富士通株式会社 Sentence division program, sentence division apparatus, and sentence division method
US8260619B1 (en) 2008-08-22 2012-09-04 Convergys Cmg Utah, Inc. Method and system for creating natural language understanding grammars
US20090150141A1 (en) * 2007-12-07 2009-06-11 David Scott Wible Method and system for learning second or foreign languages
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
EP2488963A1 (en) * 2009-10-15 2012-08-22 Rogers Communications Inc. System and method for phrase identification
KR101622111B1 (en) 2009-12-11 2016-05-18 삼성전자 주식회사 Dialog system and conversational method thereof
US8788260B2 (en) * 2010-05-11 2014-07-22 Microsoft Corporation Generating snippets based on content features
US8977538B2 (en) * 2010-09-13 2015-03-10 Richard Salisbury Constructing and analyzing a word graph
US8892550B2 (en) * 2010-09-24 2014-11-18 International Business Machines Corporation Source expansion for information retrieval and information extraction
RU2474870C1 (en) 2011-11-18 2013-02-10 Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" Method for automated analysis of text documents
US9934218B2 (en) * 2011-12-05 2018-04-03 Infosys Limited Systems and methods for extracting attributes from text content
US9280520B2 (en) * 2012-08-02 2016-03-08 American Express Travel Related Services Company, Inc. Systems and methods for semantic information retrieval
US9607613B2 (en) 2014-04-23 2017-03-28 Google Inc. Speech endpointing based on word comparisons
CN107003996A (en) 2014-09-16 2017-08-01 声钰科技 VCommerce
WO2016044321A1 (en) 2014-09-16 2016-03-24 Min Tang Integration of domain information into state transitions of a finite state transducer for natural language processing
JP2016062264A (en) * 2014-09-17 2016-04-25 株式会社東芝 Interaction support apparatus, method, and program
WO2016061309A1 (en) 2014-10-15 2016-04-21 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US10388270B2 (en) 2014-11-05 2019-08-20 AT&T Intellectual Property I, L.P. System and method for text normalization using atomic tokens
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10324965B2 (en) * 2014-12-30 2019-06-18 International Business Machines Corporation Techniques for suggesting patterns in unstructured documents
US20170031893A1 (en) * 2015-07-30 2017-02-02 Pat Inc. Set-based Parsing for Computer-Implemented Linguistic Analysis
JP6898322B2 (en) * 2015-11-12 2021-07-07 マイクロソフト テクノロジー ライセンシング,エルエルシー Dialogue assistance
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10402473B2 (en) * 2016-10-16 2019-09-03 Richard Salisbury Comparing, and generating revision markings with respect to, an arbitrary number of text segments
EP4083998A1 (en) 2017-06-06 2022-11-02 Google LLC End of query detection
US10929754B2 (en) 2017-06-06 2021-02-23 Google Llc Unified endpointer using multitask and multidomain learning
TWI709080B (en) * 2017-06-14 2020-11-01 雲拓科技有限公司 Claim structurally organizing device
US11003854B2 (en) * 2018-10-30 2021-05-11 International Business Machines Corporation Adjusting an operation of a system based on a modified lexical analysis model for a document
US11568153B2 (en) 2020-03-05 2023-01-31 Bank Of America Corporation Narrative evaluator

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4887212A (en) 1986-10-29 1989-12-12 International Business Machines Corporation Parser for natural language text
US4864502A (en) 1987-10-07 1989-09-05 Houghton Mifflin Company Sentence analyzer
US5146405A (en) 1988-02-05 1992-09-08 AT&T Bell Laboratories Methods for part-of-speech determination and usage
JP2764343B2 (en) 1990-09-07 1998-06-11 富士通株式会社 Clause / phrase boundary extraction method
CN1332340C (en) 1997-03-04 2007-08-15 石仓博 Language analysis system and method
JPH10254877A (en) 1997-03-14 1998-09-25 Omron Corp Style converter, word processor and style converting method
CN1114165C (en) * 1998-02-13 2003-07-09 微软公司 Segmentation of Chinese text into words
EP0962873A1 (en) 1998-06-02 1999-12-08 International Business Machines Corporation Processing of textual information and automated apprehension of information

Also Published As

Publication number Publication date
SE0002034D0 (en) 2000-05-31
SE0002034L (en) 2001-12-01
SE517005C2 (en) 2002-04-02
WO2001093088A1 (en) 2001-12-06
US6810375B1 (en) 2004-10-26
ATE425500T1 (en) 2009-03-15
EP1305738A1 (en) 2003-05-02
DE60137935D1 (en) 2009-04-23
EP1305738B1 (en) 2009-03-11

Similar Documents

Publication Title
AU2001262852A1 (en) Method for segmentation of text
HK1042579A1 (en) Method and apparatus for recognizing tone languages using pitch information.
TW357313B (en) Methods and apparatus for handwriting recognition
EP0459746A3 (en) Pattern recognition method, and apparatus therefor
EP0807906A3 (en) Method and apparatus for discriminating and counting documents
GB2318439B (en) Device and method for representing handwriting, and an alphabet therefor
WO2002049253A3 (en) Method and interface for intelligent user-machine interaction
EP0632427A3 (en) Method and apparatus for inputting musical data.
CA2463230A1 (en) A method and apparatus for decoding handwritten characters
EP0924639A3 (en) Character string extraction apparatus and pattern extraction apparatus
CA2410057A1 (en) Apparatus and method for input of ideographic korean syllables from reduced keyboard
MXPA02002557A (en) Authentication using a digital watermark.
EP1396811A3 (en) Apparatus and method for reading and decoding optical codes with result indication
DE69230031D1 (en) Pattern recognition and authenticity testing, especially for handwritten signatures
AU2001271039A1 (en) Fingerprint collation apparatus, fingerprint collation method, and fingerprint collation program
EP0667567A3 (en) Apparatus and method for supporting the implicit structure of freeform lists, outlines, text, tables, and diagrams in a gesture-based input system and editing system.
EP0847041A3 (en) Method and apparatus for speech recognition performing noise adaptation
AU2002238961A1 (en) Information processing apparatus and method, and program
FR2711945B1 (en) Method for producing a security feature on a document, and document obtained.
IT1272259B (en) PROCEDURE AND APPARATUS FOR THE RECOGNITION OF CHARACTERS
EP0576020A3 (en) Character recognizing method and apparatus.
AU2002318487A1 (en) Apparatus, forming means and methods for deep drawing sheet material
BR9709010B1 (en) coin testing method and apparatus, use of a coin testing apparatus, and coin recognition process.
AU5253500A (en) Apparatus and method for inputting alphabet characters on small keypad
EP0652532A3 (en) Character recognition apparatus.