Method and system for information extraction

Publication number
SE0002368L
Authority
SE
Sweden
Prior art keywords
natural language
analyzed
text corpus
variants
word tokens
Prior art date
Application number
SE0002368A
Other versions
SE517496C2 (sv)
SE0002368D0 (sv)
Inventor
Eva Ingegerd Ejerhed
Peter A Braroe
Original Assignee
Hapax Information Systems Ab
Priority date
2000-06-22
Filing date
2000-06-22
Publication date
2001-12-23
Application filed by Hapax Information Systems Ab
Priority to SE0002368A (SE517496C2)
Publication of SE0002368D0
Priority to US09/599,563 (US6842730B1)
Priority to PCT/SE2001/001409 (WO2001098946A1)
Priority to AU2001266481A (AU2001266481A1)
Priority to EP01944033A (EP1311983A1)
Publication of SE0002368L
Publication of SE517496C2
Priority to US11/032,075 (US7194406B2)
Priority to US11/723,079 (US7657425B2)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/3332 Query translation
                    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
                    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
              • G06F40/253 Grammatical analysis; Style critique
              • G06F40/268 Morphological analysis
            • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
        • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
          • Y10S707/00 Data processing: database and file management or data structures
            • Y10S707/99931 Database or file accessing
              • Y10S707/99933 Query processing, i.e. searching
                • Y10S707/99934 Query formulation, input preparation, or translation
                • Y10S707/99935 Query augmenting and refining, e.g. inexact access

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Priority Applications (7)

Application Number Publication Priority Date Filing Date Title
SE0002368A SE517496C2 (sv) 2000-06-22 2000-06-22 Method and system for information extraction
US09/599,563 US6842730B1 (en) 2000-06-22 2000-06-23 Method and system for information extraction
PCT/SE2001/001409 WO2001098946A1 (en) 2000-06-22 2001-06-20 Method and system for information extraction
AU2001266481A AU2001266481A1 (en) 2000-06-22 2001-06-20 Method and system for information extraction
EP01944033A EP1311983A1 (en) 2000-06-22 2001-06-20 Method and system for information extraction
US11/032,075 US7194406B2 (en) 2000-06-22 2005-01-11 Method and system for information extraction
US11/723,079 US7657425B2 (en) 2000-06-22 2007-03-16 Method and system for information extraction

Applications Claiming Priority (1)

Application Number Publication Priority Date Filing Date Title
SE0002368A SE517496C2 (sv) 2000-06-22 2000-06-22 Method and system for information extraction

Publications (3)

Publication Number Publication Date
SE0002368D0 SE0002368D0 (sv) 2000-06-22
SE0002368L SE0002368L (sv) 2001-12-23
SE517496C2 SE517496C2 (sv) 2002-06-11

Family

ID=20280222

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
SE0002368A SE517496C2 (sv) 2000-06-22 2000-06-22 Method and system for information extraction

Country Status (5)

Country Link
US (3) US6842730B1 (sv)
EP (1) EP1311983A1 (sv)
AU (1) AU2001266481A1 (sv)
SE (1) SE517496C2 (sv)
WO (1) WO2001098946A1 (sv)

Families Citing this family (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7254773B2 (en) 2000-12-29 2007-08-07 International Business Machines Corporation Automated spell analysis
US7831442B1 (en) * 2001-05-16 2010-11-09 Perot Systems Corporation System and method for minimizing edits for medical insurance claims processing
US7822621B1 (en) 2001-05-16 2010-10-26 Perot Systems Corporation Method of and system for populating knowledge bases using rule based systems and object-oriented software
US8380491B2 (en) * 2002-04-19 2013-02-19 Educational Testing Service System for rating constructed responses based on concepts and a model answer
US7266553B1 (en) * 2002-07-01 2007-09-04 Microsoft Corporation Content data indexing
US20040019478A1 (en) * 2002-07-29 2004-01-29 Electronic Data Systems Corporation Interactive natural language query processing system and method
US7499913B2 (en) 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
US8296304B2 (en) 2004-01-26 2012-10-23 International Business Machines Corporation Method, system, and program for handling redirects in a search engine
US7424467B2 (en) * 2004-01-26 2008-09-09 International Business Machines Corporation Architecture for an indexer with fixed width sort and variable width sort
US7293005B2 (en) 2004-01-26 2007-11-06 International Business Machines Corporation Pipelined architecture for global analysis and index building
US7461064B2 (en) 2004-09-24 2008-12-02 International Business Machines Corporation Method for searching documents for ranges of numeric values
US7877383B2 (en) * 2005-04-27 2011-01-25 Microsoft Corporation Ranking and accessing definitions of terms
US8417693B2 (en) 2005-07-14 2013-04-09 International Business Machines Corporation Enforcing native access control to indexed documents
US8209335B2 (en) * 2005-09-20 2012-06-26 International Business Machines Corporation Extracting informative phrases from unstructured text
US7895193B2 (en) * 2005-09-30 2011-02-22 Microsoft Corporation Arbitration of specialized content using search results
JP2007122509A (ja) * 2005-10-28 2007-05-17 Rozetta Corp Device, method, and program for determining the naturalness of phrase sequences
US7533089B2 (en) * 2006-06-27 2009-05-12 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
US10796093B2 (en) 2006-08-08 2020-10-06 Elastic Minds, Llc Automatic generation of statement-response sets from conversational text using natural language processing
WO2008061002A2 (en) * 2006-11-14 2008-05-22 Networked Insights, Inc. Method and system for automatically identifying users to participate in an electronic conversation
US20080154853A1 (en) * 2006-12-22 2008-06-26 International Business Machines Corporation English-language translation of exact interpretations of keyword queries
US20080168049A1 (en) * 2007-01-08 2008-07-10 Microsoft Corporation Automatic acquisition of a parallel corpus from a network
US8112402B2 (en) * 2007-02-26 2012-02-07 Microsoft Corporation Automatic disambiguation based on a reference resource
US8001138B2 (en) * 2007-04-11 2011-08-16 Microsoft Corporation Word relationship driven search
US8374844B2 (en) * 2007-06-22 2013-02-12 Xerox Corporation Hybrid system for named entity resolution
US20090019032A1 (en) * 2007-07-13 2009-01-15 Siemens Aktiengesellschaft Method and a system for semantic relation extraction
US8316036B2 (en) 2007-08-31 2012-11-20 Microsoft Corporation Checkpointing iterators during search
US8346756B2 (en) * 2007-08-31 2013-01-01 Microsoft Corporation Calculating valence of expressions within documents for searching a document index
US8639708B2 (en) * 2007-08-31 2014-01-28 Microsoft Corporation Fact-based indexing for natural language search
US8463593B2 (en) * 2007-08-31 2013-06-11 Microsoft Corporation Natural language hypernym weighting for word sense disambiguation
US8712758B2 (en) * 2007-08-31 2014-04-29 Microsoft Corporation Coreference resolution in an ambiguity-sensitive natural language processing system
US20090070322A1 (en) * 2007-08-31 2009-03-12 Powerset, Inc. Browsing knowledge on the basis of semantic relations
US8868562B2 (en) * 2007-08-31 2014-10-21 Microsoft Corporation Identification of semantic relationships within reported speech
US8280721B2 (en) * 2007-08-31 2012-10-02 Microsoft Corporation Efficiently representing word sense probabilities
US8229970B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Efficient storage and retrieval of posting lists
US8229730B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Indexing role hierarchies for words in a search index
US20090198488A1 (en) * 2008-02-05 2009-08-06 Eric Arno Vigen System and method for analyzing communications using multi-placement hierarchical structures
US7925743B2 (en) * 2008-02-29 2011-04-12 Networked Insights, Llc Method and system for qualifying user engagement with a website
US8224843B2 (en) * 2008-08-12 2012-07-17 Morphism Llc Collaborative, incremental specification of identities
US8135580B1 (en) * 2008-08-20 2012-03-13 Amazon Technologies, Inc. Multi-language relevance-based indexing and search
US8370128B2 (en) * 2008-09-30 2013-02-05 Xerox Corporation Semantically-driven extraction of relations between named entities
US8949265B2 (en) * 2009-03-05 2015-02-03 Ebay Inc. System and method to provide query linguistic service
US8843476B1 (en) * 2009-03-16 2014-09-23 Guangsheng Zhang System and methods for automated document topic discovery, browsable search and document categorization
US8073718B2 (en) 2009-05-29 2011-12-06 Hyperquest, Inc. Automation of auditing claims
US8255205B2 (en) 2009-05-29 2012-08-28 Hyperquest, Inc. Automation of auditing claims
US8346577B2 (en) * 2009-05-29 2013-01-01 Hyperquest, Inc. Automation of auditing claims
US8447632B2 (en) * 2009-05-29 2013-05-21 Hyperquest, Inc. Automation of auditing claims
US9836460B2 (en) * 2010-06-11 2017-12-05 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for analyzing patent-related documents
WO2012045492A1 (en) * 2010-10-07 2012-04-12 Dublin Institute Of Technology Content retrieval system
CN101950309A (zh) * 2010-10-08 2011-01-19 华中师范大学 Method for recognizing new technical vocabulary oriented to subject domains
US8498972B2 (en) * 2010-12-16 2013-07-30 Sap Ag String and sub-string searching using inverted indexes
US9244902B2 (en) * 2011-10-20 2016-01-26 Zynga, Inc. Localization framework for dynamic text
US10068024B2 (en) * 2012-02-01 2018-09-04 Sri International Method and apparatus for correlating and viewing disparate data
EP2856344A1 (de) * 2012-05-24 2015-04-08 IQser IP AG Generation of queries to a data processing system
US9298754B2 (en) * 2012-11-15 2016-03-29 Ecole Polytechnique Federale de Lausanne (EPFL) (027559) Query management system and engine allowing for efficient query execution on raw details
JP5882241B2 (ja) * 2013-01-08 2016-03-09 日本電信電話株式会社 Method, device, and program for generating search keywords for question answering
US10073835B2 (en) * 2013-12-03 2018-09-11 International Business Machines Corporation Detecting literary elements in literature and their importance through semantic analysis and literary correlation
US9721004B2 (en) 2014-11-12 2017-08-01 International Business Machines Corporation Answering questions via a persona-based natural language processing (NLP) system
US10146751B1 (en) * 2014-12-31 2018-12-04 Guangsheng Zhang Methods for information extraction, search, and structured representation of text data
JP6447161B2 (ja) * 2015-01-20 2019-01-09 富士通株式会社 Semantic structure search program, semantic structure search device, and semantic structure search method
US10289680B2 (en) * 2016-05-31 2019-05-14 Oath Inc. Real time parsing and suggestions from pre-generated corpus with hypernyms
WO2024075086A1 (en) * 2022-10-07 2024-04-11 Open Text Corporation System and method for hybrid multilingual search indexing

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5309359A (en) 1990-08-16 1994-05-03 Boris Katz Method and apparatus for generating and utilizing annotations to facilitate computer text retrieval
JPH0756933A (ja) 1993-06-24 1995-03-03 Xerox Corp Document retrieval method
US5519608A (en) 1993-06-24 1996-05-21 Xerox Corporation Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
US5331556A (en) * 1993-06-28 1994-07-19 General Electric Company Method for natural language data processing using morphological and part-of-speech information
US5794050A (en) 1995-01-04 1998-08-11 Intelligent Text Processing, Inc. Natural language understanding system
US5963940A (en) 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
EP0934569A2 (en) 1996-04-04 1999-08-11 Flair Technologies, Ltd. A system, software and method for locating information in a collection of text-based information sources
GB9713019D0 (en) 1997-06-20 1997-08-27 Xerox Corp Linguistic search system
CN100524294C (zh) 1997-07-22 2009-08-05 微软公司 System for processing text input using natural language processing techniques
US5933822A (en) 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
US6965857B1 (en) * 2000-06-02 2005-11-15 Cogilex Recherches & Developpement Inc. Method and apparatus for deriving information from written text

Also Published As

Publication number Publication date
US7657425B2 (en) 2010-02-02
EP1311983A1 (en) 2003-05-21
SE517496C2 (sv) 2002-06-11
US20050131886A1 (en) 2005-06-16
US6842730B1 (en) 2005-01-11
AU2001266481A1 (en) 2002-01-02
US7194406B2 (en) 2007-03-20
WO2001098946A1 (en) 2001-12-27
SE0002368D0 (sv) 2000-06-22
US20070168181A1 (en) 2007-07-19

Similar Documents

Publication Publication Date Title
SE0002368L (sv) Method and system for information extraction
WO2007008263A3 (en) Self-organized concept search and data storage method
Pettersson et al. A multilingual evaluation of three spelling normalisation methods for historical text
WO2005052727A3 (en) Extraction of facts from text
WO1997004405A1 (en) Method and apparatus for automated search and retrieval processing
WO2001042981A3 (en) Natural english language search and retrieval system and method
BR0312120A (pt) Method for entering text into an electronic device, and electronic device
Sporleder et al. Idioms in Context: The IDIX Corpus.
Al-Shalabi et al. Proper noun extracting algorithm for arabic language
WO2005062202A3 (en) Knowledge management system with ontology based methods for knowledge extraction and knowledge search
Yusof et al. Qur'anic words stemming
Stamatatos et al. A practical chunker for unrestricted text
Klasinas et al. Web data harvesting for speech understanding grammar induction.
JPS6421624A (en) Japanese document retrieval system
Pecina Reference data for Czech collocation extraction
Tripathi Problems and prospects of Hindi language search and text processing
Curran et al. Transformation-based learning for automatic translation from HTML to XML
Erjavec Compiling and using the IJS-ELAN Parallel Corpus
Roy et al. A proposed Nepali Synset entry and extraction tool
Britto et al. Constructing a parsed corpus of historical Portuguese
Maucec et al. Topic detection for language model adaptation of highly-inflected languages by using a fuzzy comparison function.
Kraaij Exploring transitive translation methods
Paul et al. Pre-processing Steps on Bilingual Corpus for SMT.
Szymanek Structural ambiguity in English word-formation
Ihlström et al. Myths and reality of electronic commerce barriers for SMES?

Legal Events

Date Code Title Description
NUG Patent has lapsed