EA200000035A1 - Method (variants) and system for text classification - Google Patents

Method (variants) and system for text classification

Info

Publication number
EA200000035A1
EA200000035A1 EA200000035A EA200000035A EA200000035A1 EA 200000035 A1 EA200000035 A1 EA 200000035A1 EA 200000035 A EA200000035 A EA 200000035A EA 200000035 A EA200000035 A EA 200000035A EA 200000035 A1 EA200000035 A1 EA 200000035A1
Authority
EA
Eurasian Patent Office
Prior art keywords
terms
cluster
knowledge base
documents
significance
Prior art date
Application number
EA200000035A
Other languages
English (en)
Inventor
Максим Жиляев
Original Assignee
Зэ Дайалог Корпорейшн
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Зэ Дайалог Корпорейшн
Publication of EA200000035A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Documents are classified into one or more clusters (C) corresponding to predefined classification categories by building a knowledge base containing matrices of vectors that indicate the significance of terms within a corpus of text (T) formed by the documents and classified in the knowledge base into each cluster (C). The significance of terms is determined by assuming a standard normal probability distribution, and terms are deemed significant in a cluster if the probability that their occurrence is due to chance is low. For each cluster, statistical signatures are generated, comprising sums of weighted products and intersections of the cluster's terms with the terms of the corpus (T), and are used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules that are context-sensitive and applied selectively to improve the accuracy and precision of classification. The international application was published together with the international search report.
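The significance test named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give a formula, so everything here is an assumption made for illustration: term counts in a cluster are modelled as binomial draws at the corpus-wide rate, the normal approximation yields a z-score, and a term counts as significant when its z-score exceeds a threshold. The function name `significant_terms` and the parameter `z_threshold` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def significant_terms(cluster_tokens, corpus_tokens, z_threshold=2.0):
    """Return {term: z_score} for terms over-represented in a cluster.

    Assumption (not from the patent text): under the null hypothesis a
    term appears in the cluster at its corpus-wide rate p, so its count
    is approximately Normal(n*p, n*p*(1-p)) for a cluster of n tokens.
    """
    corpus_counts = Counter(corpus_tokens)
    cluster_counts = Counter(cluster_tokens)
    n_corpus = len(corpus_tokens)
    n_cluster = len(cluster_tokens)

    scores = {}
    for term, observed in cluster_counts.items():
        p = corpus_counts[term] / n_corpus
        if p <= 0.0 or p >= 1.0:
            # Variance is zero (or the term is missing from the corpus),
            # so no z-score can be computed.
            continue
        expected = n_cluster * p
        std_dev = math.sqrt(n_cluster * p * (1.0 - p))
        z = (observed - expected) / std_dev
        if z >= z_threshold:
            # A large z means the count is unlikely to be due to chance.
            scores[term] = z
    return scores


# Toy data: the corpus T is the cluster's tokens plus all other tokens.
cluster = ["stock"] * 8 + ["the"] * 12
others = ["the"] * 60 + ["game"] * 18 + ["stock"] * 2
scores = significant_terms(cluster, cluster + others)
print(sorted(scores))
```

In this toy corpus "stock" is concentrated in the cluster and passes the threshold, while "the" occurs at roughly its background rate and does not. A real knowledge base would store such scores per cluster as the vectors the abstract mentions; the weighting and intersection sums used as discriminators are not specified in the abstract and are not modelled here.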
EA200000035A 1997-06-16 1998-06-16 Method (variants) and system for text classification EA200000035A1 (ru)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/876,271 US6137911A (en) 1997-06-16 1997-06-16 Test classification system and method
PCT/US1998/012604 WO1998058344A1 (en) 1997-06-16 1998-06-16 Text classification system and method

Publications (1)

Publication Number Publication Date
EA200000035A1 (ru) 2000-08-28

Family

ID=25367325

Family Applications (1)

Application Number Title Priority Date Filing Date
EA200000035A EA200000035A1 (ru) 1997-06-16 1998-06-16 Method (variants) and system for text classification

Country Status (6)

Country Link
US (1) US6137911A (ru)
EP (1) EP0996927A4 (ru)
AU (1) AU760495B2 (ru)
EA (1) EA200000035A1 (ru)
NZ (1) NZ502332A (ru)
WO (1) WO1998058344A1 (ru)

Families Citing this family (304)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7082426B2 (en) * 1993-06-18 2006-07-25 Cnet Networks, Inc. Content aggregation method and apparatus for an on-line product catalog
US5822720A (en) 1994-02-16 1998-10-13 Sentius Corporation System amd method for linking streams of multimedia data for reference material for display
US6470307B1 (en) * 1997-06-23 2002-10-22 National Research Council Of Canada Method and apparatus for automatically identifying keywords within a document
US6562077B2 (en) * 1997-11-14 2003-05-13 Xerox Corporation Sorting image segments into clusters based on a distance measurement
GB2334115A (en) * 1998-01-30 1999-08-11 Sharp Kk Processing text eg for approximate translation
US7043426B2 (en) * 1998-04-01 2006-05-09 Cyberpulse, L.L.C. Structured speech recognition
US6507678B2 (en) * 1998-06-19 2003-01-14 Fujitsu Limited Apparatus and method for retrieving character string based on classification of character
JP3665480B2 (ja) * 1998-06-24 2005-06-29 富士通株式会社 Document organizing apparatus and method
US6144958A (en) 1998-07-15 2000-11-07 Amazon.Com, Inc. System and method for correcting spelling errors in search queries
US6915254B1 (en) * 1998-07-30 2005-07-05 A-Life Medical, Inc. Automatically assigning medical codes using natural language processing
US7039856B2 (en) * 1998-09-30 2006-05-02 Ricoh Co., Ltd. Automatic document classification using text and images
GB9821787D0 (en) * 1998-10-06 1998-12-02 Data Limited Apparatus for classifying or processing data
US6212532B1 (en) * 1998-10-22 2001-04-03 International Business Machines Corporation Text categorization toolkit
JP2000132560A (ja) * 1998-10-23 2000-05-12 Matsushita Electric Ind Co Ltd Chinese teletext processing method and apparatus
US6493711B1 (en) 1999-05-05 2002-12-10 H5 Technologies, Inc. Wide-spectrum information search engine
EP1212699A4 (en) * 1999-05-05 2006-01-11 West Publishing Co SYSTEM, METHOD AND SOFTWARE FOR CLASSIFYING DOCUMENTS
US6490548B1 (en) * 1999-05-14 2002-12-03 Paterra, Inc. Multilingual electronic transfer dictionary containing topical codes and method of use
DE19960081A1 (de) * 1999-06-09 2000-12-14 Grateach Gmbh Search engine
US6592627B1 (en) * 1999-06-10 2003-07-15 International Business Machines Corporation System and method for organizing repositories of semi-structured documents such as email
US6897866B1 (en) * 1999-07-30 2005-05-24 Battelle Memorial Institute Method and apparatus for entity relationship visualization
US6397215B1 (en) * 1999-10-29 2002-05-28 International Business Machines Corporation Method and system for automatic comparison of text classifications
US6424971B1 (en) * 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
US6546387B1 (en) * 1999-11-15 2003-04-08 Transcom Software Inc. Computer network information management system and method using intelligent software agents
US6411962B1 (en) * 1999-11-29 2002-06-25 Xerox Corporation Systems and methods for organizing text
US6668256B1 (en) * 2000-01-19 2003-12-23 Autonomy Corporation Ltd Algorithm for automatic selection of discriminant term combinations for document categorization
GB2362238A (en) * 2000-05-12 2001-11-14 Applied Psychology Res Ltd Automatic text classification
US7770102B1 (en) 2000-06-06 2010-08-03 Microsoft Corporation Method and system for semantically labeling strings and providing actions based on semantically labeled strings
US7716163B2 (en) * 2000-06-06 2010-05-11 Microsoft Corporation Method and system for defining semantic categories and actions
US7712024B2 (en) 2000-06-06 2010-05-04 Microsoft Corporation Application program interfaces for semantically labeling strings and providing actions based on semantically labeled strings
US7421645B2 (en) * 2000-06-06 2008-09-02 Microsoft Corporation Method and system for providing electronic commerce actions based on semantically labeled strings
US7788602B2 (en) 2000-06-06 2010-08-31 Microsoft Corporation Method and system for providing restricted actions for recognized semantic categories
US6757692B1 (en) * 2000-06-09 2004-06-29 Northrop Grumman Corporation Systems and methods for structured vocabulary search and classification
US6718325B1 (en) * 2000-06-14 2004-04-06 Sun Microsystems, Inc. Approximate string matcher for delimited strings
US6678692B1 (en) * 2000-07-10 2004-01-13 Northrop Grumman Corporation Hierarchy statistical analysis system and method
US6810376B1 (en) * 2000-07-11 2004-10-26 Nusuara Technologies Sdn Bhd System and methods for determining semantic similarity of sentences
JP2002041544A (ja) * 2000-07-25 2002-02-08 Toshiba Corp Text information analysis apparatus
US6990496B1 (en) * 2000-07-26 2006-01-24 Koninklijke Philips Electronics N.V. System and method for automated classification of text by time slicing
US7503000B1 (en) * 2000-07-31 2009-03-10 International Business Machines Corporation Method for generation of an N-word phrase dictionary from a text corpus
US6823331B1 (en) * 2000-08-28 2004-11-23 Entrust Limited Concept identification system and method for use in reducing and/or representing text content of an electronic document
AUPR033800A0 (en) 2000-09-25 2000-10-19 Telstra R & D Management Pty Ltd A document categorisation system
US7200606B2 (en) * 2000-11-07 2007-04-03 The Regents Of The University Of California Method and system for selecting documents by measuring document quality
US6640228B1 (en) 2000-11-10 2003-10-28 Verizon Laboratories Inc. Method for detecting incorrectly categorized data
US6798912B2 (en) * 2000-12-18 2004-09-28 Koninklijke Philips Electronics N.V. Apparatus and method of program classification based on syntax of transcript information
US6820081B1 (en) 2001-03-19 2004-11-16 Attenex Corporation System and method for evaluating a structured message store for message redundancy
US7032174B2 (en) * 2001-03-27 2006-04-18 Microsoft Corporation Automatically adding proper names to a database
US20020174429A1 (en) * 2001-03-29 2002-11-21 Srinivas Gutta Methods and apparatus for generating recommendation scores
US7120646B2 (en) * 2001-04-09 2006-10-10 Health Language, Inc. Method and system for interfacing with a multi-level data structure
US7778816B2 (en) 2001-04-24 2010-08-17 Microsoft Corporation Method and system for applying input mode bias
US6823323B2 (en) 2001-04-26 2004-11-23 Hewlett-Packard Development Company, L.P. Automatic classification method and apparatus
US7542961B2 (en) * 2001-05-02 2009-06-02 Victor Gogolak Method and system for analyzing drug adverse effects
US6778994B2 (en) 2001-05-02 2004-08-17 Victor Gogolak Pharmacovigilance database
US7925612B2 (en) * 2001-05-02 2011-04-12 Victor Gogolak Method for graphically depicting drug adverse effect risks
US6789091B2 (en) 2001-05-02 2004-09-07 Victor Gogolak Method and system for web-based analysis of drug adverse effects
US7107254B1 (en) * 2001-05-07 2006-09-12 Microsoft Corporation Probablistic models and methods for combining multiple content classifiers
US7831442B1 (en) 2001-05-16 2010-11-09 Perot Systems Corporation System and method for minimizing edits for medical insurance claims processing
US7236940B2 (en) * 2001-05-16 2007-06-26 Perot Systems Corporation Method and system for assessing and planning business operations utilizing rule-based statistical modeling
US7822621B1 (en) 2001-05-16 2010-10-26 Perot Systems Corporation Method of and system for populating knowledge bases using rule based systems and object-oriented software
US7272594B1 (en) 2001-05-31 2007-09-18 Autonomy Corporation Ltd. Method and apparatus to link to a related document
US7043492B1 (en) 2001-07-05 2006-05-09 Requisite Technology, Inc. Automated classification of items using classification mappings
US7010515B2 (en) * 2001-07-12 2006-03-07 Matsushita Electric Industrial Co., Ltd. Text comparison apparatus
US7216088B1 (en) 2001-07-26 2007-05-08 Perot Systems Corporation System and method for managing a project based on team member interdependency and impact relationships
US7130861B2 (en) 2001-08-16 2006-10-31 Sentius International Corporation Automated creation and delivery of database content
US6804670B2 (en) * 2001-08-22 2004-10-12 International Business Machines Corporation Method for automatically finding frequently asked questions in a helpdesk data set
WO2003019321A2 (en) * 2001-08-27 2003-03-06 E-Base Ltd. Methodology for constructing and optimizing a self-populating directory
US7461006B2 (en) * 2001-08-29 2008-12-02 Victor Gogolak Method and system for the analysis and association of patient-specific and population-based genomic data with drug safety adverse event data
US20030046297A1 (en) * 2001-08-30 2003-03-06 Kana Software, Inc. System and method for a partially self-training learning system
US6978274B1 (en) 2001-08-31 2005-12-20 Attenex Corporation System and method for dynamically evaluating latent concepts in unstructured documents
US6888548B1 (en) * 2001-08-31 2005-05-03 Attenex Corporation System and method for generating a visualized data representation preserving independent variable geometric relationships
US6778995B1 (en) 2001-08-31 2004-08-17 Attenex Corporation System and method for efficiently generating cluster groupings in a multi-dimensional concept space
US6915009B2 (en) * 2001-09-07 2005-07-05 Fuji Xerox Co., Ltd. Systems and methods for the automatic segmentation and clustering of ordered information
US20030064354A1 (en) * 2001-09-28 2003-04-03 Lewis Daniel M. System and method for linking content standards, curriculum, instructions and assessment
US7139755B2 (en) 2001-11-06 2006-11-21 Thomson Scientific Inc. Method and apparatus for providing comprehensive search results in response to user queries entered over a computer network
US7313531B2 (en) 2001-11-29 2007-12-25 Perot Systems Corporation Method and system for quantitatively assessing project risk and effectiveness
AUPR958901A0 (en) * 2001-12-18 2002-01-24 Telstra New Wave Pty Ltd Information resource taxonomy
US20030154181A1 (en) * 2002-01-25 2003-08-14 Nec Usa, Inc. Document clustering with cluster refinement and model selection capabilities
US7271804B2 (en) * 2002-02-25 2007-09-18 Attenex Corporation System and method for arranging concept clusters in thematic relationships in a two-dimensional visual display area
JP4142881B2 (ja) * 2002-03-07 2008-09-03 富士通株式会社 Document similarity calculation apparatus, clustering apparatus, and document extraction apparatus
US7673234B2 (en) * 2002-03-11 2010-03-02 The Boeing Company Knowledge management using text classification
US7693830B2 (en) 2005-08-10 2010-04-06 Google Inc. Programmable search engine
US7716199B2 (en) 2005-08-10 2010-05-11 Google Inc. Aggregating context data for programmable search engines
US7743045B2 (en) 2005-08-10 2010-06-22 Google Inc. Detecting spam related and biased contexts for programmable search engines
WO2003085551A1 (en) * 2002-04-05 2003-10-16 Hyperwave Software Forschungs- Und Entwicklungs Gmbh Data visualization system
US20030232317A1 (en) * 2002-04-22 2003-12-18 Patz Richard J. Method of presenting an assessment
US7325194B2 (en) * 2002-05-07 2008-01-29 Microsoft Corporation Method, system, and apparatus for converting numbers between measurement systems based upon semantically labeled strings
US7707496B1 (en) 2002-05-09 2010-04-27 Microsoft Corporation Method, system, and apparatus for converting dates between calendars and languages based upon semantically labeled strings
US20030214523A1 (en) * 2002-05-16 2003-11-20 Kuansan Wang Method and apparatus for decoding ambiguous input using anti-entities
US7742048B1 (en) 2002-05-23 2010-06-22 Microsoft Corporation Method, system, and apparatus for converting numbers based upon semantically labeled strings
US7707024B2 (en) 2002-05-23 2010-04-27 Microsoft Corporation Method, system, and apparatus for converting currency values based upon semantically labeled strings
NL1020670C2 (nl) * 2002-05-24 2003-11-25 Oce Tech Bv Determining a semantic image.
US7281245B2 (en) 2002-06-05 2007-10-09 Microsoft Corporation Mechanism for downloading software components from a remote source for use by a local software application
US7827546B1 (en) 2002-06-05 2010-11-02 Microsoft Corporation Mechanism for downloading software components from a remote source for use by a local software application
US7356537B2 (en) 2002-06-06 2008-04-08 Microsoft Corporation Providing contextually sensitive tools and help content in computer-generated documents
US7003522B1 (en) 2002-06-24 2006-02-21 Microsoft Corporation System and method for incorporating smart tags in online content
US7716676B2 (en) 2002-06-25 2010-05-11 Microsoft Corporation System and method for issuing a message to a program
US7392479B2 (en) 2002-06-27 2008-06-24 Microsoft Corporation System and method for providing namespace related information
US7209915B1 (en) 2002-06-28 2007-04-24 Microsoft Corporation Method, system and apparatus for routing a query to one or more providers
JP2004139553A (ja) * 2002-08-19 2004-05-13 Matsushita Electric Ind Co Ltd Document retrieval system and question answering system
US8128414B1 (en) 2002-08-20 2012-03-06 Ctb/Mcgraw-Hill System and method for the development of instructional and testing materials
US7383258B2 (en) * 2002-10-03 2008-06-03 Google, Inc. Method and apparatus for characterizing documents based on clusters of related words
US7231393B1 (en) 2003-09-30 2007-06-12 Google, Inc. Method and apparatus for learning a probabilistic generative model for text
US8825681B2 (en) * 2002-12-18 2014-09-02 International Business Machines Corporation Method, system and program product for transmitting electronic communications using automatically formed contact groups
US7783614B2 (en) 2003-02-13 2010-08-24 Microsoft Corporation Linking elements of a document to corresponding fields, queries and/or procedures in a database
WO2004075015A2 (en) * 2003-02-14 2004-09-02 Ctb/Mcgraw-Hill System and method for creating, assessing, modifying, and using a learning map
US7711550B1 (en) 2003-04-29 2010-05-04 Microsoft Corporation Methods and system for recognizing names in a computer-generated document and for providing helpful actions associated with recognized names
US7395256B2 (en) * 2003-06-20 2008-07-01 Agency For Science, Technology And Research Method and platform for term extraction from large collection of documents
US7739588B2 (en) 2003-06-27 2010-06-15 Microsoft Corporation Leveraging markup language data for semantically labeling text strings and data and for providing actions based on semantically labeled text strings and data
US7610313B2 (en) * 2003-07-25 2009-10-27 Attenex Corporation System and method for performing efficient document scoring and clustering
US20050222989A1 (en) * 2003-09-30 2005-10-06 Taher Haveliwala Results based personalization of advertisements in a search engine
US8321278B2 (en) * 2003-09-30 2012-11-27 Google Inc. Targeted advertisements based on user profiles and page profile
JP2005115628A (ja) * 2003-10-07 2005-04-28 Hewlett-Packard Development Co Lp Document classification apparatus, method, and program using fixed expressions
US20050131935A1 (en) * 2003-11-18 2005-06-16 O'leary Paul J. Sector content mining system using a modular knowledge base
US7404195B1 (en) 2003-12-09 2008-07-22 Microsoft Corporation Programmable object model for extensible markup language markup in an application
US7487515B1 (en) 2003-12-09 2009-02-03 Microsoft Corporation Programmable object model for extensible markup language schema validation
US7434157B2 (en) 2003-12-09 2008-10-07 Microsoft Corporation Programmable object model for namespace or schema library support in a software application
US7178102B1 (en) 2003-12-09 2007-02-13 Microsoft Corporation Representing latent data in an extensible markup language document
NZ548445A (en) * 2003-12-31 2009-05-31 Thomson Reuters Glo Resources Systems, methods, interfaces and software for extending search results beyond initial query-defined boundaries
US7287012B2 (en) * 2004-01-09 2007-10-23 Microsoft Corporation Machine-learned approach to determining document relevance for search over large electronic collections of documents
GB2411014A (en) * 2004-02-11 2005-08-17 Autonomy Corp Ltd Automatic searching for relevant information
US7191175B2 (en) 2004-02-13 2007-03-13 Attenex Corporation System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space
US7509573B1 (en) 2004-02-17 2009-03-24 Microsoft Corporation Anti-virus security information in an extensible markup language document
US7716223B2 (en) 2004-03-29 2010-05-11 Google Inc. Variable personalization of search results in a search engine
JP4634736B2 (ja) * 2004-04-22 2011-02-16 ヒューレット−パッカード デベロップメント カンパニー エル.ピー. Method, program, and system for vocabulary conversion between technical and non-technical descriptions
US7980855B1 (en) 2004-05-21 2011-07-19 Ctb/Mcgraw-Hill Student reporting systems and methods
US7565630B1 (en) 2004-06-15 2009-07-21 Google Inc. Customization of search results for search queries received from third party sites
US7275052B2 (en) * 2004-08-20 2007-09-25 Sap Ag Combined classification based on examples, queries, and keywords
US7340672B2 (en) * 2004-09-20 2008-03-04 Intel Corporation Providing data integrity for data streams
WO2006036150A1 (en) 2004-09-28 2006-04-06 Nielsen Media Research, Inc Data classification methods and apparatus for use with data fusion
EP1817709A2 (en) * 2004-10-25 2007-08-15 Prosanos Corporation Method, system, and software for analyzing pharmacovigilance data
EP1826682A1 (en) * 2004-11-12 2007-08-29 JustSystems Corporation Document managing device and document managing method
US7356777B2 (en) 2005-01-26 2008-04-08 Attenex Corporation System and method for providing a dynamic user interface for a dense three-dimensional scene
US7404151B2 (en) * 2005-01-26 2008-07-22 Attenex Corporation System and method for providing a dynamic user interface for a dense three-dimensional scene
US7680773B1 (en) * 2005-03-31 2010-03-16 Google Inc. System for automatically managing duplicate documents when crawling dynamic documents
US7657521B2 (en) * 2005-04-15 2010-02-02 General Electric Company System and method for parsing medical data
US7644055B2 (en) * 2005-05-02 2010-01-05 Sap, Ag Rule-based database object matching with comparison certainty
US7548917B2 (en) * 2005-05-06 2009-06-16 Nelson Information Systems, Inc. Database and index organization for enhanced document retrieval
US7516130B2 (en) * 2005-05-09 2009-04-07 Trend Micro, Inc. Matching engine with signature generation
CN100470544C (zh) * 2005-05-24 2009-03-18 国际商业机器公司 Method, device, and system for linking documents
WO2006133252A2 (en) * 2005-06-08 2006-12-14 The Regents Of The University Of California Doubly ranked information retrieval and area search
US7788590B2 (en) 2005-09-26 2010-08-31 Microsoft Corporation Lightweight reference user interface
US7992085B2 (en) 2005-09-26 2011-08-02 Microsoft Corporation Lightweight reference user interface
US10607355B2 (en) 2005-10-26 2020-03-31 Cortica, Ltd. Method and system for determining the dimensions of an object shown in a multimedia content item
US11032017B2 (en) 2005-10-26 2021-06-08 Cortica, Ltd. System and method for identifying the context of multimedia content elements
US9529984B2 (en) 2005-10-26 2016-12-27 Cortica, Ltd. System and method for verification of user identification based on multimedia content elements
US10360253B2 (en) 2005-10-26 2019-07-23 Cortica, Ltd. Systems and methods for generation of searchable structures respective of multimedia data content
US11620327B2 (en) 2005-10-26 2023-04-04 Cortica Ltd System and method for determining a contextual insight and generating an interface with recommendations based thereon
US10848590B2 (en) 2005-10-26 2020-11-24 Cortica Ltd System and method for determining a contextual insight and providing recommendations based thereon
US9477658B2 (en) 2005-10-26 2016-10-25 Cortica, Ltd. Systems and method for speech to speech translation using cores of a natural liquid architecture system
US10621988B2 (en) 2005-10-26 2020-04-14 Cortica Ltd System and method for speech to text translation using cores of a natural liquid architecture system
US10380623B2 (en) 2005-10-26 2019-08-13 Cortica, Ltd. System and method for generating an advertisement effectiveness performance score
US9646005B2 (en) 2005-10-26 2017-05-09 Cortica, Ltd. System and method for creating a database of multimedia content elements assigned to users
US9747420B2 (en) 2005-10-26 2017-08-29 Cortica, Ltd. System and method for diagnosing a patient based on an analysis of multimedia content
US8266185B2 (en) 2005-10-26 2012-09-11 Cortica Ltd. System and methods thereof for generation of searchable structures respective of multimedia data content
US11003706B2 (en) 2005-10-26 2021-05-11 Cortica Ltd System and methods for determining access permissions on personalized clusters of multimedia content elements
US10380164B2 (en) 2005-10-26 2019-08-13 Cortica, Ltd. System and method for using on-image gestures and multimedia content elements as search queries
US10691642B2 (en) 2005-10-26 2020-06-23 Cortica Ltd System and method for enriching a concept database with homogenous concepts
US10585934B2 (en) 2005-10-26 2020-03-10 Cortica Ltd. Method and system for populating a concept database with respect to user identifiers
US11386139B2 (en) 2005-10-26 2022-07-12 Cortica Ltd. System and method for generating analytics for entities depicted in multimedia content
US10614626B2 (en) 2005-10-26 2020-04-07 Cortica Ltd. System and method for providing augmented reality challenges
US8312031B2 (en) 2005-10-26 2012-11-13 Cortica Ltd. System and method for generation of complex signatures for multimedia data content
US10372746B2 (en) 2005-10-26 2019-08-06 Cortica, Ltd. System and method for searching applications using multimedia content elements
US11361014B2 (en) 2005-10-26 2022-06-14 Cortica Ltd. System and method for completing a user profile
US10949773B2 (en) 2005-10-26 2021-03-16 Cortica, Ltd. System and methods thereof for recommending tags for multimedia content elements based on context
US8818916B2 (en) 2005-10-26 2014-08-26 Cortica, Ltd. System and method for linking multimedia data elements to web pages
US9218606B2 (en) 2005-10-26 2015-12-22 Cortica, Ltd. System and method for brand monitoring and trend analysis based on deep-content-classification
US11019161B2 (en) 2005-10-26 2021-05-25 Cortica, Ltd. System and method for profiling users interest based on multimedia content analysis
US10635640B2 (en) 2005-10-26 2020-04-28 Cortica, Ltd. System and method for enriching a concept database
US11403336B2 (en) 2005-10-26 2022-08-02 Cortica Ltd. System and method for removing contextually identical multimedia content elements
US10191976B2 (en) 2005-10-26 2019-01-29 Cortica, Ltd. System and method of detecting common patterns within unstructured data elements retrieved from big data sources
US10535192B2 (en) 2005-10-26 2020-01-14 Cortica Ltd. System and method for generating a customized augmented reality environment to a user
US9953032B2 (en) 2005-10-26 2018-04-24 Cortica, Ltd. System and method for characterization of multimedia content signals using cores of a natural liquid architecture system
US9031999B2 (en) 2005-10-26 2015-05-12 Cortica, Ltd. System and methods for generation of a concept based database
US11604847B2 (en) 2005-10-26 2023-03-14 Cortica Ltd. System and method for overlaying content on a multimedia content element based on user interest
US10193990B2 (en) 2005-10-26 2019-01-29 Cortica Ltd. System and method for creating user profiles based on multimedia content
US9191626B2 (en) 2005-10-26 2015-11-17 Cortica, Ltd. System and methods thereof for visual analysis of an image on a web-page and matching an advertisement thereto
US10380267B2 (en) 2005-10-26 2019-08-13 Cortica, Ltd. System and method for tagging multimedia content elements
US10698939B2 (en) 2005-10-26 2020-06-30 Cortica Ltd System and method for customizing images
US10776585B2 (en) 2005-10-26 2020-09-15 Cortica, Ltd. System and method for recognizing characters in multimedia content
US10180942B2 (en) 2005-10-26 2019-01-15 Cortica Ltd. System and method for generation of concept structures based on sub-concepts
US10387914B2 (en) 2005-10-26 2019-08-20 Cortica, Ltd. Method for identification of multimedia content elements and adding advertising content respective thereof
US8326775B2 (en) 2005-10-26 2012-12-04 Cortica Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
US11216498B2 (en) 2005-10-26 2022-01-04 Cortica, Ltd. System and method for generating signatures to three-dimensional multimedia data elements
US9372940B2 (en) 2005-10-26 2016-06-21 Cortica, Ltd. Apparatus and method for determining user attention using a deep-content-classification (DCC) system
US9384196B2 (en) 2005-10-26 2016-07-05 Cortica, Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
US10742340B2 (en) 2005-10-26 2020-08-11 Cortica Ltd. System and method for identifying the context of multimedia content elements displayed in a web-page and providing contextual filters respective thereto
US9767143B2 (en) 2005-10-26 2017-09-19 Cortica, Ltd. System and method for caching of concept structures
US7676463B2 (en) * 2005-11-15 2010-03-09 Kroll Ontrack, Inc. Information exploration systems and method
US7756855B2 (en) * 2006-10-11 2010-07-13 Collarity, Inc. Search phrase refinement by search term replacement
US8429184B2 (en) * 2005-12-05 2013-04-23 Collarity Inc. Generation of refinement terms for search queries
US8903810B2 (en) * 2005-12-05 2014-12-02 Collarity, Inc. Techniques for ranking search results
EP1960952A4 (en) 2005-12-06 2011-05-11 Ingenix Inc ANALYSIS OF ADMINISTRATIVE DATA OF REQUESTS FOR CARE AND OTHER SOURCES OF DATA
US20080086356A1 (en) * 2005-12-09 2008-04-10 Steve Glassman Determining advertisements using user interest information and map-based location information
US20070143298A1 (en) * 2005-12-16 2007-06-21 Microsoft Corporation Browsing items related to email
US7813919B2 (en) * 2005-12-20 2010-10-12 Xerox Corporation Class description generation for clustering and categorization
US7657506B2 (en) * 2006-01-03 2010-02-02 Microsoft International Holdings B.V. Methods and apparatus for automated matching and classification of data
US7949538B2 (en) * 2006-03-14 2011-05-24 A-Life Medical, Inc. Automated interpretation of clinical encounters with cultural cues
US7529719B2 (en) * 2006-03-17 2009-05-05 Microsoft Corporation Document characterization using a tensor space model
US8731954B2 (en) * 2006-03-27 2014-05-20 A-Life Medical, Llc Auditing the coding and abstracting of documents
US8204213B2 (en) * 2006-03-29 2012-06-19 International Business Machines Corporation System and method for performing a similarity measure of anonymized data
EP1883040A1 (en) * 2006-07-28 2008-01-30 IEE International Electronics & Engineering S.A.R.L. Pattern classification method
GB2441598A (en) * 2006-09-07 2008-03-12 Fujin Technology Plc Categorisation of Data using Structural Analysis
US20080077579A1 (en) * 2006-09-22 2008-03-27 Cuneyt Ozveren Classification For Peer-To-Peer Collaboration
US20080086368A1 (en) * 2006-10-05 2008-04-10 Google Inc. Location Based, Content Targeted Online Advertising
US8442972B2 (en) 2006-10-11 2013-05-14 Collarity, Inc. Negative associations for search results ranking and refinement
US10733326B2 (en) 2006-10-26 2020-08-04 Cortica Ltd. System and method for identification of inappropriate multimedia content
US9208174B1 (en) * 2006-11-20 2015-12-08 Disney Enterprises, Inc. Non-language-based object search
JP4467583B2 (ja) * 2007-01-17 2010-05-26 富士通株式会社 Design support program, design support method, and design support apparatus
US7877371B1 (en) 2007-02-07 2011-01-25 Google Inc. Selectively deleting clusters of conceptually related words from a generative model for text
US7890521B1 (en) * 2007-02-07 2011-02-15 Google Inc. Document-based synonym generation
US9507858B1 (en) 2007-02-28 2016-11-29 Google Inc. Selectively merging clusters of conceptually related words in a generative model for text
US7873640B2 (en) * 2007-03-27 2011-01-18 Adobe Systems Incorporated Semantic analysis documents to rank terms
US7908552B2 (en) 2007-04-13 2011-03-15 A-Life Medical Inc. Mere-parsing with boundary and semantic driven scoping
US8682823B2 (en) * 2007-04-13 2014-03-25 A-Life Medical, Llc Multi-magnitudinal vectors with resolution based on source vector features
US8180725B1 (en) 2007-08-01 2012-05-15 Google Inc. Method and apparatus for selecting links to include in a probabilistic generative model for text
US9946846B2 (en) * 2007-08-03 2018-04-17 A-Life Medical, Llc Visualizing the documentation and coding of surgical procedures
US20090132522A1 (en) * 2007-10-18 2009-05-21 Sami Leino Systems and methods for organizing innovation documents
US20090192784A1 (en) * 2008-01-24 2009-07-30 International Business Machines Corporation Systems and methods for analyzing electronic documents to discover noncompliance with established norms
US7930306B2 (en) * 2008-04-30 2011-04-19 Msc Intellectual Properties B.V. System and method for near and exact de-duplication of documents
US8438178B2 (en) 2008-06-26 2013-05-07 Collarity Inc. Interactions among online digital identities
US8527522B2 (en) * 2008-09-05 2013-09-03 Ramp Holdings, Inc. Confidence links between name entities in disparate documents
US9092517B2 (en) * 2008-09-23 2015-07-28 Microsoft Technology Licensing, Llc Generating synonyms based on query log data
US8341095B2 (en) * 2009-01-12 2012-12-25 Nec Laboratories America, Inc. Supervised semantic indexing and its extensions
US8290961B2 (en) * 2009-01-13 2012-10-16 Sandia Corporation Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix
US8166051B1 (en) * 2009-02-03 2012-04-24 Sandia Corporation Computation of term dominance in text documents
US8713007B1 (en) * 2009-03-13 2014-04-29 Google Inc. Classifying documents using multiple classifiers
US20100293179A1 (en) * 2009-05-14 2010-11-18 Microsoft Corporation Identifying synonyms of entities using web search
US8533203B2 (en) * 2009-06-04 2013-09-10 Microsoft Corporation Identifying synonyms of entities using a document collection
US8515957B2 (en) 2009-07-28 2013-08-20 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via injection
US8612446B2 (en) 2009-08-24 2013-12-17 Fti Consulting, Inc. System and method for generating a reference set for use during document review
US8972436B2 (en) * 2009-10-28 2015-03-03 Yahoo! Inc. Translation model and method for matching reviews to objects
WO2011056086A2 (en) * 2009-11-05 2011-05-12 Google Inc. Statistical stemming
US8875038B2 (en) 2010-01-19 2014-10-28 Collarity, Inc. Anchoring for content synchronization
US8392175B2 (en) * 2010-02-01 2013-03-05 Stratify, Inc. Phrase-based document clustering with automatic phrase extraction
CN102141978A (zh) 2010-02-02 2011-08-03 阿里巴巴集团控股有限公司 Text classification method and system
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US8874581B2 (en) * 2010-07-29 2014-10-28 Microsoft Corporation Employing topic models for semantic class mining
JP5403696B2 (ja) * 2010-10-12 2014-01-29 株式会社Nec情報システムズ Language model generation device, method therefor, and program therefor
US8818927B2 (en) * 2011-06-09 2014-08-26 Gfk Holding Inc. Method for generating rules and parameters for assessing relevance of information derived from internet traffic
CN102298632B (zh) * 2011-09-06 2014-10-29 神华集团有限责任公司 String similarity calculation method and device, and material classification method and device
US8862880B2 (en) 2011-09-23 2014-10-14 Gfk Holding Inc. Two-stage anonymization of mobile network subscriber personal information
US8745019B2 (en) 2012-03-05 2014-06-03 Microsoft Corporation Robust discovery of entity synonyms using query logs
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
US9171069B2 (en) * 2012-07-31 2015-10-27 Freedom Solutions Group, Llc Method and apparatus for analyzing a document
US9229924B2 (en) 2012-08-24 2016-01-05 Microsoft Technology Licensing, Llc Word detection and domain dictionary recommendation
US9323767B2 (en) 2012-10-01 2016-04-26 Longsand Limited Performance and scalability in an intelligent data operating layer system
US9256836B2 (en) 2012-10-31 2016-02-09 Open Text Corporation Reconfigurable model for auto-classification system and method
IL224482B (en) * 2013-01-29 2018-08-30 Verint Systems Ltd System and method for keyword spotting using representative dictionary
US9256687B2 (en) 2013-06-28 2016-02-09 International Business Machines Corporation Augmenting search results with interactive search matrix
WO2015035193A1 (en) 2013-09-05 2015-03-12 A-Life Medical, Llc Automated clinical indicator recognition with natural language processing
US10133727B2 (en) 2013-10-01 2018-11-20 A-Life Medical, Llc Ontologically driven procedure coding
US10685052B2 (en) * 2013-12-13 2020-06-16 Danmarks Tekniske Universitet Method of and system for information retrieval
US8837835B1 (en) 2014-01-20 2014-09-16 Array Technology, LLC Document grouping system
WO2015167420A1 (en) * 2014-04-28 2015-11-05 Hewlett-Packard Development Company, L.P. Term chain clustering
US9720977B2 (en) * 2014-06-10 2017-08-01 International Business Machines Corporation Weighting search criteria based on similarities to an ingested corpus in a question and answer (QA) system
RU2580424C1 (ru) 2014-11-28 2016-04-10 Общество С Ограниченной Ответственностью "Яндекс" Method for identifying insignificant lexical units in a text message, and computer
IL242218B (en) 2015-10-22 2020-11-30 Verint Systems Ltd A system and method for maintaining a dynamic dictionary
IL242219B (en) 2015-10-22 2020-11-30 Verint Systems Ltd System and method for keyword searching using both static and dynamic dictionaries
WO2017105641A1 (en) 2015-12-15 2017-06-22 Cortica, Ltd. Identification of key points in multimedia data elements
US11195043B2 (en) 2015-12-15 2021-12-07 Cortica, Ltd. System and method for determining common patterns in multimedia content elements based on key points
CN107436875B (zh) * 2016-05-25 2020-12-04 华为技术有限公司 Text classification method and device
US11068546B2 (en) 2016-06-02 2021-07-20 Nuix North America Inc. Computer-implemented system and method for analyzing clusters of coded documents
US10402491B2 (en) * 2016-12-21 2019-09-03 Wipro Limited System and method for creating and building a domain dictionary
CN108509408B (zh) * 2017-02-27 2019-11-22 芋头科技(杭州)有限公司 Sentence similarity judgment method
US11760387B2 (en) 2017-07-05 2023-09-19 AutoBrains Technologies Ltd. Driving policies determination
WO2019012527A1 (en) 2017-07-09 2019-01-17 Cortica Ltd. ORGANIZATION OF DEPTH LEARNING NETWORKS
CN107957989B9 (zh) 2017-10-23 2021-01-12 创新先进技术有限公司 Cluster-based word vector processing method, apparatus, and device
CN108170663A (zh) 2017-11-14 2018-06-15 阿里巴巴集团控股有限公司 Cluster-based word vector processing method, apparatus, and device
US10789460B1 (en) * 2018-01-24 2020-09-29 The Boston Consulting Group, Inc. Methods and systems for screening documents
CN110727769B (zh) 2018-06-29 2024-04-19 阿里巴巴(中国)有限公司 Corpus generation method and device, and human-machine interaction processing method and device
US10846544B2 (en) 2018-07-16 2020-11-24 Cartica Ai Ltd. Transportation prediction system and method
US10839694B2 (en) 2018-10-18 2020-11-17 Cartica Ai Ltd Blind spot alert
US11181911B2 (en) 2018-10-18 2021-11-23 Cartica Ai Ltd Control transfer of a vehicle
US20200133308A1 (en) 2018-10-18 2020-04-30 Cartica Ai Ltd Vehicle to vehicle (v2v) communication less truck platooning
US11126870B2 (en) 2018-10-18 2021-09-21 Cartica Ai Ltd. Method and system for obstacle detection
US10748038B1 (en) 2019-03-31 2020-08-18 Cortica Ltd. Efficient calculation of a robust signature of a media unit
US11700356B2 (en) 2018-10-26 2023-07-11 AutoBrains Technologies Ltd. Control transfer of a vehicle
US10789535B2 (en) 2018-11-26 2020-09-29 Cartica Ai Ltd Detection of road elements
CN110032724B (zh) * 2018-12-19 2022-11-25 阿里巴巴集团控股有限公司 Method and device for identifying user intent
US10943143B2 (en) * 2018-12-28 2021-03-09 Paypal, Inc. Algorithm for scoring partial matches between words
US11643005B2 (en) 2019-02-27 2023-05-09 Autobrains Technologies Ltd Adjusting adjustable headlights of a vehicle
US11285963B2 (en) 2019-03-10 2022-03-29 Cartica Ai Ltd. Driver-based prediction of dangerous events
US11694088B2 (en) 2019-03-13 2023-07-04 Cortica Ltd. Method for object detection using knowledge distillation
US11132548B2 (en) 2019-03-20 2021-09-28 Cortica Ltd. Determining object information that does not explicitly appear in a media unit signature
US10776669B1 (en) 2019-03-31 2020-09-15 Cortica Ltd. Signature generation and object detection that refer to rare scenes
US10796444B1 (en) 2019-03-31 2020-10-06 Cortica Ltd Configuring spanning elements of a signature generator
US11222069B2 (en) 2019-03-31 2022-01-11 Cortica Ltd. Low-power calculation of a signature of a media unit
US10789527B1 (en) 2019-03-31 2020-09-29 Cortica Ltd. Method for object detection using shallow neural networks
US11495337B1 (en) 2019-12-12 2022-11-08 Allscripts Software, Llc Computing system for full textual search of a patient record
US11593662B2 (en) 2019-12-12 2023-02-28 Autobrains Technologies Ltd Unsupervised cluster generation
US10748022B1 (en) 2019-12-12 2020-08-18 Cartica Ai Ltd Crowd separation
US11741511B2 (en) * 2020-02-03 2023-08-29 Intuit Inc. Systems and methods of business categorization and service recommendation
US11590988B2 (en) 2020-03-19 2023-02-28 Autobrains Technologies Ltd Predictive turning assistant
US11836189B2 (en) 2020-03-25 2023-12-05 International Business Machines Corporation Infer text classifiers for large text collections
US11827215B2 (en) 2020-03-31 2023-11-28 AutoBrains Technologies Ltd. Method for training a driving related object detector
US11328117B2 (en) * 2020-05-17 2022-05-10 International Business Machines Corporation Automated content modification based on a user-specified context
US11941565B2 (en) 2020-06-11 2024-03-26 Capital One Services, Llc Citation and policy based document classification
US11275776B2 (en) 2020-06-11 2022-03-15 Capital One Services, Llc Section-linked document classifiers
US11756424B2 (en) 2020-07-24 2023-09-12 AutoBrains Technologies Ltd. Parking assist
CN112487194A (zh) * 2020-12-17 2021-03-12 平安消费金融有限公司 Method, apparatus, device, and storage medium for updating document classification rules
CN114398534B (zh) * 2021-01-05 2023-09-12 上海邮电设计咨询研究院有限公司 Event clustering text retrieval system
US20220319646A1 (en) * 2021-04-05 2022-10-06 Cerner Innovation, Inc. Machine learning engine and rule engine for document auto-population using historical and contextual data
CN113569004A (zh) * 2021-06-15 2021-10-29 南京航空航天大学 Intelligent prompting method for restricted natural language use case modeling
US11281858B1 (en) * 2021-07-13 2022-03-22 Exceed AI Ltd Systems and methods for data classification
CN115455987B (zh) * 2022-11-14 2023-05-05 合肥高维数据技术有限公司 Character grouping method based on character frequency and word frequency, storage medium, and electronic device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4754489A (en) * 1985-10-15 1988-06-28 The Palantir Corporation Means for resolving ambiguities in text based upon character context
US4876731A (en) * 1988-02-19 1989-10-24 Nynex Corporation Neural network model in pattern recognition using probabilistic contextual information
US5075896A (en) * 1989-10-25 1991-12-24 Xerox Corporation Character and phoneme recognition based on probability clustering
US5062143A (en) * 1990-02-23 1991-10-29 Harris Corporation Trigram-based method of language identification
US5151950A (en) * 1990-10-31 1992-09-29 Go Corporation Method for recognizing handwritten characters using shape and context analysis
EP0632402B1 (en) * 1993-06-30 2000-09-06 International Business Machines Corporation Method for image segmentation and classification of image elements for document processing
US5537488A (en) * 1993-09-16 1996-07-16 Massachusetts Institute Of Technology Pattern recognition system with statistical classification
JP3375766B2 (ja) * 1994-12-27 2003-02-10 松下電器産業株式会社 Character recognition device
US5625767A (en) * 1995-03-13 1997-04-29 Bartell; Brian Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents

Also Published As

Publication number Publication date
AU8373798A (en) 1999-01-04
EP0996927A1 (en) 2000-05-03
AU760495B2 (en) 2003-05-15
WO1998058344A1 (en) 1998-12-23
EP0996927A4 (en) 2005-08-10
US6137911A (en) 2000-10-24
NZ502332A (en) 2002-10-25

Similar Documents

Publication Publication Date Title
EA200000035A1 (ru) Method (variants) and system for text classification
D'Andrea et al. Real-time detection of traffic from twitter stream analysis
Carroll MINIMALIST TRAINING.
CN108959924A (zh) Android malicious code detection method based on word vectors and deep neural networks
DK0730765T3 (da) Associative text search and retrieval system
EP0378038A3 (en) Partitioning of sorted lists for multiprocessor sort and merge
EP0790562A3 (en) Computer system data I/O by reference among CPUs and I/O devices
NO20004629L (no) Database suitable for configuring and/or optimizing a system, and method for generating the database
EP0790564A3 (en) Computer system data I/O by reference among I/O devices and multiple memory units
Doyle et al. Spectrally formulated element for wave propagation in 3-D frame structures
EP0789305A3 (en) Computer system data I/O by reference among multiple data sources and sinks
Persson Online bibliometrics. A research tool for every man
Politakis et al. DESIGNING CONSISTENT KNOWLEDGE BASES FOR EXPERT CONSULTATION SYSTEMS.
EP0789306A3 (en) Computer system data I/O by reference among multiple CPUs
MX9306277A (es) Method and apparatus for symbol recognition using multidimensional preprocessing and symbol classification.
KR850700163A (ko) Method and apparatus for generating floating-point condition codes
Topál et al. Notes on history and recent records of elk (Alces alces [L.]) in Hungary
Dreher et al. Power Spectral Densities of Literary Rhythms (Chinese)
Wong et al. Radar emitter classification using intrapulse data
Ward et al. General applications of hierarchical grouping using the HIER-GRP computer program
McIntosh Processing Report: Seismic Line RU-3, RICE/HARC/EDGE California Margin Reflection Survey
Yemini Distributed sensors networks (dsn): An attempt to define the issues
Tweedie Comparison of word-based and syntax-based methods: Vocabulary richness measures and the highest frequency elements
JPS57168307A (en) Detecting system for state change of electric power system
CA2100956A1 (en) Text searching and indexing system