DE60141937D1 - Method for adapting a K-fold text partition to incoming data

Method for adapting a K-fold text partition to incoming data

Info

Publication number
DE60141937D1
Authority
DE
Germany
Prior art keywords
documents
dataset
vector space
space model
clustering
Prior art date
Legal status
Expired - Lifetime
Application number
DE60141937T
Other languages
English (en)
Inventor
William Scott Spangler
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
2000-09-26
Filing date
2001-09-20
Publication date
2010-06-10
Application filed by International Business Machines Corp
Application granted
Publication of DE60141937D1
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/669,680 US7430717B1 (en) 2000-09-26 2000-09-26 Method for adapting a K-means text clustering to emerging data

Publications (1)

Publication Number Publication Date
DE60141937D1 (de) 2010-06-10

Family

Family ID: 24687283

Family Applications (1)

Application Number Title Priority Date Filing Date
DE60141937T Expired - Lifetime DE60141937D1 (de) 2000-09-26 2001-09-20 Method for adapting a K-fold text partition to incoming data

Country Status (4)

Country Link
US (2) US7430717B1 (de)
EP (1) EP1191463B1 (de)
AT (1) ATE466343T1 (de)
DE (1) DE60141937D1 (de)

Families Citing this family (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG93868A1 (en) * 2000-06-07 2003-01-21 Kent Ridge Digital Labs Method and system for user-configurable clustering of information
US7430717B1 (en) * 2000-09-26 2008-09-30 International Business Machines Corporation Method for adapting a K-means text clustering to emerging data
US7191175B2 (en) 2004-02-13 2007-03-13 Attenex Corporation System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space
KR100633184B1 (ko) * 2004-04-06 2006-10-12 엘지전자 주식회사 Apparatus and method for setting menus of an image display device
US8065316B1 (en) 2004-09-30 2011-11-22 Google Inc. Systems and methods for providing search query refinements
US7720848B2 (en) * 2006-03-29 2010-05-18 Xerox Corporation Hierarchical clustering with real-time updating
US20090299822A1 (en) 2006-11-08 2009-12-03 P C Grocery Ltd. System and method for optimized shopping transactions
US20090157631A1 (en) * 2006-12-14 2009-06-18 Jason Coleman Database search enhancements
US20100325109A1 * 2007-02-09 2010-12-23 Agency For Science, Technology And Research Keyword classification and determination in language modelling
US7853081B2 (en) * 2007-04-02 2010-12-14 British Telecommunications Public Limited Company Identifying data patterns
US7976379B2 (en) * 2007-11-09 2011-07-12 Igt Gaming system and method having configurable bonus game triggering outcomes
KR101537078B1 (ko) * 2008-11-05 2015-07-15 구글 인코포레이티드 User-defined language models
US8161028B2 (en) * 2008-12-05 2012-04-17 International Business Machines Corporation System and method for adaptive categorization for use with dynamic taxonomies
US8713018B2 (en) 2009-07-28 2014-04-29 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via inclusion
US8612446B2 (en) * 2009-08-24 2013-12-17 Fti Consulting, Inc. System and method for generating a reference set for use during document review
CN102141977A (zh) * 2010-02-01 2011-08-03 阿里巴巴集团控股有限公司 Text classification method and device
US9519705B2 (en) 2011-01-25 2016-12-13 President And Fellows Of Harvard College Method and apparatus for selecting clusterings to classify a data set
US10445677B2 (en) 2011-03-28 2019-10-15 International Business Machines Corporation System and method for integrating text analytics driven social metrics into business architecture
TWI463339B (zh) * 2011-05-17 2014-12-01 Univ Nat Pingtung Sci & Tech Data clustering method
WO2013046218A2 (en) * 2011-06-17 2013-04-04 Tata Consultancy Services Limited Method and system for differentiating plurality of scripts of text in broadcast video stream
US9547693B1 (en) 2011-06-23 2017-01-17 Palantir Technologies Inc. Periodic database search manager for multiple data sources
US8954458B2 (en) * 2011-07-11 2015-02-10 Aol Inc. Systems and methods for providing a content item database and identifying content items
KR20130060720A (ko) * 2011-11-30 2013-06-10 한국전자통신연구원 Apparatus and method for interpreting service goals for goal-based semantic service discovery
US9070203B2 (en) * 2012-02-08 2015-06-30 Mrl Materials Resources Llc Identification and quantification of microtextured regions in materials with ordered crystal structure
US10013488B1 (en) * 2012-09-26 2018-07-03 Amazon Technologies, Inc. Document analysis for region classification
US9104463B2 (en) 2012-11-07 2015-08-11 International Business Machines Corporation Automated and optimal deactivation of service to enable effective resource reusability
JP6015417B2 (ja) * 2012-12-19 2016-10-26 富士通株式会社 Cluster processing method, cluster processing device, and program
US8949793B1 (en) * 2012-12-20 2015-02-03 Emc Corporation Test bed design from customer system configurations using machine learning techniques
US9418148B2 (en) 2012-12-31 2016-08-16 Nuance Communications, Inc. System and method to label unlabeled data
US9373031B2 (en) 2013-03-14 2016-06-21 Digitech Systems Private Reserve, LLC System and method for document alignment, correction, and classification
GB2513247A (en) * 2013-03-15 2014-10-22 Palantir Technologies Inc Data clustering
US8937619B2 (en) 2013-03-15 2015-01-20 Palantir Technologies Inc. Generating an object time series from data objects
US8788405B1 (en) 2013-03-15 2014-07-22 Palantir Technologies, Inc. Generating data clusters with customizable analysis strategies
US8917274B2 (en) 2013-03-15 2014-12-23 Palantir Technologies Inc. Event matrix based on integrated data
US10275778B1 (en) 2013-03-15 2019-04-30 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures
US9965937B2 (en) 2013-03-15 2018-05-08 Palantir Technologies Inc. External malware data item clustering and analysis
US9116975B2 (en) 2013-10-18 2015-08-25 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores
US10579647B1 (en) 2013-12-16 2020-03-03 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US9552615B2 (en) 2013-12-20 2017-01-24 Palantir Technologies Inc. Automated database analysis to detect malfeasance
US10356032B2 (en) 2013-12-26 2019-07-16 Palantir Technologies Inc. System and method for detecting confidential information emails
US8832832B1 (en) 2014-01-03 2014-09-09 Palantir Technologies Inc. IP reputation
US9009827B1 (en) 2014-02-20 2015-04-14 Palantir Technologies Inc. Security sharing system
US9483162B2 (en) 2014-02-20 2016-11-01 Palantir Technologies Inc. Relationship visualizations
CN105022740A (zh) * 2014-04-23 2015-11-04 苏州易维迅信息科技有限公司 Method and device for processing unstructured data
US9857958B2 (en) 2014-04-28 2018-01-02 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases
US9535974B1 (en) 2014-06-30 2017-01-03 Palantir Technologies Inc. Systems and methods for identifying key phrase clusters within documents
US9619557B2 (en) 2014-06-30 2017-04-11 Palantir Technologies, Inc. Systems and methods for key phrase characterization of documents
US9256664B2 (en) 2014-07-03 2016-02-09 Palantir Technologies Inc. System and method for news events detection and visualization
US9785773B2 (en) 2014-07-03 2017-10-10 Palantir Technologies Inc. Malware data item analysis
US10572496B1 (en) 2014-07-03 2020-02-25 Palantir Technologies Inc. Distributed workflow system and database with access controls for city resiliency
US9021260B1 (en) 2014-07-03 2015-04-28 Palantir Technologies Inc. Malware data item analysis
US9202249B1 (en) 2014-07-03 2015-12-01 Palantir Technologies Inc. Data item clustering and analysis
US9043894B1 (en) 2014-11-06 2015-05-26 Palantir Technologies Inc. Malicious software detection in a computing system
US10362133B1 (en) 2014-12-22 2019-07-23 Palantir Technologies Inc. Communication data processing architecture
US10552994B2 (en) 2014-12-22 2020-02-04 Palantir Technologies Inc. Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items
US9348920B1 (en) 2014-12-22 2016-05-24 Palantir Technologies Inc. Concept indexing among database of documents using machine learning techniques
US9367872B1 (en) 2014-12-22 2016-06-14 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures
US9817563B1 (en) 2014-12-29 2017-11-14 Palantir Technologies Inc. System and method of generating data points from one or more data stores of data items for chart creation and manipulation
US20160189183A1 (en) * 2014-12-31 2016-06-30 Flytxt BV System and method for automatic discovery, annotation and visualization of customer segments and migration characteristics
US9678822B2 (en) * 2015-01-02 2017-06-13 Tata Consultancy Services Limited Real-time categorization of log events
US10103953B1 (en) 2015-05-12 2018-10-16 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US9454785B1 (en) 2015-07-30 2016-09-27 Palantir Technologies Inc. Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data
US9456000B1 (en) 2015-08-06 2016-09-27 Palantir Technologies Inc. Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications
US10489391B1 (en) 2015-08-17 2019-11-26 Palantir Technologies Inc. Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface
US9485265B1 (en) 2015-08-28 2016-11-01 Palantir Technologies Inc. Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces
US11176206B2 (en) 2015-12-01 2021-11-16 International Business Machines Corporation Incremental generation of models with dynamic clustering
US9823818B1 (en) 2015-12-29 2017-11-21 Palantir Technologies Inc. Systems and interactive user interfaces for automatic generation of temporal representation of data objects
US11068546B2 (en) 2016-06-02 2021-07-20 Nuix North America Inc. Computer-implemented system and method for analyzing clusters of coded documents
US10318630B1 (en) 2016-11-21 2019-06-11 Palantir Technologies Inc. Analysis of large bodies of textual data
US10620618B2 (en) 2016-12-20 2020-04-14 Palantir Technologies Inc. Systems and methods for determining relationships between defects
US10325224B1 (en) 2017-03-23 2019-06-18 Palantir Technologies Inc. Systems and methods for selecting machine learning training data
US10606866B1 (en) 2017-03-30 2020-03-31 Palantir Technologies Inc. Framework for exposing network activities
US10235461B2 (en) 2017-05-02 2019-03-19 Palantir Technologies Inc. Automated assistance for generating relevant and valuable search results for an entity of interest
US10482382B2 (en) 2017-05-09 2019-11-19 Palantir Technologies Inc. Systems and methods for reducing manufacturing failure rates
CN108197163B (zh) * 2017-12-14 2021-08-10 上海银江智慧智能化技术有限公司 Structured processing method based on court judgment documents
EP3791340A1 (de) * 2018-05-08 2021-03-17 3M Innovative Properties Company Personal protective equipment and safety management system for comparative assessment of safety events
US11119630B1 (en) 2018-06-19 2021-09-14 Palantir Technologies Inc. Artificial intelligence assisted evaluations and user interface for same
CN111611389B (zh) * 2020-06-04 2022-05-27 华侨大学 Text data clustering method, apparatus and device based on a nonparametric vMF mixture model
US11567824B2 (en) 2020-12-15 2023-01-31 International Business Machines Corporation Restricting use of selected input in recovery from system failures

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US5317507A (en) * 1990-11-07 1994-05-31 Gallant Stephen I Method for document retrieval and for word sense disambiguation using neural networks
US5675819A (en) * 1994-06-16 1997-10-07 Xerox Corporation Document information retrieval using global word co-occurrence patterns
US5787422A * 1996-01-11 1998-07-28 Xerox Corporation Method and apparatus for information access employing overlapping clusters
US5864855A (en) * 1996-02-26 1999-01-26 The United States Of America As Represented By The Secretary Of The Army Parallel document clustering process
US5832182A (en) 1996-04-24 1998-11-03 Wisconsin Alumni Research Foundation Method and system for data clustering for very large databases
US6298174B1 (en) * 1996-08-12 2001-10-02 Battelle Memorial Institute Three-dimensional display of document set
US5857179A (en) 1996-09-09 1999-01-05 Digital Equipment Corporation Computer method and apparatus for clustering documents and automatic generation of cluster keywords
US6115708A (en) * 1998-03-04 2000-09-05 Microsoft Corporation Method for refining the initial conditions for clustering with applications to small and large database clustering
US6012058A (en) 1998-03-17 2000-01-04 Microsoft Corporation Scalable system for K-means clustering of large databases
US7430717B1 (en) * 2000-09-26 2008-09-30 International Business Machines Corporation Method for adapting a K-means text clustering to emerging data

Also Published As

Publication number Publication date
US7430717B1 (en) 2008-09-30
EP1191463A2 (de) 2002-03-27
ATE466343T1 (de) 2010-05-15
EP1191463B1 (de) 2010-04-28
US20080215314A1 (en) 2008-09-04
EP1191463A3 (de) 2005-10-12
US7779349B2 (en) 2010-08-17

Similar Documents

Publication Publication Date Title
DE60141937D1 (de) Method for adapting a K-fold text partition to incoming data
CN108595706B (zh) Document semantic representation method, text classification method and device based on topic-word class similarity
Schnabel et al. Flors: Fast and simple domain adaptation for part-of-speech tagging
US9342991B2 (en) Systems and methods for generating a high-level visual vocabulary
Surdeanu et al. Semantic role labeling using complete syntactic analysis
Ireson et al. Evaluating machine learning for information extraction
US9697475B1 (en) Additive context model for entity resolution
Kate et al. Learning language semantics from ambiguous supervision
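CN106570180A (zh) Artificial-intelligence-based voice search method and device
CN105261358A (zh) N-gram model construction method for speech recognition, and speech recognition system
CN109033085B (zh) Chinese word segmentation system and word segmentation method for Chinese text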
Ma et al. Tagging the web: Building a robust web tagger with neural network
Sun et al. Hierarchical verb clustering using graph factorization
Reichart et al. Improved lexical acquisition through DPP-based verb clustering
Indurthi et al. FERMI at SemEval-2017 Task 7: Detection and interpretation of homographic puns in English language
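CN109033066B (zh) Abstract generation method and device
CN110222338A (zh) Organization-name entity recognition method
CN104318271A (zh) Image classification method based on adaptive coding and geometric smooth pooling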
Romero et al. Modern vs diplomatic transcripts for historical handwritten text recognition
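KR20210058059A (ko) Unsupervised document summarization method based on sentence embeddings, and document summarization device using the same
CN109446334A (zh) Method for English text classification and related device
CN104199811A (zh) Method and system for building a short-sentence parsing model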
Liang Document structure analysis and performance evaluation
CN110909161B (zh) English word classification method based on density clustering and visual similarity
Talukdar et al. Parts of Speech Taggers for Indo Aryan Languages: A critical Review of Approaches and Performances

Legal Events

Date Code Title Description
8320 Willingness to grant licences declared (Section 23 PatG)
8364 No opposition during term of opposition