AU2002229035A1 - Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments - Google Patents

Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments

Info

Publication number
AU2002229035A1
AU2002229035A1 AU2002229035A AU2903502A AU2002229035A1 AU 2002229035 A1 AU2002229035 A1 AU 2002229035A1 AU 2002229035 A AU2002229035 A AU 2002229035A AU 2903502 A AU2903502 A AU 2903502A AU 2002229035 A1 AU2002229035 A1 AU 2002229035A1
Authority
AU
Australia
Prior art keywords
duplicate
text
discriminability
efficient identification
spans
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2002229035A
Inventor
Mark Kantrowitz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JustSystems Corp
Original Assignee
JustSystems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JustSystems Corp filed Critical JustSystems Corp
Publication of AU2002229035A1 publication Critical patent/AU2002229035A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
AU2002229035A 2000-11-15 2001-10-31 Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments Abandoned AU2002229035A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09713733 2000-11-15
US09/713,733 US6978419B1 (en) 2000-11-15 2000-11-15 Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments
PCT/US2001/048124 WO2002041161A1 (en) 2000-11-15 2001-10-31 Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments

Publications (1)

Publication Number Publication Date
AU2002229035A1 true AU2002229035A1 (en) 2002-05-27

Family

ID=24867308

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2002229035A Abandoned AU2002229035A1 (en) 2000-11-15 2001-10-31 Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments

Country Status (4)

Country Link
US (1) US6978419B1 (en)
JP (1) JP2004519761A (en)
AU (1) AU2002229035A1 (en)
WO (1) WO2002041161A1 (en)

Families Citing this family (223)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8572069B2 (en) * 1999-03-31 2013-10-29 Apple Inc. Semi-automatic index term augmentation in document retrieval
WO2000058863A1 (en) 1999-03-31 2000-10-05 Verizon Laboratories Inc. Techniques for performing a data query in a computer system
US8275661B1 (en) 1999-03-31 2012-09-25 Verizon Corporate Services Group Inc. Targeted banner advertisements
US6718363B1 (en) * 1999-07-30 2004-04-06 Verizon Laboratories, Inc. Page aggregation for web sites
US6912525B1 (en) 2000-05-08 2005-06-28 Verizon Laboratories, Inc. Techniques for web site integration
US8205237B2 (en) 2000-09-14 2012-06-19 Cox Ingemar J Identifying works, using a sub-linear time search, such as an approximate nearest neighbor search, for initiating a work-based action, such as an action on the internet
US8010988B2 (en) 2000-09-14 2011-08-30 Cox Ingemar J Using features extracted from an audio and/or video work to obtain information about the work
US6658423B1 (en) * 2001-01-24 2003-12-02 Google, Inc. Detecting duplicate and near-duplicate files
US7899825B2 (en) * 2001-06-27 2011-03-01 SAP America, Inc. Method and apparatus for duplicate detection
US6888548B1 (en) 2001-08-31 2005-05-03 Attenex Corporation System and method for generating a visualized data representation preserving independent variable geometric relationships
US6978274B1 (en) * 2001-08-31 2005-12-20 Attenex Corporation System and method for dynamically evaluating latent concepts in unstructured documents
US6778995B1 (en) 2001-08-31 2004-08-17 Attenex Corporation System and method for efficiently generating cluster groupings in a multi-dimensional concept space
US7271804B2 (en) 2002-02-25 2007-09-18 Attenex Corporation System and method for arranging concept clusters in thematic relationships in a two-dimensional visual display area
US7219301B2 (en) * 2002-03-01 2007-05-15 Iparadigms, Llc Systems and methods for conducting a peer review process and evaluating the originality of documents
US20070208698A1 (en) * 2002-06-07 2007-09-06 Dougal Brindley Avoiding duplicate service requests
US8090717B1 (en) 2002-09-20 2012-01-03 Google Inc. Methods and apparatus for ranking documents
US7568148B1 (en) 2002-09-20 2009-07-28 Google Inc. Methods and apparatus for clustering news content
US7725544B2 (en) * 2003-01-24 2010-05-25 Aol Inc. Group based spam classification
US7703000B2 (en) * 2003-02-13 2010-04-20 Iparadigms Llc Systems and methods for contextual mark-up of formatted documents
US7590695B2 (en) 2003-05-09 2009-09-15 Aol Llc Managing electronic messages
JP4014160B2 (en) * 2003-05-30 2007-11-28 インターナショナル・ビジネス・マシーンズ・コーポレーション Information processing apparatus, program, and recording medium
US7739602B2 (en) 2003-06-24 2010-06-15 Aol Inc. System and method for community centric resource sharing based on a publishing subscription model
US7627613B1 (en) 2003-07-03 2009-12-01 Google Inc. Duplicate document detection in a web crawler system
US8136025B1 (en) 2003-07-03 2012-03-13 Google Inc. Assigning document identification tags
US7610313B2 (en) 2003-07-25 2009-10-27 Attenex Corporation System and method for performing efficient document scoring and clustering
US7644076B1 (en) * 2003-09-12 2010-01-05 Teradata Us, Inc. Clustering strings using N-grams
US7577655B2 (en) 2003-09-16 2009-08-18 Google Inc. Systems and methods for improving the ranking of news articles
US7503035B2 (en) * 2003-11-25 2009-03-10 Software Analysis And Forensic Engineering Corp. Software tool for detecting plagiarism in computer source code
US7823127B2 (en) * 2003-11-25 2010-10-26 Software Analysis And Forensic Engineering Corp. Detecting plagiarism in computer source code
US7191175B2 (en) 2004-02-13 2007-03-13 Attenex Corporation System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space
US7809695B2 (en) * 2004-08-23 2010-10-05 Thomson Reuters Global Resources Information retrieval systems with duplicate document detection and presentation functions
US7331010B2 (en) 2004-10-29 2008-02-12 International Business Machines Corporation System, method and storage medium for providing fault detection and correction in a memory subsystem
US8214369B2 (en) * 2004-12-09 2012-07-03 Microsoft Corporation System and method for indexing and prefiltering
US20060142993A1 (en) * 2004-12-28 2006-06-29 Sony Corporation System and method for utilizing distance measures to perform text classification
US7404151B2 (en) 2005-01-26 2008-07-22 Attenex Corporation System and method for providing a dynamic user interface for a dense three-dimensional scene
US7356777B2 (en) 2005-01-26 2008-04-08 Attenex Corporation System and method for providing a dynamic user interface for a dense three-dimensional scene
US7401080B2 (en) * 2005-08-17 2008-07-15 Microsoft Corporation Storage reports duplicate file detection
US20070112752A1 (en) * 2005-11-14 2007-05-17 Wolfgang Kalthoff Combination of matching strategies under consideration of data quality
US7685392B2 (en) 2005-11-28 2010-03-23 International Business Machines Corporation Providing indeterminate read data latency in a memory system
US7542989B2 (en) * 2006-01-25 2009-06-02 Graduate Management Admission Council Method and system for searching, identifying, and documenting infringements on copyrighted information
US7661064B2 (en) * 2006-03-06 2010-02-09 Microsoft Corporation Displaying text intraline diffing output
US8073830B2 (en) * 2006-03-31 2011-12-06 Google Inc. Expanded text excerpts
US20070266001A1 (en) * 2006-05-09 2007-11-15 Microsoft Corporation Presentation of duplicate and near duplicate search results
US8175875B1 (en) * 2006-05-19 2012-05-08 Google Inc. Efficient indexing of documents with similar content
US7765475B2 (en) * 2006-06-13 2010-07-27 International Business Machines Corporation List display with redundant text removal
US8015162B2 (en) * 2006-08-04 2011-09-06 Google Inc. Detecting duplicate and near-duplicate files
US8321197B2 (en) * 2006-10-18 2012-11-27 Teresa Ruth Gaudet Method and process for performing category-based analysis, evaluation, and prescriptive practice creation upon stenographically written and voice-written text files
US7870459B2 (en) 2006-10-23 2011-01-11 International Business Machines Corporation High density high reliability memory module with power gating and a fault tolerant address and command bus
US8515912B2 (en) 2010-07-15 2013-08-20 Palantir Technologies, Inc. Sharing and deconflicting data changes in a multimaster database system
US8983970B1 (en) 2006-12-07 2015-03-17 Google Inc. Ranking content using content and content authors
US8577866B1 (en) 2006-12-07 2013-11-05 Googe Inc. Classifying content
US8930331B2 (en) 2007-02-21 2015-01-06 Palantir Technologies Providing unique views of data based on changes or rules
NZ553484A (en) * 2007-02-28 2008-09-26 Optical Systems Corp Ltd Text management software
US7849399B2 (en) * 2007-06-29 2010-12-07 Walter Hoffmann Method and system for tracking authorship of content in data
US20090063470A1 (en) * 2007-08-28 2009-03-05 Nogacom Ltd. Document management using business objects
US8037073B1 (en) 2007-12-31 2011-10-11 Google Inc. Detection of bounce pad sites
AU2008255269A1 (en) * 2008-02-05 2009-08-20 Nuix Pty. Ltd. Document comparison method and apparatus
US9348499B2 (en) 2008-09-15 2016-05-24 Palantir Technologies, Inc. Sharing objects that rely on local resources with outside servers
TW201027375A (en) 2008-10-20 2010-07-16 Ibm Search system, search method and program
US8572093B2 (en) * 2009-01-13 2013-10-29 Emc Corporation System and method for providing a license description syntax in a software due diligence system
US8479161B2 (en) * 2009-03-18 2013-07-02 Oracle International Corporation System and method for performing software due diligence using a binary scan engine and parallel pattern matching
US8307351B2 (en) * 2009-03-18 2012-11-06 Oracle International Corporation System and method for performing code provenance review in a software due diligence system
CN101859309A (en) * 2009-04-07 2010-10-13 慧科讯业有限公司 System and method for identifying repeated text
US9104695B1 (en) 2009-07-27 2015-08-11 Palantir Technologies, Inc. Geotagging structured data
US8515957B2 (en) 2009-07-28 2013-08-20 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via injection
EP2471009A1 (en) 2009-08-24 2012-07-04 FTI Technology LLC Generating a reference set for use during document review
US9053296B2 (en) 2010-08-28 2015-06-09 Software Analysis And Forensic Engineering Corporation Detecting plagiarism in computer markup language files
US8423886B2 (en) 2010-09-03 2013-04-16 Iparadigms, Llc. Systems and methods for document analysis
US20120084868A1 (en) * 2010-09-30 2012-04-05 International Business Machines Corporation Locating documents for providing data leakage prevention within an information security management system
US8607140B1 (en) * 2010-12-21 2013-12-10 Google Inc. Classifying changes to resources
US9547693B1 (en) 2011-06-23 2017-01-17 Palantir Technologies Inc. Periodic database search manager for multiple data sources
US8799240B2 (en) 2011-06-23 2014-08-05 Palantir Technologies, Inc. System and method for investigating large amounts of data
US8732574B2 (en) 2011-08-25 2014-05-20 Palantir Technologies, Inc. System and method for parameterizing documents for automatic workflow generation
KR101221832B1 (en) 2011-12-08 2013-01-15 동국대학교 경주캠퍼스 산학협력단 A program recording datum and system for copyright protecting
JP2013149061A (en) * 2012-01-19 2013-08-01 Nec Corp Document similarity evaluation system, document similarity evaluation method, and computer program
FR2989189B1 (en) * 2012-04-04 2017-10-13 Qwant METHOD AND DEVICE FOR QUICKLY PROVIDING INFORMATION
US9798768B2 (en) 2012-09-10 2017-10-24 Palantir Technologies, Inc. Search around visual queries
US8843493B1 (en) * 2012-09-18 2014-09-23 Narus, Inc. Document fingerprint
US9081975B2 (en) 2012-10-22 2015-07-14 Palantir Technologies, Inc. Sharing information between nexuses that use different classification schemes for information access control
US9348677B2 (en) 2012-10-22 2016-05-24 Palantir Technologies Inc. System and method for batch evaluation programs
US9501761B2 (en) 2012-11-05 2016-11-22 Palantir Technologies, Inc. System and method for sharing investigation results
US9501507B1 (en) 2012-12-27 2016-11-22 Palantir Technologies Inc. Geo-temporal indexing and searching
US10140664B2 (en) 2013-03-14 2018-11-27 Palantir Technologies Inc. Resolving similar entities from a transaction database
US10275778B1 (en) 2013-03-15 2019-04-30 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures
US8924388B2 (en) 2013-03-15 2014-12-30 Palantir Technologies Inc. Computer-implemented systems and methods for comparing and associating objects
US8930897B2 (en) 2013-03-15 2015-01-06 Palantir Technologies Inc. Data integration tool
US8855999B1 (en) 2013-03-15 2014-10-07 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US8868486B2 (en) 2013-03-15 2014-10-21 Palantir Technologies Inc. Time-sensitive cube
US8903717B2 (en) 2013-03-15 2014-12-02 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US8909656B2 (en) 2013-03-15 2014-12-09 Palantir Technologies Inc. Filter chains with associated multipath views for exploring large data sets
US8799799B1 (en) 2013-05-07 2014-08-05 Palantir Technologies Inc. Interactive geospatial map
US9565152B2 (en) 2013-08-08 2017-02-07 Palantir Technologies Inc. Cable reader labeling
US9785317B2 (en) 2013-09-24 2017-10-10 Palantir Technologies Inc. Presentation and analysis of user interaction data
US8938686B1 (en) 2013-10-03 2015-01-20 Palantir Technologies Inc. Systems and methods for analyzing performance of an entity
US8812960B1 (en) 2013-10-07 2014-08-19 Palantir Technologies Inc. Cohort-based presentation of user interaction data
US9116975B2 (en) 2013-10-18 2015-08-25 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores
US8832594B1 (en) * 2013-11-04 2014-09-09 Palantir Technologies Inc. Space-optimized display of multi-column tables with selective text truncation based on a combined text width
US9105000B1 (en) 2013-12-10 2015-08-11 Palantir Technologies Inc. Aggregating data from a plurality of data sources
US9727622B2 (en) 2013-12-16 2017-08-08 Palantir Technologies, Inc. Methods and systems for analyzing entity performance
US10579647B1 (en) 2013-12-16 2020-03-03 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US10356032B2 (en) 2013-12-26 2019-07-16 Palantir Technologies Inc. System and method for detecting confidential information emails
US9514417B2 (en) 2013-12-30 2016-12-06 Google Inc. Cloud-based plagiarism detection system performing predicting based on classified feature vectors
US8832832B1 (en) 2014-01-03 2014-09-09 Palantir Technologies Inc. IP reputation
KR101577376B1 (en) * 2014-01-21 2015-12-14 (주) 아워텍 System and method for determining infringement of copyright based on the text reference point
US8924429B1 (en) 2014-03-18 2014-12-30 Palantir Technologies Inc. Determining and extracting changed data from a data source
US9836580B2 (en) 2014-03-21 2017-12-05 Palantir Technologies Inc. Provider portal
US9619557B2 (en) 2014-06-30 2017-04-11 Palantir Technologies, Inc. Systems and methods for key phrase characterization of documents
US9129219B1 (en) 2014-06-30 2015-09-08 Palantir Technologies, Inc. Crime risk forecasting
US9535974B1 (en) 2014-06-30 2017-01-03 Palantir Technologies Inc. Systems and methods for identifying key phrase clusters within documents
US9256664B2 (en) 2014-07-03 2016-02-09 Palantir Technologies Inc. System and method for news events detection and visualization
US20160026923A1 (en) 2014-07-22 2016-01-28 Palantir Technologies Inc. System and method for determining a propensity of entity to take a specified action
US9886422B2 (en) * 2014-08-06 2018-02-06 International Business Machines Corporation Dynamic highlighting of repetitions in electronic documents
US9454281B2 (en) 2014-09-03 2016-09-27 Palantir Technologies Inc. System for providing dynamic linked panels in user interface
US9390086B2 (en) 2014-09-11 2016-07-12 Palantir Technologies Inc. Classification system with methodology for efficient verification
US9767172B2 (en) 2014-10-03 2017-09-19 Palantir Technologies Inc. Data aggregation and analysis system
US9501851B2 (en) 2014-10-03 2016-11-22 Palantir Technologies Inc. Time-series analysis system
US9785328B2 (en) 2014-10-06 2017-10-10 Palantir Technologies Inc. Presentation of multivariate data on a graphical user interface of a computing system
US9984133B2 (en) 2014-10-16 2018-05-29 Palantir Technologies Inc. Schematic and database linking system
US9805099B2 (en) 2014-10-30 2017-10-31 The Johns Hopkins University Apparatus and method for efficient identification of code similarity
US9229952B1 (en) 2014-11-05 2016-01-05 Palantir Technologies, Inc. History preserving data pipeline system and method
US9043894B1 (en) 2014-11-06 2015-05-26 Palantir Technologies Inc. Malicious software detection in a computing system
US9483546B2 (en) 2014-12-15 2016-11-01 Palantir Technologies Inc. System and method for associating related records to common entities across multiple lists
US10552994B2 (en) 2014-12-22 2020-02-04 Palantir Technologies Inc. Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items
US10362133B1 (en) 2014-12-22 2019-07-23 Palantir Technologies Inc. Communication data processing architecture
US9348920B1 (en) 2014-12-22 2016-05-24 Palantir Technologies Inc. Concept indexing among database of documents using machine learning techniques
US10452651B1 (en) 2014-12-23 2019-10-22 Palantir Technologies Inc. Searching charts
US9335911B1 (en) 2014-12-29 2016-05-10 Palantir Technologies Inc. Interactive user interface for dynamic data analysis exploration and query processing
US9817563B1 (en) 2014-12-29 2017-11-14 Palantir Technologies Inc. System and method of generating data points from one or more data stores of data items for chart creation and manipulation
US11302426B1 (en) 2015-01-02 2022-04-12 Palantir Technologies Inc. Unified data interface and system
US9727560B2 (en) 2015-02-25 2017-08-08 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
EP3070622A1 (en) 2015-03-16 2016-09-21 Palantir Technologies, Inc. Interactive user interfaces for location-based data analysis
US9886467B2 (en) 2015-03-19 2018-02-06 Plantir Technologies Inc. System and method for comparing and visualizing data entities and data entity series
US9348880B1 (en) 2015-04-01 2016-05-24 Palantir Technologies, Inc. Federated search of multiple sources with conflict resolution
US10103953B1 (en) 2015-05-12 2018-10-16 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US10628834B1 (en) 2015-06-16 2020-04-21 Palantir Technologies Inc. Fraud lead detection system for efficiently processing database-stored data and automatically generating natural language explanatory information of system results for display in interactive user interfaces
US9418337B1 (en) 2015-07-21 2016-08-16 Palantir Technologies Inc. Systems and models for data analytics
US9392008B1 (en) 2015-07-23 2016-07-12 Palantir Technologies Inc. Systems and methods for identifying information related to payment card breaches
US9996595B2 (en) 2015-08-03 2018-06-12 Palantir Technologies, Inc. Providing full data provenance visualization for versioned datasets
US9456000B1 (en) 2015-08-06 2016-09-27 Palantir Technologies Inc. Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications
US9600146B2 (en) 2015-08-17 2017-03-21 Palantir Technologies Inc. Interactive geospatial map
US9671776B1 (en) 2015-08-20 2017-06-06 Palantir Technologies Inc. Quantifying, tracking, and anticipating risk at a manufacturing facility, taking deviation type and staffing conditions into account
US11150917B2 (en) 2015-08-26 2021-10-19 Palantir Technologies Inc. System for data aggregation and analysis of data from a plurality of data sources
US9485265B1 (en) 2015-08-28 2016-11-01 Palantir Technologies Inc. Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces
US10706434B1 (en) 2015-09-01 2020-07-07 Palantir Technologies Inc. Methods and systems for determining location information
US9639580B1 (en) 2015-09-04 2017-05-02 Palantir Technologies, Inc. Computer-implemented systems and methods for data management and visualization
US9984428B2 (en) 2015-09-04 2018-05-29 Palantir Technologies Inc. Systems and methods for structuring data from unstructured electronic data files
US9576015B1 (en) 2015-09-09 2017-02-21 Palantir Technologies, Inc. Domain-specific language for dataset transformations
US10733237B2 (en) 2015-09-22 2020-08-04 International Business Machines Corporation Creating data objects to separately store common data included in documents
US9424669B1 (en) 2015-10-21 2016-08-23 Palantir Technologies Inc. Generating graphical representations of event participation flow
US10223429B2 (en) 2015-12-01 2019-03-05 Palantir Technologies Inc. Entity data attribution using disparate data sets
US10706056B1 (en) 2015-12-02 2020-07-07 Palantir Technologies Inc. Audit log report generator
US9514414B1 (en) 2015-12-11 2016-12-06 Palantir Technologies Inc. Systems and methods for identifying and categorizing electronic documents through machine learning
US9760556B1 (en) 2015-12-11 2017-09-12 Palantir Technologies Inc. Systems and methods for annotating and linking electronic documents
US10114884B1 (en) 2015-12-16 2018-10-30 Palantir Technologies Inc. Systems and methods for attribute analysis of one or more databases
US9542446B1 (en) 2015-12-17 2017-01-10 Palantir Technologies, Inc. Automatic generation of composite datasets based on hierarchical fields
US10373099B1 (en) 2015-12-18 2019-08-06 Palantir Technologies Inc. Misalignment detection system for efficiently processing database-stored data and automatically generating misalignment information for display in interactive user interfaces
US9996236B1 (en) 2015-12-29 2018-06-12 Palantir Technologies Inc. Simplified frontend processing and visualization of large datasets
US10089289B2 (en) 2015-12-29 2018-10-02 Palantir Technologies Inc. Real-time document annotation
US10871878B1 (en) 2015-12-29 2020-12-22 Palantir Technologies Inc. System log analysis and object user interaction correlation system
US9792020B1 (en) 2015-12-30 2017-10-17 Palantir Technologies Inc. Systems for collecting, aggregating, and storing data, generating interactive user interfaces for analyzing data, and generating alerts based upon collected data
US10698938B2 (en) 2016-03-18 2020-06-30 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
US9652139B1 (en) 2016-04-06 2017-05-16 Palantir Technologies Inc. Graphical representation of an output
US10068199B1 (en) 2016-05-13 2018-09-04 Palantir Technologies Inc. System to catalogue tracking data
WO2017210618A1 (en) 2016-06-02 2017-12-07 Fti Consulting, Inc. Analyzing clusters of coded documents
US10007674B2 (en) 2016-06-13 2018-06-26 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US10545975B1 (en) 2016-06-22 2020-01-28 Palantir Technologies Inc. Visual analysis of data using sequenced dataset reduction
US10909130B1 (en) 2016-07-01 2021-02-02 Palantir Technologies Inc. Graphical user interface for a database system
US10719188B2 (en) 2016-07-21 2020-07-21 Palantir Technologies Inc. Cached database and synchronization system for providing dynamic linked panels in user interface
US10324609B2 (en) 2016-07-21 2019-06-18 Palantir Technologies Inc. System for providing dynamic linked panels in user interface
US10552002B1 (en) 2016-09-27 2020-02-04 Palantir Technologies Inc. User interface based variable machine modeling
US10133588B1 (en) 2016-10-20 2018-11-20 Palantir Technologies Inc. Transforming instructions for collaborative updates
US10726507B1 (en) 2016-11-11 2020-07-28 Palantir Technologies Inc. Graphical representation of a complex task
US10318630B1 (en) 2016-11-21 2019-06-11 Palantir Technologies Inc. Analysis of large bodies of textual data
US9842338B1 (en) 2016-11-21 2017-12-12 Palantir Technologies Inc. System to identify vulnerable card readers
US11250425B1 (en) 2016-11-30 2022-02-15 Palantir Technologies Inc. Generating a statistic using electronic transaction data
US10467275B2 (en) 2016-12-09 2019-11-05 International Business Machines Corporation Storage efficiency
US9886525B1 (en) 2016-12-16 2018-02-06 Palantir Technologies Inc. Data item aggregate probability analysis system
GB201621434D0 (en) 2016-12-16 2017-02-01 Palantir Technologies Inc Processing sensor logs
US10044836B2 (en) 2016-12-19 2018-08-07 Palantir Technologies Inc. Conducting investigations under limited connectivity
US10249033B1 (en) 2016-12-20 2019-04-02 Palantir Technologies Inc. User interface for managing defects
US10728262B1 (en) 2016-12-21 2020-07-28 Palantir Technologies Inc. Context-aware network-based malicious activity warning systems
US11373752B2 (en) 2016-12-22 2022-06-28 Palantir Technologies Inc. Detection of misuse of a benefit system
US10360238B1 (en) 2016-12-22 2019-07-23 Palantir Technologies Inc. Database systems and user interfaces for interactive data association, analysis, and presentation
US10721262B2 (en) 2016-12-28 2020-07-21 Palantir Technologies Inc. Resource-centric network cyber attack warning system
US10163227B1 (en) * 2016-12-28 2018-12-25 Shutterstock, Inc. Image file compression using dummy data for non-salient portions of images
US10216811B1 (en) 2017-01-05 2019-02-26 Palantir Technologies Inc. Collaborating using different object models
US10762471B1 (en) 2017-01-09 2020-09-01 Palantir Technologies Inc. Automating management of integrated workflows based on disparate subsidiary data sources
US10133621B1 (en) 2017-01-18 2018-11-20 Palantir Technologies Inc. Data analysis system to facilitate investigative process
US10509844B1 (en) 2017-01-19 2019-12-17 Palantir Technologies Inc. Network graph parser
US10515109B2 (en) 2017-02-15 2019-12-24 Palantir Technologies Inc. Real-time auditing of industrial equipment condition
US20180276206A1 (en) * 2017-03-23 2018-09-27 Hcl Technologies Limited System and method for updating a knowledge repository
US10866936B1 (en) 2017-03-29 2020-12-15 Palantir Technologies Inc. Model object management and storage system
US10581954B2 (en) 2017-03-29 2020-03-03 Palantir Technologies Inc. Metric collection and aggregation for distributed software services
US10713432B2 (en) * 2017-03-31 2020-07-14 Adobe Inc. Classifying and ranking changes between document versions
US10133783B2 (en) 2017-04-11 2018-11-20 Palantir Technologies Inc. Systems and methods for constraint driven database searching
US11074277B1 (en) 2017-05-01 2021-07-27 Palantir Technologies Inc. Secure resolution of canonical entities
US10563990B1 (en) 2017-05-09 2020-02-18 Palantir Technologies Inc. Event-based route planning
US10606872B1 (en) 2017-05-22 2020-03-31 Palantir Technologies Inc. Graphical user interface for a database system
US10795749B1 (en) 2017-05-31 2020-10-06 Palantir Technologies Inc. Systems and methods for providing fault analysis user interface
US10956406B2 (en) 2017-06-12 2021-03-23 Palantir Technologies Inc. Propagated deletion of database records and derived data
EP3642766A4 (en) * 2017-06-19 2021-03-03 Equifax, Inc. Machine-learning system for servicing queries for digital content
US11216762B1 (en) 2017-07-13 2022-01-04 Palantir Technologies Inc. Automated risk visualization using customer-centric data analysis
US10942947B2 (en) 2017-07-17 2021-03-09 Palantir Technologies Inc. Systems and methods for determining relationships between datasets
US10430444B1 (en) 2017-07-24 2019-10-01 Palantir Technologies Inc. Interactive geospatial map and geospatial visualization systems
US10956508B2 (en) 2017-11-10 2021-03-23 Palantir Technologies Inc. Systems and methods for creating and managing a data integration workspace containing automatically updated data models
US11281726B2 (en) 2017-12-01 2022-03-22 Palantir Technologies Inc. System and methods for faster processor comparisons of visual graph features
US11314721B1 (en) 2017-12-07 2022-04-26 Palantir Technologies Inc. User-interactive defect analysis for root cause
US10877984B1 (en) 2017-12-07 2020-12-29 Palantir Technologies Inc. Systems and methods for filtering and visualizing large scale datasets
US10783162B1 (en) 2017-12-07 2020-09-22 Palantir Technologies Inc. Workflow assistant
US10769171B1 (en) 2017-12-07 2020-09-08 Palantir Technologies Inc. Relationship analysis and mapping for interrelated multi-layered datasets
US11061874B1 (en) 2017-12-14 2021-07-13 Palantir Technologies Inc. Systems and methods for resolving entity data across various data structures
US10853352B1 (en) 2017-12-21 2020-12-01 Palantir Technologies Inc. Structured data collection, presentation, validation and workflow management
US11263382B1 (en) 2017-12-22 2022-03-01 Palantir Technologies Inc. Data normalization and irregularity detection system
GB201800595D0 (en) 2018-01-15 2018-02-28 Palantir Technologies Inc Management of software bugs in a data processing system
US11599369B1 (en) 2018-03-08 2023-03-07 Palantir Technologies Inc. Graphical user interface configuration system
US10877654B1 (en) 2018-04-03 2020-12-29 Palantir Technologies Inc. Graphical user interfaces for optimizations
US10754822B1 (en) 2018-04-18 2020-08-25 Palantir Technologies Inc. Systems and methods for ontology migration
US10885021B1 (en) 2018-05-02 2021-01-05 Palantir Technologies Inc. Interactive interpreter and graphical user interface
US10754946B1 (en) 2018-05-08 2020-08-25 Palantir Technologies Inc. Systems and methods for implementing a machine learning approach to modeling entity behavior
US11061542B1 (en) 2018-06-01 2021-07-13 Palantir Technologies Inc. Systems and methods for determining and displaying optimal associations of data items
US11119630B1 (en) 2018-06-19 2021-09-14 Palantir Technologies Inc. Artificial intelligence assisted evaluations and user interface for same
US11126638B1 (en) 2018-09-13 2021-09-21 Palantir Technologies Inc. Data visualization and parsing system
US11294928B1 (en) 2018-10-12 2022-04-05 Palantir Technologies Inc. System architecture for relating and linking data objects

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4807182A (en) 1986-03-12 1989-02-21 Advanced Software, Inc. Apparatus and method for comparing data groups
US5258910A (en) * 1988-07-29 1993-11-02 Sharp Kabushiki Kaisha Text editor with memory for eliminating duplicate sentences
JP3270783B2 (en) * 1992-09-29 2002-04-02 ゼロックス・コーポレーション Multiple document search methods
CA2175187A1 (en) 1993-10-28 1995-05-04 William K. Thomson Database search summary with user determined characteristics
US5692176A (en) * 1993-11-22 1997-11-25 Reed Elsevier Inc. Associative text search and retrieval system
JP4518574B2 (en) * 1995-08-11 2010-08-04 ソニー株式会社 Recording method and apparatus, recording medium, and reproducing method and apparatus
US5680611A (en) 1995-09-29 1997-10-21 Electronic Data Systems Corporation Duplicate record detection
US5933823A (en) * 1996-03-01 1999-08-03 Ricoh Company Limited Image database browsing and query using texture analysis
US6098034A (en) * 1996-03-18 2000-08-01 Expert Ease Development, Ltd. Method for standardizing phrasing in a document
US5913208A (en) 1996-07-09 1999-06-15 International Business Machines Corporation Identifying duplicate documents from search results without comparing document content
US5920854A (en) * 1996-08-14 1999-07-06 Infoseek Corporation Real-time document collection search engine with phrase indexing
US5898836A (en) 1997-01-14 1999-04-27 Netmind Services, Inc. Change-detection tool indicating degree and location of change of internet documents by comparison of cyclic-redundancy-check(CRC) signatures
US6076051A (en) * 1997-03-07 2000-06-13 Microsoft Corporation Information retrieval utilizing semantic representation of text
US5978828A (en) 1997-06-13 1999-11-02 Intel Corporation URL bookmark update notification of page content or location changes
US6470307B1 (en) * 1997-06-23 2002-10-22 National Research Council Of Canada Method and apparatus for automatically identifying keywords within a document
NZ503279A (en) * 1997-09-04 2001-07-27 British Telecomm Selecting data sets using keywords suitable for searching
US5983216A (en) * 1997-09-12 1999-11-09 Infoseek Corporation Performing automated document collection and selection by providing a meta-index with meta-index values indentifying corresponding document collections
US6353824B1 (en) * 1997-11-18 2002-03-05 Apple Computer, Inc. Method for dynamic presentation of the contents topically rich capsule overviews corresponding to the plurality of documents, resolving co-referentiality in document segments
US6092065A (en) * 1998-02-13 2000-07-18 International Business Machines Corporation Method and apparatus for discovery, clustering and classification of patterns in 1-dimensional event streams
US6628824B1 (en) * 1998-03-20 2003-09-30 Ken Belanger Method and apparatus for image identification and comparison
US6119124A (en) * 1998-03-26 2000-09-12 Digital Equipment Corporation Method for clustering closely resembling data objects
US6185614B1 (en) * 1998-05-26 2001-02-06 International Business Machines Corp. Method and system for collecting user profile information over the world-wide web in the presence of dynamic content using document comparators
US6263348B1 (en) * 1998-07-01 2001-07-17 Serena Software International, Inc. Method and apparatus for identifying the existence of differences between two files
US6741743B2 (en) * 1998-07-31 2004-05-25 Prc. Inc. Imaged document optical correlation and conversion system
US6240409B1 (en) * 1998-07-31 2001-05-29 The Regents Of The University Of California Method and apparatus for detecting and summarizing document similarity within large document sets
US6104990A (en) * 1998-09-28 2000-08-15 Prompt Software, Inc. Language independent phrase extraction
US6549897B1 (en) * 1998-10-09 2003-04-15 Microsoft Corporation Method and system for calculating phrase-document importance
US6473753B1 (en) * 1998-10-09 2002-10-29 Microsoft Corporation Method and system for calculating term-document importance
US6643686B1 (en) * 1998-12-18 2003-11-04 At&T Corp. System and method for counteracting message filtering
US6295529B1 (en) * 1998-12-24 2001-09-25 Microsoft Corporation Method and apparatus for indentifying clauses having predetermined characteristics indicative of usefulness in determining relationships between different texts
US6598054B2 (en) * 1999-01-26 2003-07-22 Xerox Corporation System and method for clustering data objects in a collection
US6366950B1 (en) * 1999-04-02 2002-04-02 Smithmicro Software System and method for verifying users' identity in a network using e-mail communication
US6547829B1 (en) * 1999-06-30 2003-04-15 Microsoft Corporation Method and system for detecting duplicate documents in web crawls
US6718363B1 (en) * 1999-07-30 2004-04-06 Verizon Laboratories, Inc. Page aggregation for web sites
US6442606B1 (en) * 1999-08-12 2002-08-27 Inktomi Corporation Method and apparatus for identifying spoof documents
US6356633B1 (en) * 1999-08-19 2002-03-12 Mci Worldcom, Inc. Electronic mail message processing and routing for call center response to same
US6615209B1 (en) * 2000-02-22 2003-09-02 Google, Inc. Detecting query-specific duplicate documents
US6697998B1 (en) * 2000-06-12 2004-02-24 International Business Machines Corporation Automatic labeling of unlabeled text data
US6658423B1 (en) * 2001-01-24 2003-12-02 Google, Inc. Detecting duplicate and near-duplicate files

Also Published As

Publication number Publication date
WO2002041161A1 (en) 2002-05-23
US6978419B1 (en) 2005-12-20
JP2004519761A (en) 2004-07-02

Similar Documents

Publication Publication Date Title
AU2002229035A1 (en) Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments
AU2001265288A1 (en) Method and apparatus for efficient management of xml documents
AU2001259715A1 (en) Method and apparatus for making predictions about entities represented in documents
AU2002232552A1 (en) Method and apparatus for alphanumeric recognition
AU2002213933A1 (en) Document search and analysing method and apparatus
AU2003216532A1 (en) System and method for classification of documents
AU2002211302A1 (en) System and method for use in text analysis of documents and records
AU2001279003A1 (en) Computer method and apparatus for extracting data from web pages
WO2002021401A8 (en) Computer method and apparatus for petroleum trading and logistics
AU2002239372A1 (en) Method and system for remote printing of documents
AU2001278095A1 (en) Method and apparatus for generating web pages from templates
AU2001264676A1 (en) Method and apparatus for providing customized information
AU2001296362A1 (en) Method and apparatus for document identification and authentication
AU2001275375A1 (en) Method and apparatus for microdermabrasion
AU2001210607A1 (en) Method and apparatus for processing web documents
AU2001278004A1 (en) Internet information retrieval method and apparatus
AU2002232737A1 (en) Method and apparatus for virtual interaction with physical documents
AU2001288962A1 (en) Method and apparatus for processing marketing information
IL133588A0 (en) Apparatus and method for retrieval of documents
GB2374953B (en) Method and apparatus for embodying documents
GB0105612D0 (en) Method and apparatus for identifying documents
AU2003301117A1 (en) Apparatus and method for document content trading
AU2001232128A1 (en) Document handling apparatus and method
AU2001282753A1 (en) System and method for trading of electronic valuable documents
GB2371906B (en) Method and apparatus for embodying documents