Entity resolution from documents

Publication number
IN2014MU00169A
Authority
IN
India
Prior art keywords
documents
entity
plurality
merged
document
Application number
IN169/MUM/2014A
Inventor
Puneet Agarwal
Gautam Shroff
Pankaj Malhotra
Original Assignee
Tata Consultancy Services Limited
Priority date
2014-01-17
Filing date
2014-01-17
Publication date
2015-08-28
Application filed by Tata Consultancy Services Limited
Priority to IN169/MUM/2014A
Priority to IN2014MU000000169
Publication of IN2014MU00169A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING; COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                        • G06F16/35 Clustering; Classification
                            • G06F16/355 Class or cluster creation or modification
                    • G06F16/90 Details of database functions independent of the retrieved data types
                        • G06F16/901 Indexing; Data structures therefor; Storage structures
                            • G06F16/9024 Graphs; Linked lists
                        • G06F16/93 Document management systems
                • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
                    • G06F17/20 Handling natural language data
                        • G06F17/21 Text processing
                            • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
                                • G06F17/2211 Calculation of differences between files
                                • G06F17/2229 Fragmentation of text-files, e.g. reusable text-blocks, including linking to the fragments, XInclude, Namespaces
                        • G06F17/27 Automatic analysis, e.g. parsing
                            • G06F17/2765 Recognition
                                • G06F17/2775 Phrasal analysis, e.g. finite state techniques, chunking
                                    • G06F17/278 Named entity recognition

Abstract

The present subject matter relates to entity resolution and, in particular, to resolving entities from a plurality of documents. The method comprises obtaining the plurality of documents from at least one data source. The documents are blocked into at least one bucket based on textual similarity and inter-document references among the documents. Within each bucket, a merged document for each entity may be created using an iterative match-merge technique, which identifies at least one matching pair of documents and merges that pair to create the merged document for each entity. The merged documents may then be merged, based on a graph clustering technique, to generate a resolved entity-document for each entity.
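The three stages named in the abstract (blocking, iterative match-merge, graph clustering) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify a similarity measure, thresholds, or a clustering algorithm, so Jaccard similarity over word sets, the threshold values, connected components as the graph clustering step, and every function name below are assumptions made for illustration only.

```python
from itertools import combinations


def tokens(text):
    """Lower-cased word set; a deliberately crude text representation (assumption)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def components(nodes, edges):
    """Connected components of an undirected graph, using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def block(docs, refs, threshold=0.2):
    """Stage 1: bucket documents that are textually similar or reference each other."""
    toks = {d: tokens(t) for d, t in docs.items()}
    edges = [
        (a, b)
        for a, b in combinations(docs, 2)
        if jaccard(toks[a], toks[b]) >= threshold
        or b in refs.get(a, ())
        or a in refs.get(b, ())
    ]
    return components(list(docs), edges)


def match_merge(bucket, docs, threshold=0.5):
    """Stage 2: repeatedly merge the first matching pair until no pair matches."""
    merged = [({d}, tokens(docs[d])) for d in bucket]
    while True:
        pair = next(
            ((i, j) for i, j in combinations(range(len(merged)), 2)
             if jaccard(merged[i][1], merged[j][1]) >= threshold),
            None,
        )
        if pair is None:
            return merged
        i, j = pair
        ids_j, toks_j = merged.pop(j)  # j > i, so index i stays valid
        merged[i] = (merged[i][0] | ids_j, merged[i][1] | toks_j)


def resolve(docs, refs, block_t=0.2, match_t=0.5, cluster_t=0.2):
    """Stage 3: link similar merged documents and take connected components."""
    merged = [m for bucket in block(docs, refs, block_t)
              for m in match_merge(bucket, docs, match_t)]
    idx = list(range(len(merged)))
    edges = [(i, j) for i, j in combinations(idx, 2)
             if jaccard(merged[i][1], merged[j][1]) >= cluster_t]
    return [set().union(*(merged[i][0] for i in group))
            for group in components(idx, edges)]


# Hypothetical toy data: document id -> text, and document id -> referenced ids.
docs = {
    "d1": "john smith acme corp london",
    "d2": "john smith acme corporation london",
    "d3": "j smith acme see d1",
    "d4": "mary jones globex paris",
}
refs = {"d3": ["d1"]}

entities = resolve(docs, refs)
print([sorted(e) for e in entities])  # [['d1', 'd2', 'd3'], ['d4']]
```

In this toy run, match-merge alone joins only d1 and d2; d3 is attached to the same entity by the looser clustering step. A real system would presumably use richer matching rules than token overlap.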

Priority Applications (2)

Application Number Priority Date Filing Date Title
IN169/MUM/2014A IN2014MU00169A (en) 2014-01-17 2014-01-17 Entity resolution from documents
IN2014MU000000169 2014-01-17

Applications Claiming Priority (7)

Application Number | Publication Number | Priority Date | Filing Date | Title
IN169/MUM/2014A | IN2014MU00169A (en) | 2014-01-17 | 2014-01-17 | Entity resolution from documents
EP14186280.5A | EP2897054A3 (en) | 2014-01-17 | 2014-09-24 | Entity resolution from documents
AU2014253497A | AU2014253497A1 (en) | 2014-01-17 | 2014-10-22 | Entity resolution from documents
CA2868540A | CA2868540A1 (en) | 2014-01-17 | 2014-10-24 | Entity resolution from documents
MX2014013314A | MX355195B (en) | 2014-01-17 | 2014-11-03 | Entity resolution from documents.
BR102014027639A | BR102014027639A2 (en) | 2014-01-17 | 2014-11-05 | method to solve the entities of a plurality of documents; and entity resolution system for the resolution entity of a plurality of documents
US14/533,866 | US10311093B2 (en) | 2014-01-17 | 2014-11-05 | Entity resolution from documents

Publications (1)

Publication Number | Publication Date
IN2014MU00169A (en) | 2015-08-28

Family

ID=51625852

Family Applications (1)

Application Number | Publication Number | Priority Date | Filing Date | Title
IN169/MUM/2014A | IN2014MU00169A (en) | 2014-01-17 | 2014-01-17 | Entity resolution from documents

Country Status (7)

Country Link
US (1) US10311093B2 (en)
EP (1) EP2897054A3 (en)
AU (1) AU2014253497A1 (en)
BR (1) BR102014027639A2 (en)
CA (1) CA2868540A1 (en)
IN (1) IN2014MU00169A (en)
MX (1) MX355195B (en)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7213198B1 (en) * | 1999-08-12 | 2007-05-01 | Google Inc. | Link based clustering of hyperlinked documents
CN1757188A (en) * | 2002-11-06 | 2006-04-05 | International Business Machines Corporation (国际商业机器公司) | Confidential data sharing and anonymous entity resolution
US8683312B2 (en) * | 2005-06-16 | 2014-03-25 | Adobe Systems Incorporated | Inter-document links involving embedded documents
US20090204590A1 (en) * | 2008-02-11 | 2009-08-13 | Queplix Corp. | System and method for an integrated enterprise search
US8805861B2 (en) * | 2008-12-09 | 2014-08-12 | Google Inc. | Methods and systems to train models to extract and integrate information from data sources
US20110119268A1 (en) * | 2009-11-13 | 2011-05-19 | Rajaram Shyam Sundar | Method and system for segmenting query urls
WO2011085360A1 (en) * | 2010-01-11 | 2011-07-14 | Panjiva, Inc. | Evaluating public records of supply transactions for financial investment decisions
EP2545462A1 (en) * | 2010-03-12 | 2013-01-16 | Telefonaktiebolaget LM Ericsson (publ) | System and method for matching entities and synonym group organizer used therein
US9189473B2 (en) * | 2012-05-18 | 2015-11-17 | Xerox Corporation | System and method for resolving entity coreference
US9442929B2 (en) * | 2013-02-12 | 2016-09-13 | Microsoft Technology Licensing, Llc | Determining documents that match a query
US10140664B2 (en) * | 2013-03-14 | 2018-11-27 | Palantir Technologies Inc. | Resolving similar entities from a transaction database

Also Published As

Publication number Publication date
CA2868540A1 (en) 2015-07-17
EP2897054A2 (en) 2015-07-22
AU2014253497A1 (en) 2015-08-06
US20150205803A1 (en) 2015-07-23
BR102014027639A2 (en) 2016-05-24
EP2897054A3 (en) 2015-09-16
US10311093B2 (en) 2019-06-04
MX2014013314A (en) 2016-03-15
MX355195B (en) 2018-04-06

Similar Documents

Publication Publication Date Title
EP2811414A3 (en) Confidence-driven rewriting of source texts for improved translation
GB201208529D0 (en) Foiling a document exploit attack
EP3336721A3 (en) Method and system for generating a parser and parsing complex data
TW201535157A (en) Voice input command
WO2013163644A3 (en) Updating a search index used to facilitate application searches
TW201612773A (en) Multi-command single utterance input method
TW201220098A (en) Presenting actions and providers associated with entities
WO2013019869A3 (en) Data fingerpringting for copy accuracy assurance
WO2014152161A3 (en) Multi-language information retrieval and advertising
MX359781B (en) Private information hiding method and device.
WO2009045668A3 (en) Two-pass hash extraction of text strings
WO2012143603A3 (en) Methods and apparatuses for facilitating gesture recognition
EP2624149A3 (en) Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space
WO2014210577A3 (en) Comparing extracted card data using continuous scanning
TW201222291A (en) Method and device for providing text segmentation results with multiple granularity levels
WO2014210548A3 (en) Extracting card data using card art
TW201426381A (en) Method and system for detecting malware applications
GB201618158D0 (en) Improved method, system and software for searching, identifying, retrieving and presenting electronic documents
WO2012112944A3 (en) Managing unwanted communications using template generation and fingerprint comparison features
TW201039147A (en) Fast merge support for legacy documents
Becker HOTPANTS: High Order Transform of PSF ANd Template Subtraction
AU2015284970B2 (en) Operating method for microphones and electronic device supporting the same
WO2013114212A3 (en) Notification and privacy management of online photos and videos
GB201209324D0 (en) Method and apparatus for assessing a translation
WO2014085832A3 (en) Event investigation within an online research system