IN2014MU00169A - Entity resolution from documents - Google Patents

Publication number
IN2014MU00169A
Authority
IN
India
Prior art keywords
documents
entity
merged
document
bucket
Application number
IN169MU2014
Inventor
Puneet Agarwal
Gautam Shroff
Pankaj Malhotra
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-01-17
Filing date
2014-01-17
Publication date
2015-08-28
Application filed by Tata Consultancy Services Ltd
Priority to IN169MU2014 (IN2014MU00169A)
Priority to EP14186280.5A (EP2897054A3)
Priority to AU2014253497A (AU2014253497B2)
Priority to CA2868540A (CA2868540C)
Priority to MX2014013314A (MX355195B)
Priority to US14/533,866 (US10311093B2)
Priority to BR102014027639-4A (BR102014027639B1)
Publication of IN2014MU00169A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/131 Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The present subject matter relates to entity resolution and, in particular, to resolving entities from a plurality of documents. The method comprises obtaining a plurality of documents from at least one data source. The documents are blocked into at least one bucket based on textual similarity and inter-document references among them. Further, within each bucket, a merged document for each entity may be created based on an iterative match-merge technique. The iterative match-merge technique identifies at least one matching pair of documents in the bucket and merges each matching pair to create the merged document for that entity. The merged documents may then be combined, based on a graph clustering technique, to generate a resolved entity-document for each entity.
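
The three stages named in the abstract (blocking, iterative match-merge, graph clustering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the use of Jaccard similarity over word sets as the "textual similarity", the three thresholds, and the use of connected components as a stand-in for both the blocking step and the graph clustering technique.

```python
from itertools import combinations

def tokens(text):
    """Lowercased word set used as a crude textual signature (assumption)."""
    return frozenset(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def components(nodes, linked):
    """Group nodes into connected components; linked(x, y) tells whether an edge exists."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for x, y in combinations(nodes, 2):
        if linked(x, y):
            parent[find(x)] = find(y)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

def match_merge(records, threshold):
    """Merge one matching pair, then rescan, until no pair matches."""
    records = list(records)
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(records)), 2):
            (ids_i, toks_i), (ids_j, toks_j) = records[i], records[j]
            if jaccard(toks_i, toks_j) >= threshold:
                # The merged document carries the ids and tokens of both inputs.
                records[i] = (ids_i | ids_j, toks_i | toks_j)
                del records[j]
                merged = True
                break
    return records

def resolve(docs, block_t=0.2, match_t=0.6, link_t=0.4):
    """docs maps a document id to (text, set of referenced document ids)."""
    sig = {d: tokens(text) for d, (text, _) in docs.items()}
    refs = {d: r for d, (_, r) in docs.items()}

    def refers(x, y):
        return y in refs[x] or x in refs[y]

    # 1. Blocking: loose textual similarity or an inter-document reference.
    buckets = components(
        list(docs),
        lambda x, y: refers(x, y) or jaccard(sig[x], sig[y]) >= block_t,
    )
    entities = []
    for bucket in buckets:
        # 2. Iterative match-merge inside the bucket.
        merged = match_merge([(frozenset([d]), sig[d]) for d in bucket], match_t)

        # 3. Graph over merged documents; connected components stand in
        #    for the graph clustering technique (assumption).
        def linked(a, b):
            cross_ref = any(refers(x, y) for x in merged[a][0] for y in merged[b][0])
            return cross_ref or jaccard(merged[a][1], merged[b][1]) >= link_t

        for group in components(list(range(len(merged))), linked):
            entities.append(frozenset().union(*(merged[k][0] for k in group)))
    return entities

# Hypothetical input: d3 references d1; d4 is unrelated to the others.
docs = {
    "d1": ("john smith mumbai engineer", set()),
    "d2": ("john smith mumbai engineer tcs", set()),
    "d3": ("j smith tcs", {"d1"}),
    "d4": ("alice wong london doctor", set()),
}
entities = resolve(docs)
```

With this hypothetical input, d1 and d2 are merged by match-merge because their word sets overlap strongly, d3 is attached to them in the graph step through its reference to d1, and d4 remains a separate entity. A pairwise scan like this is quadratic in the number of documents, so it only illustrates the data flow between the stages.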

Priority Applications (7)

Application Number Publication Number Priority Date Filing Date Title
IN169MU2014 IN2014MU00169A (en) 2014-01-17 2014-01-17
EP14186280.5A EP2897054A3 (en) 2014-01-17 2014-09-24 Entity resolution from documents
AU2014253497A AU2014253497B2 (en) 2014-01-17 2014-10-22 Entity resolution from documents
CA2868540A CA2868540C (en) 2014-01-17 2014-10-24 Entity resolution from documents
MX2014013314A MX355195B (en) 2014-01-17 2014-11-03 Entity resolution from documents
US14/533,866 US10311093B2 (en) 2014-01-17 2014-11-05 Entity resolution from documents
BR102014027639-4A BR102014027639B1 (en) 2014-01-17 2014-11-05 Method for resolving entities from a plurality of documents, and entity resolving system for resolving entities from a plurality of documents

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
IN169MU2014 IN2014MU00169A (en) 2014-01-17 2014-01-17

Publications (1)

Publication Number Publication Date
IN2014MU00169A (en) 2015-08-28

Family

ID=51625852

Family Applications (1)

Application Number Publication Number Priority Date Filing Date
IN169MU2014 IN2014MU00169A (en) 2014-01-17 2014-01-17

Country Status (7)

Country Link
US (1) US10311093B2 (en)
EP (1) EP2897054A3 (en)
AU (1) AU2014253497B2 (en)
BR (1) BR102014027639B1 (en)
CA (1) CA2868540C (en)
IN (1) IN2014MU00169A (en)
MX (1) MX355195B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165291B (en) * 2018-06-29 2021-07-09 厦门快商通信息技术有限公司 Text matching method and electronic equipment
CN109635114A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for handling information
FR3104282B1 (en) * 2019-12-05 2024-01-19 Codexo Saving documents in blocks
US20210342541A1 (en) * 2020-05-01 2021-11-04 Salesforce.Com, Inc. Stable identification of entity mentions
CN111882165A (en) * 2020-07-01 2020-11-03 国网河北省电力有限公司经济技术研究院 Device and method for splitting comprehensive project cost analysis data
WO2024036394A1 (en) * 2022-08-18 2024-02-22 9197-1168 Québec Inc. Systems and methods for identifying documents and references

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7213198B1 (en) * 1999-08-12 2007-05-01 Google Inc. Link based clustering of hyperlinked documents
JP2006505873A (en) * 2002-11-06 2006-02-16 インターナショナル・ビジネス・マシーンズ・コーポレーション Confidential data sharing and anonymous entity resolution
US8683312B2 (en) * 2005-06-16 2014-03-25 Adobe Systems Incorporated Inter-document links involving embedded documents
US8423425B2 (en) * 2007-11-14 2013-04-16 Panjiva, Inc. Evaluating public records of supply transactions for financial investment decisions
US20090204590A1 (en) * 2008-02-11 2009-08-13 Queplix Corp. System and method for an integrated enterprise search
US8805861B2 (en) * 2008-12-09 2014-08-12 Google Inc. Methods and systems to train models to extract and integrate information from data sources
US20110119268A1 (en) * 2009-11-13 2011-05-19 Rajaram Shyam Sundar Method and system for segmenting query urls
WO2011109921A1 (en) * 2010-03-12 2011-09-15 Telefonaktiebolaget L M Ericsson (Publ) System and method for matching entities and synonym group organizer used therein
US9189473B2 (en) * 2012-05-18 2015-11-17 Xerox Corporation System and method for resolving entity coreference
US9442929B2 (en) * 2013-02-12 2016-09-13 Microsoft Technology Licensing, Llc Determining documents that match a query
US10140664B2 (en) * 2013-03-14 2018-11-27 Palantir Technologies Inc. Resolving similar entities from a transaction database

Also Published As

Publication number Publication date
MX355195B (en) 2018-04-06
BR102014027639B1 (en) 2022-05-03
BR102014027639A2 (en) 2016-05-24
US10311093B2 (en) 2019-06-04
EP2897054A2 (en) 2015-07-22
CA2868540C (en) 2020-09-22
EP2897054A3 (en) 2015-09-16
US20150205803A1 (en) 2015-07-23
AU2014253497B2 (en) 2020-05-28
CA2868540A1 (en) 2015-07-17
AU2014253497A1 (en) 2015-08-06
MX2014013314A (en) 2016-03-15
BR102014027639A8 (en) 2021-08-24

Similar Documents

Publication Publication Date Title
MX355195B (en) Entity resolution from documents.
WO2015191746A8 (en) Systems and methods for a database of software artifacts
MX2014014048A (en) Entity resolution from documents.
PH12015000372A1 (en) Conversion of documents of different types to a uniform and an editable or a searchable format
EP2664997A3 (en) System and method for resolving named entity coreference
PH12016000485A1 (en) Document processing
GB2549875A (en) Automated content classification/filtering
IN2014MU00919A (en)
WO2015181639A3 (en) Methods and computer-program products for organizing electronic documents
GB2550777A (en) Classification and storage of documents
MY176481A (en) Method and apparatus for classifying object based on social networking service, and storage medium
GB2513747A (en) System and method for detecting malware in documents
GB2536826A (en) Matching of an input document to documents in a document collection
AU2015364405A8 (en) Methods for simultaneous source separation
IN2015CH01303A (en)
CA2912019C (en) Systems and methods for generating issue networks
GB2583636A8 (en) Facilitation of domain and client-specific application program interface recommendations
GB2527230A (en) Processing seismic attributes using mathematical morphology
GB2533243A (en) Document-based search with facet information
WO2014012863A3 (en) Method of automatically extracting features from a computer readable file
GB201300134D0 (en) Method and apparautus for analyzing a document
Angelina Uy et al. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data
IN2013MU03153A (en)
Yu et al. Named entity linking based on Wikipedia
Dixon et al. Morphodynamic model validation for tropical river junctions