IN2014MU00169A - Google Patents
Info
- Publication number
- IN2014MU00169A
- Authority
- IN
- India
- Prior art keywords
- documents
- entity
- merged
- document
- bucket
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/131—Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Abstract
The present subject matter relates to entity resolution and, in particular, to resolving entities from documents. The method comprises obtaining a plurality of documents from at least one data source. The plurality of documents is blocked into at least one bucket based on textual similarity and inter-document references among the plurality of documents. Further, within each bucket, a merged document for each entity may be created based on an iterative match-merge technique. The iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity. The merged documents may then be merged, based on a graph clustering technique, to generate a resolved entity-document for each entity.
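The three stages named in the abstract (blocking, iterative match-merge within each bucket, and graph clustering of the merged documents) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything below is an assumption made for illustration: the `Doc` record, the `make_doc` helper, one bucket per token as the blocking key, Jaccard similarity with a 0.5 threshold as the match rule, and connected components (via union-find) standing in for the graph clustering technique.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    ids: frozenset                  # source document ids folded into this record
    tokens: frozenset               # words of the text, reduced to a set
    refs: frozenset = frozenset()   # ids of documents this one references


def make_doc(doc_id, text, refs=()):
    """Hypothetical helper: build a single-source Doc from raw text."""
    return Doc(frozenset([doc_id]), frozenset(text.lower().split()), frozenset(refs))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_match(a, b, threshold):
    # Assumed match rule: textual similarity OR a reference between the records.
    return (jaccard(a.tokens, b.tokens) >= threshold
            or bool(a.refs & b.ids) or bool(b.refs & a.ids))


def merge(a, b):
    return Doc(a.ids | b.ids, a.tokens | b.tokens, a.refs | b.refs)


def block(docs):
    """Assumed blocking: one bucket per token; a document also joins the
    buckets of every document it references."""
    by_id = {next(iter(d.ids)): d for d in docs}
    buckets = defaultdict(list)
    for d in docs:
        keys = set(d.tokens)
        for ref in d.refs:
            if ref in by_id:
                keys |= by_id[ref].tokens
        for key in keys:
            buckets[key].append(d)
    return buckets


def match_merge(bucket, threshold=0.5):
    """Iteratively merge matching pairs until no pair in the bucket matches."""
    pending, resolved = list(bucket), []
    while pending:
        current = pending.pop()
        for i, other in enumerate(resolved):
            if is_match(current, other, threshold):
                del resolved[i]
                pending.append(merge(current, other))  # re-compare the merged record
                break
        else:
            resolved.append(current)
    return resolved


def resolve(docs, threshold=0.5):
    """Return sorted clusters of source ids, one cluster per resolved entity."""
    merged = [m for bucket in block(docs).values()
              for m in match_merge(bucket, threshold)]
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merged documents that share a source id fall into the same component.
    for m in merged:
        ids = sorted(m.ids)
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    clusters = defaultdict(set)
    for d in docs:
        doc_id = next(iter(d.ids))
        clusters[find(doc_id)].add(doc_id)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    sample = [
        make_doc("d1", "john smith boston"),
        make_doc("d2", "john smith boston ma"),
        make_doc("d3", "j smyth", refs=["d1"]),
        make_doc("d4", "alice jones"),
    ]
    print(resolve(sample))
```

In this sample, `d1` and `d2` match on text, `d3` is pulled in through its reference to `d1`, and `d4` remains its own entity, so `resolve` returns `[['d1', 'd2', 'd3'], ['d4']]`. Because a document can land in several buckets under this blocking scheme, the final union-find pass is what joins the per-bucket merged documents into one cluster per entity.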
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN169MU2014 IN2014MU00169A (en) | 2014-01-17 | 2014-01-17 | |
EP14186280.5A EP2897054A3 (en) | 2014-01-17 | 2014-09-24 | Entity resolution from documents |
AU2014253497A AU2014253497B2 (en) | 2014-01-17 | 2014-10-22 | Entity resolution from documents |
CA2868540A CA2868540C (en) | 2014-01-17 | 2014-10-24 | Entity resolution from documents |
MX2014013314A MX355195B (en) | 2014-01-17 | 2014-11-03 | Entity resolution from documents. |
US14/533,866 US10311093B2 (en) | 2014-01-17 | 2014-11-05 | Entity resolution from documents |
BR102014027639-4A BR102014027639B1 (en) | 2014-01-17 | 2014-11-05 | Method for resolving entities from a plurality of documents, and entity resolving system for resolving entities from a plurality of documents |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN169MU2014 IN2014MU00169A (en) | 2014-01-17 | 2014-01-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
IN2014MU00169A (en) | 2015-08-28 |
Family
ID=51625852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
IN169MU2014 IN2014MU00169A (en) | | 2014-01-17 | 2014-01-17 |
Country Status (7)
Country | Link |
---|---|
US (1) | US10311093B2 (en) |
EP (1) | EP2897054A3 (en) |
AU (1) | AU2014253497B2 (en) |
BR (1) | BR102014027639B1 (en) |
CA (1) | CA2868540C (en) |
IN (1) | IN2014MU00169A (en) |
MX (1) | MX355195B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165291B (en) * | 2018-06-29 | 2021-07-09 | 厦门快商通信息技术有限公司 | Text matching method and electronic equipment |
CN109635114A (en) * | 2018-12-17 | 2019-04-16 | 北京百度网讯科技有限公司 | Method and apparatus for handling information |
FR3104282B1 (en) * | 2019-12-05 | 2024-01-19 | Codexo | Saving documents in blocks |
US20210342541A1 (en) * | 2020-05-01 | 2021-11-04 | Salesforce.Com, Inc. | Stable identification of entity mentions |
CN111882165A (en) * | 2020-07-01 | 2020-11-03 | 国网河北省电力有限公司经济技术研究院 | Device and method for splitting comprehensive project cost analysis data |
WO2024036394A1 (en) * | 2022-08-18 | 2024-02-22 | 9197-1168 Québec Inc. | Systems and methods for identifying documents and references |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7213198B1 (en) * | 1999-08-12 | 2007-05-01 | Google Inc. | Link based clustering of hyperlinked documents |
JP2006505873A (en) * | 2002-11-06 | 2006-02-16 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Confidential data sharing and anonymous entity resolution |
US8683312B2 (en) * | 2005-06-16 | 2014-03-25 | Adobe Systems Incorporated | Inter-document links involving embedded documents |
US8423425B2 (en) * | 2007-11-14 | 2013-04-16 | Panjiva, Inc. | Evaluating public records of supply transactions for financial investment decisions |
US20090204590A1 (en) * | 2008-02-11 | 2009-08-13 | Queplix Corp. | System and method for an integrated enterprise search |
US8805861B2 (en) * | 2008-12-09 | 2014-08-12 | Google Inc. | Methods and systems to train models to extract and integrate information from data sources |
US20110119268A1 (en) * | 2009-11-13 | 2011-05-19 | Rajaram Shyam Sundar | Method and system for segmenting query urls |
WO2011109921A1 (en) * | 2010-03-12 | 2011-09-15 | Telefonaktiebolaget L M Ericsson (Publ) | System and method for matching entities and synonym group organizer used therein |
US9189473B2 (en) * | 2012-05-18 | 2015-11-17 | Xerox Corporation | System and method for resolving entity coreference |
US9442929B2 (en) * | 2013-02-12 | 2016-09-13 | Microsoft Technology Licensing, Llc | Determining documents that match a query |
US10140664B2 (en) * | 2013-03-14 | 2018-11-27 | Palantir Technologies Inc. | Resolving similar entities from a transaction database |
2014
- 2014-01-17 IN IN169MU2014 patent/IN2014MU00169A/en unknown
- 2014-09-24 EP EP14186280.5A patent/EP2897054A3/en not_active Ceased
- 2014-10-22 AU AU2014253497A patent/AU2014253497B2/en active Active
- 2014-10-24 CA CA2868540A patent/CA2868540C/en active Active
- 2014-11-03 MX MX2014013314A patent/MX355195B/en active IP Right Grant
- 2014-11-05 US US14/533,866 patent/US10311093B2/en active Active
- 2014-11-05 BR BR102014027639-4A patent/BR102014027639B1/en active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
MX355195B (en) | 2018-04-06 |
BR102014027639B1 (en) | 2022-05-03 |
BR102014027639A2 (en) | 2016-05-24 |
US10311093B2 (en) | 2019-06-04 |
EP2897054A2 (en) | 2015-07-22 |
CA2868540C (en) | 2020-09-22 |
EP2897054A3 (en) | 2015-09-16 |
US20150205803A1 (en) | 2015-07-23 |
AU2014253497B2 (en) | 2020-05-28 |
CA2868540A1 (en) | 2015-07-17 |
AU2014253497A1 (en) | 2015-08-06 |
MX2014013314A (en) | 2016-03-15 |
BR102014027639A8 (en) | 2021-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
MX355195B (en) | Entity resolution from documents. | |
WO2015191746A8 (en) | Systems and methods for a database of software artifacts | |
MX2014014048A (en) | Entity resolution from documents. | |
PH12015000372A1 (en) | Conversion of documents of different types to a uniform and an editable or a searchable format | |
EP2664997A3 (en) | System and method for resolving named entity coreference | |
PH12016000485A1 (en) | Document processing | |
GB2549875A (en) | Automated content classification/filtering | |
IN2014MU00919A (en) | ||
WO2015181639A3 (en) | Methods and computer-program products for organizing electronic documents | |
GB2550777A (en) | Classification and storage of documents | |
MY176481A (en) | Method and apparatus for classifying object based on social networking service, and storage medium | |
GB2513747A (en) | System and method for detecting malware in documents | |
GB2536826A (en) | Matching of an input document to documents in a document collection | |
AU2015364405A8 (en) | Methods for simultaneous source separation | |
IN2015CH01303A (en) | ||
CA2912019C (en) | Systems and methods for generating issue networks | |
GB2583636A8 (en) | Facilitation of domain and client-specific application program interface recommendations | |
GB2527230A (en) | Processing seismic attributes using mathematical morphology | |
GB2533243A (en) | Document-based search with facet information | |
WO2014012863A3 (en) | Method of automatically extracting features from a computer readable file | |
GB201300134D0 (en) | Method and apparautus for analyzing a document | |
Angelina Uy et al. | Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data | |
IN2013MU03153A (en) | ||
Yu et al. | Named entity linking based on Wikipedia | |
Dixon et al. | Morphodynamic model validation for tropical river junctions |