MX352046B - Document identity resolution. - Google Patents

Document identity resolution.

Info

Publication number
MX352046B
Authority
MX
Mexico
Prior art keywords
documents
bucket
entity
entity resolution
record
Prior art date
Application number
MX2014014048A
Other languages
English (en)
Other versions
MX2014014048A (es)
Inventor
Agarwal Puneet
Shroff Gautam
Malhotra Pankaj
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-03-06
Filing date
2014-11-19
Publication date
2017-11-07
Application filed by Tata Consultancy Services Ltd
Publication of MX2014014048A
Publication of MX352046B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present subject matter relates to entity resolution and, in particular, to providing entity resolution from documents. The method comprises obtaining, from at least one data source, a plurality of documents corresponding to a plurality of entities. Upon receipt, the plurality of documents is grouped (blocked) into at least one bucket based on textual similarity. Further, a graph is created that includes a plurality of record vertices and at least one bucket vertex. The plurality of record vertices and the at least one bucket vertex are indicative of the plurality of documents and the at least one bucket, respectively. Subsequently, a notification is provided to a user to select either a Bucket-Centric Parallelization (BCP) technique or a Record-Centric Parallelization (RCP) technique for resolving entities from the plurality of documents. Based on the selection, a resolved entity document is created for each entity.
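The pipeline in the abstract has four stages: blocking documents into buckets, building a graph of record and bucket vertices, running a bucket-centric or record-centric matching pass, and producing one resolved entity document per entity. The Python sketch below illustrates these stages. Every concrete choice in it is an assumption, not the patented method:

- Each distinct token is treated as a bucket (simple token blocking).
- Two documents match when the Jaccard similarity of their token sets reaches a hypothetical `threshold`.
- Matched records are merged with union-find.

The helper names (`build_graph`, `bcp_pairs`, `rcp_pairs`, `merge`, `resolve_entities`) are hypothetical. The loops run sequentially; comments mark the independent units of work that BCP and RCP would distribute.

```python
# Hypothetical sketch: token blocking, Jaccard matching and union-find merging
# are illustrative assumptions, not the method claimed in the patent.
from collections import defaultdict
from itertools import combinations


def tokens(doc):
    """Lower-cased word tokens drawn from every field value of a document (a dict)."""
    return {t for value in doc.values() for t in str(value).lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(docs):
    """Bipartite graph kept as two adjacency maps.

    Record vertices are document ids; bucket vertices are blocking keys.
    Assumption: each distinct token is one bucket.
    """
    record_to_buckets = {rid: tokens(doc) for rid, doc in docs.items()}
    bucket_to_records = defaultdict(set)
    for rid, keys in record_to_buckets.items():
        for key in keys:
            bucket_to_records[key].add(rid)
    return record_to_buckets, bucket_to_records


def bcp_pairs(docs, bucket_to_records, threshold):
    """Bucket-centric: each bucket is an independent unit of work."""
    def match_bucket(rids):
        return {(a, b) for a, b in combinations(sorted(rids), 2)
                if jaccard(tokens(docs[a]), tokens(docs[b])) >= threshold}
    # Each match_bucket call is independent and could run in parallel.
    return set().union(*map(match_bucket, bucket_to_records.values()))


def rcp_pairs(docs, record_to_buckets, bucket_to_records, threshold):
    """Record-centric: each record compares itself with records sharing a bucket."""
    def match_record(rid):
        neighbours = set().union(*(bucket_to_records[k] for k in record_to_buckets[rid]))
        # Keep only rid < other so each pair is reported once, as in bcp_pairs.
        return {(rid, other) for other in neighbours if other > rid
                and jaccard(tokens(docs[rid]), tokens(docs[other])) >= threshold}
    # Each match_record call is independent and could run in parallel.
    return set().union(*map(match_record, record_to_buckets))


def merge(docs, pairs):
    """Union-find over matched pairs; one resolved entity document per component."""
    parent = {rid: rid for rid in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for rid in docs:
        groups[find(rid)].append(rid)

    entities = []
    for members in groups.values():
        fields = defaultdict(set)
        for rid in members:
            for name, value in docs[rid].items():
                fields[name].add(value)
        entities.append({"records": sorted(members),
                         "fields": {n: sorted(v) for n, v in fields.items()}})
    return entities


def resolve_entities(docs, technique="BCP", threshold=0.5):
    """Dispatch on the user's choice of technique, "BCP" or "RCP"."""
    record_to_buckets, bucket_to_records = build_graph(docs)
    if technique == "BCP":
        pairs = bcp_pairs(docs, bucket_to_records, threshold)
    elif technique == "RCP":
        pairs = rcp_pairs(docs, record_to_buckets, bucket_to_records, threshold)
    else:
        raise ValueError(f"unknown technique: {technique!r}")
    return merge(docs, pairs)
```

Consider three sample records under ids `r1`, `r2` and `r3`:

- `r1`: `{"name": "Jane Doe", "city": "Pune"}`
- `r2`: `{"name": "J Doe", "city": "Pune"}`
- `r3`: `{"name": "John Roe", "city": "Mumbai"}`

With these inputs, `resolve_entities(docs, "BCP")` and `resolve_entities(docs, "RCP")` return the same two entities. Records `r1` and `r2` are merged because they share the `doe` and `pune` buckets and have a Jaccard similarity of 2/4 = 0.5. Record `r3` stays on its own. In this sketch the two techniques find the same matches and differ only in the unit of work they would distribute: a bucket for BCP, a record for RCP.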

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IN770MU2014 2014-03-06

Publications (2)

Publication Number Publication Date
MX2014014048A (es) 2016-05-04
MX352046B (es) 2017-11-07

Family

ID=54198670

Family Applications (1)

Application Number Title Priority Date Filing Date
MX2014014048A MX352046B (es) 2014-03-06 2014-11-19 Document identity resolution.

Country Status (6)

Country Link
US (1) US10346439B2 (es)
EP (1) EP2916242B1 (es)
AU (1) AU2014262240B2 (es)
BR (1) BR102014028893B1 (es)
CA (1) CA2871036C (es)
MX (1) MX352046B (es)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8285656B1 (en) 2007-03-30 2012-10-09 Consumerinfo.Com, Inc. Systems and methods for data verification
US8312033B1 (en) 2008-06-26 2012-11-13 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US9256904B1 (en) 2008-08-14 2016-02-09 Experian Information Solutions, Inc. Multi-bureau credit file freeze and unfreeze
US10262362B1 (en) 2014-02-14 2019-04-16 Experian Information Solutions, Inc. Automatic generation of code for attributes
US9946808B2 (en) 2014-07-09 2018-04-17 International Business Machines Corporation Using vertex self-information scores for vertices in an entity graph to determine whether to perform entity resolution on the vertices in the entity graph
US11132343B1 (en) * 2015-03-18 2021-09-28 Groupon, Inc. Automatic entity resolution data cleaning
US9348880B1 (en) * 2015-04-01 2016-05-24 Palantir Technologies, Inc. Federated search of multiple sources with conflict resolution
US20170083820A1 (en) * 2015-09-21 2017-03-23 International Business Machines Corporation Posterior probabilistic model for bucketing records
US10757154B1 (en) 2015-11-24 2020-08-25 Experian Information Solutions, Inc. Real-time event-based notification system
CN110383319B (zh) * 2017-01-31 2023-05-26 Experian Information Solutions, Inc. (益百利信息解决方案公司) Large-scale heterogeneous data ingestion and user resolution
US10735183B1 (en) 2017-06-30 2020-08-04 Experian Information Solutions, Inc. Symmetric encryption for private smart contracts among multiple parties in a private peer-to-peer network
US11574287B2 (en) 2017-10-10 2023-02-07 Text IQ, Inc. Automatic document classification
US10783162B1 (en) 2017-12-07 2020-09-22 Palantir Technologies Inc. Workflow assistant
US10963434B1 (en) 2018-09-07 2021-03-30 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
WO2020146667A1 (en) 2019-01-11 2020-07-16 Experian Information Solutions, Inc. Systems and methods for secure data aggregation and computation
US11520764B2 (en) * 2019-06-27 2022-12-06 International Business Machines Corporation Multicriteria record linkage with surrogate blocking keys
US11941065B1 (en) 2019-09-13 2024-03-26 Experian Information Solutions, Inc. Single identifier platform for storing entity data
US11880377B1 (en) 2021-03-26 2024-01-23 Experian Information Solutions, Inc. Systems and methods for entity resolution
US20230162518A1 (en) * 2021-11-24 2023-05-25 Adobe Inc. Systems for Generating Indications of Relationships between Electronic Documents
US20230214360A1 (en) * 2022-01-05 2023-07-06 Jpmorgan Chase Bank, N.A. Method and system for facilitating distributed entity resolution
US11995046B2 (en) * 2022-04-18 2024-05-28 Salesforce, Inc. Record management for database systems using fuzzy field matching

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US7200606B2 (en) * 2000-11-07 2007-04-03 The Regents Of The University Of California Method and system for selecting documents by measuring document quality
US7900052B2 (en) * 2002-11-06 2011-03-01 International Business Machines Corporation Confidential data sharing and anonymous entity resolution
US20040107205A1 (en) * 2002-12-03 2004-06-03 Lockheed Martin Corporation Boolean rule-based system for clustering similar records
US7596603B2 (en) * 2004-06-30 2009-09-29 International Business Machines Corporation Automatic email consolidation for multiple participants
US20110119268A1 (en) * 2009-11-13 2011-05-19 Rajaram Shyam Sundar Method and system for segmenting query urls
US9189473B2 (en) * 2012-05-18 2015-11-17 Xerox Corporation System and method for resolving entity coreference
US8533182B1 (en) * 2012-05-31 2013-09-10 David P. Charboneau Apparatuses, systems, and methods for efficient graph pattern matching and querying
US9594831B2 (en) * 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
US8819078B2 (en) * 2012-07-13 2014-08-26 Hewlett-Packard Development Company, L. P. Event processing for graph-structured data
US9483565B2 (en) * 2013-06-27 2016-11-01 Google Inc. Associating a task with a user based on user selection of a query suggestion

Also Published As

Publication number Publication date
AU2014262240A1 (en) 2015-09-24
EP2916242A1 (en) 2015-09-09
US10346439B2 (en) 2019-07-09
EP2916242B1 (en) 2019-06-05
MX2014014048A (es) 2016-05-04
BR102014028893A2 (pt) 2016-01-26
US20150254329A1 (en) 2015-09-10
CA2871036A1 (en) 2015-09-06
CA2871036C (en) 2020-03-10
AU2014262240B2 (en) 2020-08-13
BR102014028893B1 (pt) 2022-02-01

Similar Documents

Publication Publication Date Title
MX352046B (es) Document identity resolution.
EP3411864A4 (en) ENERGY ARMS, HIGH-RESOLUTION AUTOMATIC COUNTER READING, CENTRALIZED DATA COLLECTION AND ANALYSIS
WO2015191746A8 (en) Systems and methods for a database of software artifacts
GB2569920A (en) System, method and computer program for fault detection and location in power grid
PH12016000485A1 (en) Document processing
PH12015000372A1 (en) Conversion of documents of different types to a uniform and an editable or a searchable format
MX2016001025A (es) Method for the designation of seismic data acquired using a moving source.
WO2015195676A3 (en) Computer-implemented tools and methods for extracting information about the structure of a large computer software system, exploring its structure, discovering problems in its design, and enabling refactoring
WO2013019869A3 (en) Data fingerprinting for copy accuracy assurance
GB2502715A (en) Malware Detection
MX2015009172A (es) Systems and methods for identifying and reporting application and file vulnerabilities.
SA519410568B1 (ar) Incorporating contextual information into a workflow for wellbore operations
IN2014CH02163A (es)
GB2533875A (en) In-situ wellbore, core and cuttings information system
PH12016000106B1 (en) Ticket solver system
GB2550777A (en) Classification and storage of documents
MX2014013314A (es) Entity resolution from documents.
GB2531585A8 (en) Methods and systems for generating a three dimensional model of a subject
GB2540700A (en) Merging multiple point-in-time copies into a merged point-in-time copy
MX338265B (es) Computing methods and systems for processing data.
GB2539815A (en) Multi-Z Polyline to single-Z horizons conversion
GB2527230A (en) Processing seismic attributes using mathematical morphology
WO2014036306A3 (en) System and method for determining a value of information metric from a posterior distribution generated through stochastic inversion
IN2013CH05422A (es)
GB2548734A (en) A system and method for generating electronic inducements respective of location

Legal Events

Date Code Title Description
FG Grant or registration