WO2023201791A1 - Data entity recognition method and apparatus, computer device, and storage medium - Google Patents

Data entity recognition method and apparatus, computer device, and storage medium

Info

Publication number
WO2023201791A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
predicates
matching
data set
entity
Prior art date
Application number
PCT/CN2022/092575
Other languages
English (en)
Chinese (zh)
Inventor
樊文飞
陆平
朱筱可
Original Assignee
深圳计算科学研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-22
Filing date
2022-05-13
Publication date
2023-10-26
Application filed by 深圳计算科学研究院
Publication of WO2023201791A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Definitions

  • Entity recognition, also known as deduplication, entity resolution, or record linkage, refers to the process of identifying tuples in relations that refer to the same real-world entity. As an important method for improving data quality, entity recognition has received widespread attention from researchers. We classify the relevant technical background as follows.
  • A data set building unit, used to build data sets from the relational schema and attributes of the data
  • The hash function may correspond to different dimensions of the hypercube and thus to different locations (i.e., computing nodes). This creates redundant communication costs among computing nodes.
  • This embodiment uses the following data structure to solve the above problems.
  • The data entity identification device 400 further includes:
  • The matching calculation unit 404 includes:
  • The relationship representation unit is used to set the dependency relationship H on the id predicates and the ML predicates.
  • Each dependency is expressed as $l_1 \wedge l_2 \wedge \dots \wedge l_n \rightarrow l$, where $l$ and each $l_i$ ($i \in [1,n]$) is either an id predicate or an ML predicate;
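
The dependency relationship H on id predicates and ML predicates, with each dependency of the form l1 ∧ l2 ∧ ... ∧ ln → l, can be read as a set of implication rules. The following is a minimal sketch under that reading, not the data structure claimed in the patent: it computes which predicates become derivable once some are known to hold. The function name `closure`, the tuple encoding of a dependency, and the predicate labels in `H` are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A dependency l1 ∧ ... ∧ ln → l is encoded as (premises, conclusion).
# This encoding is an assumption made for the sketch.
Dependency = Tuple[FrozenSet[str], str]


def closure(known: Iterable[str], deps: Iterable[Dependency]) -> Set[str]:
    """Return every predicate derivable from `known` under `deps`."""
    derived = set(known)
    deps = list(deps)
    changed = True
    # Re-scan the dependencies until a full pass adds nothing new.
    while changed:
        changed = False
        for premises, conclusion in deps:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical dependency set H over id predicates and ML predicates.
H = [
    (frozenset({"ML(t.name, s.name)", "t.email = s.email"}), "t.id = s.id"),
    (frozenset({"t.id = s.id"}), "ML(t.addr, s.addr)"),
]

derived = closure({"ML(t.name, s.name)", "t.email = s.email"}, H)
```

Here the two known predicates trigger the first dependency, and the resulting id predicate in turn triggers the second one, which is why the loop runs until no further change occurs.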

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present invention are a data entity recognition method and apparatus, a computer device, and a storage medium. The method comprises: constructing a data set by means of the relational schema and attributes of the data; forming a conjunction of predicates in the data set, and establishing a matching rule according to the conjoined predicates and the relational schema of the data; on the basis of the matching rule, generating a query plan using MQO (multi-query optimization) technology; and performing matching calculation on an entity data set using the query plan. The present invention provides extended matching dependencies (MRLs) as a rule model for entity resolution, together with a parallel entity resolution algorithm (PER) suited to MRLs; using MRLs as matching rules yields high accuracy and interpretability. In addition, the HyperCube and MQO methods are combined to reduce communication and computation costs. The present invention further designs a special data structure for the matching algorithm, so as to speed up execution and reduce memory usage.
PCT/CN2022/092575 2022-04-22 2022-05-13 Data entity recognition method and apparatus, computer device, and storage medium WO2023201791A1 (fr)
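
The idea of a matching rule built as a conjunction of predicates can be illustrated with a small sketch. It is an assumption-laden illustration, not the patented method: it compares every pair of records by brute force and uses neither MQO nor HyperCube, and the ML predicate is replaced by a plain string-similarity ratio from the Python standard library. The helper names `equals`, `similar`, `make_rule`, and `match_pairs`, the 0.8 threshold, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record, Record], bool]


def equals(attr: str) -> Predicate:
    """Equality predicate t.attr = s.attr."""
    return lambda t, s: t[attr] == s[attr]


def similar(attr: str, threshold: float = 0.8) -> Predicate:
    """Stand-in for an ML predicate: a similarity ratio, not a trained model."""
    return lambda t, s: SequenceMatcher(None, t[attr], s[attr]).ratio() >= threshold


def make_rule(predicates: List[Predicate]) -> Predicate:
    """Conjunction of predicates: the rule fires only if all of them hold."""
    return lambda t, s: all(p(t, s) for p in predicates)


def match_pairs(records: List[Record], rule: Predicate) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose records satisfy the rule."""
    return {
        (i, j)
        for (i, t), (j, s) in combinations(enumerate(records), 2)
        if rule(t, s)
    }


# Hypothetical sample data set over the schema (name, email).
records = [
    {"name": "Jon Smith", "email": "js@example.com"},
    {"name": "John Smith", "email": "js@example.com"},
    {"name": "Ann Lee", "email": "ann@example.com"},
]

rule = make_rule([equals("email"), similar("name")])
matches = match_pairs(records, rule)
```

With this data, only the first two records share an email address and have similar names, so they form the single matched pair. The quadratic pairwise loop is the part that a query plan and data partitioning would be expected to replace at scale.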

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210430975.2 2022-04-22
CN202210430975.2A CN114780528B (zh) 2022-04-22 2022-04-22 Data entity recognition method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023201791A1 (fr) 2023-10-26

Family

ID=82431605

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092575 WO2023201791A1 (fr) 2022-04-22 2022-05-13 Data entity recognition method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114780528B (fr)
WO (1) WO2023201791A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377186A (zh) * 2012-04-26 2013-10-30 富士通株式会社 Web service integration apparatus, method, and device based on named entity recognition
US20140082014A1 (en) * 2012-03-02 2014-03-20 Logics Research Centre Sia Abstract, structured data store querying technology
US20140214868A1 (en) * 2013-01-25 2014-07-31 Wipro Limited Methods for identifying unique entities across data sources and devices thereof
CN109858018A (zh) * 2018-12-25 2019-06-07 中国科学院信息工程研究所 Entity recognition method and system for threat intelligence
CN112733541A (zh) * 2021-01-06 2021-04-30 重庆邮电大学 Named entity recognition method using BERT-BiGRU-IDCNN-CRF based on an attention mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200320153A1 (en) * 2019-04-02 2020-10-08 International Business Machines Corporation Method for accessing data records of a master data management system
CN110807325B (zh) * 2019-10-18 2023-05-26 腾讯科技(深圳)有限公司 Predicate recognition method, apparatus, and storage medium
CN113434693B (zh) * 2021-06-23 2023-02-21 重庆邮电大学工业互联网研究院 Data integration method based on a smart data platform

Also Published As

Publication number Publication date
CN114780528B (zh) 2024-07-09
CN114780528A (zh) 2022-07-22

Similar Documents

Publication Publication Date Title
Wylot et al. RDF data storage and query processing schemes: A survey
Özsu A survey of RDF data management systems
Kim et al. Taming subgraph isomorphism for RDF query processing
Tran et al. Query reverse engineering
Stocker et al. SPARQL basic graph pattern optimization using selectivity estimation
Umbrich et al. Comparing data summaries for processing live queries over linked data
Yin et al. Efficient classification across multiple database relations: A crossmine approach
EP2932412A2 (fr) Graph query processing using a plurality of engines
Wang et al. An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms.
Kargar et al. Meaningful keyword search in relational databases with large and complex schema
Crescenzi et al. Crowdsourcing for data management
CN115237937A (zh) Distributed collaborative query processing system based on the InterPlanetary File System
Cheng et al. Beyond pages: supporting efficient, scalable entity search with dual-inversion index
Tran et al. Evaluation of set-based queries with aggregation constraints
Wu et al. Discovering topical structures of databases
WO2023201791A1 (fr) Data entity recognition method and apparatus, computer device, and storage medium
Yao et al. Using user access patterns for semantic query caching
Li et al. Optimizing keyword search over federated RDF systems
Weiner et al. An integrative approach to query optimization in native XML database management systems
Zhang et al. On-the-fly constraint mapping across web query interfaces
Kläbe et al. PatchIndex: exploiting approximate constraints in distributed databases
Abdallah et al. Towards a gml-enabled knowledge graph platform
Troullinou et al. DIAERESIS: RDF data partitioning and query processing on SPARK
Li et al. Exploring personal corespace for dataspace management
Straccia et al. A System for Retrieving Top-k Candidates to Job Positions.

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22938026; Country of ref document: EP; Kind code of ref document: A1