WO2023201791A1 - Data entity recognition method and apparatus, computer device and storage medium - Google Patents
- Publication number
- WO2023201791A1 (application PCT/CN2022/092575)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- predicates
- matching
- data set
- entity
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
  - G06F16/21—Design, administration or maintenance of databases
    - G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
  - G06F16/22—Indexing; Data structures therefor; Storage structures
    - G06F16/2228—Indexing structures
  - G06F16/24—Querying
    - G06F16/242—Query formulation
      - G06F16/2433—Query languages
    - G06F16/245—Query processing
      - G06F16/2453—Query optimisation
        - G06F16/24534—Query rewriting; Transformation
          - G06F16/24542—Plan optimisation
  - G06F16/28—Databases characterised by their database models, e.g. relational or object models
    - G06F16/284—Relational databases
Definitions
- Entity recognition, also known as deduplication, entity resolution, or record linkage, is the process of identifying tuples in relations that refer to the same real-world entity. As an important way to improve data quality, entity recognition has received widespread attention from researchers. We classify the relevant technical background as follows.
- A data set building unit, used to build a data set from the relational schema and attributes of the data.
- The hash function may correspond to different dimensions of the hypercube and thus to different locations (i.e., computing nodes), which creates redundant communication costs among the computing nodes.
- This embodiment uses the following data structure to solve the above problems.
- the data entity identification device 400 further includes:
- the matching calculation unit 404 includes:
- the relationship representation unit is used to set the dependency relationship H on the id predicate and the ML predicate.
- A dependency relationship is expressed as $l_1 \wedge l_2 \wedge \dots \wedge l_n \rightarrow l$, where $l$ and each $l_i$ ($i \in [1, n]$) is either an id predicate or an ML predicate;
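The dependency form in the last definition above (a conjunction of predicates implying one further predicate) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the patent's implementation: the names `Rule`, `same_attr`, `similar_name` and `apply_rules` are hypothetical, and the "ML predicate" is replaced by a simple token-overlap test rather than a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
# A predicate takes two records and reports whether they agree under that test.
Predicate = Callable[[Record, Record], bool]


@dataclass(frozen=True)
class Rule:
    """l_1 AND l_2 AND ... AND l_n -> consequence (hypothetical representation)."""
    premises: Tuple[str, ...]  # names of the predicates l_1 .. l_n
    consequence: str           # name of the implied predicate l


def same_attr(attr: str) -> Predicate:
    """Build an equality predicate on a single attribute."""
    return lambda a, b: a[attr] == b[attr]


def similar_name(a: Record, b: Record) -> bool:
    """Stand-in for an ML predicate: Jaccard overlap of name tokens (assumption)."""
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    if not (ta | tb):
        return False
    return len(ta & tb) / len(ta | tb) >= 0.5


def apply_rules(a: Record, b: Record,
                predicates: Dict[str, Predicate],
                rules: List[Rule]) -> Set[str]:
    """Return the names of all predicates that hold for (a, b).

    Directly testable predicates are evaluated first; rules are then applied
    repeatedly until no rule adds a new consequence (a fixpoint).
    """
    holds = {name for name, pred in predicates.items() if pred(a, b)}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.consequence not in holds and all(p in holds for p in rule.premises):
                holds.add(rule.consequence)
                changed = True
    return holds
```

With two records that share a `zip` value and have overlapping `name` tokens, a rule `Rule(("same_zip", "similar_name"), "same_id")` adds `same_id` to the result, and a second rule whose premise is `same_id` can then fire in the next pass. The fixpoint loop is what lets one derived match feed another rule.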
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed in the present invention are a data entity recognition method and apparatus, a computer device, and a storage medium. The method comprises: constructing a data set by means of the relational schema and attributes of the data; forming conjunctions of the predicates in the data set, and establishing a matching rule according to the conjunctive predicates and the relational schema of the data; generating a query plan from the matching rule using MQO technology; and performing a matching calculation on an entity data set using the query plan. The invention provides extended matching dependencies (MRLs) as a rule model for entity resolution, together with a parallel entity resolution (PER) algorithm suited to MRLs; that is, MRLs serve as the matching rules so as to achieve high accuracy and interpretability. In addition, the HyperCube and MQO methods are combined to reduce communication and computation costs. A dedicated data structure for the matching algorithm is also designed to speed up execution and reduce memory usage.
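The abstract names HyperCube as the means of limiting communication cost but does not spell out how tuples are routed. The sketch below shows the generic HyperCube idea only, under assumptions: workers form a grid with one dimension per join variable, a tuple is hashed on the variables it binds, and it is replicated along the dimensions it does not bind. The function names, the `shares` mapping, and the choice of SHA-256 are all hypothetical and not taken from the patent.

```python
import hashlib
from itertools import product
from typing import Dict, List, Tuple


def stable_hash(value: str, buckets: int) -> int:
    """Deterministic bucket for a value.

    Python's built-in hash() on strings is salted per process, so a digest
    is used to keep every worker's view of the mapping identical.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def hypercube_targets(bound: Dict[str, str],
                      shares: Dict[str, int]) -> List[Tuple[int, ...]]:
    """Grid coordinates of every worker that must receive a tuple.

    shares: join variable -> size of that variable's grid dimension.
    bound:  the variables this tuple carries, with their values.
    A bound variable pins its dimension to one bucket; an unbound variable
    forces replication across every bucket of its dimension.
    """
    axes = []
    for var, size in shares.items():
        if var in bound:
            axes.append([stable_hash(bound[var], size)])
        else:
            axes.append(list(range(size)))
    return list(product(*axes))
```

For example, with `shares = {"x": 2, "y": 3, "z": 2}`, a tuple binding `x` and `y` is sent to two workers (one per bucket of `z`), and a tuple binding `y` and `z` with the same `y` value is sent to two workers that overlap the first set in exactly one grid cell, which is where the two tuples can be joined. Using one hash per variable, rather than per relation, is what keeps the two tuples' `y` coordinates aligned.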
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210430975.2 | 2022-04-22 | ||
CN202210430975.2A CN114780528B (zh) | 2022-04-22 | 2022-04-22 | Data entity recognition method and apparatus, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023201791A1 (fr) | 2023-10-26 |
Family
ID=82431605
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/092575 WO2023201791A1 (fr) | 2022-04-22 | 2022-05-13 | Data entity recognition method and apparatus, computer device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114780528B (fr) |
WO (1) | WO2023201791A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103377186A (zh) * | 2012-04-26 | 2013-10-30 | Fujitsu Limited | Web service integration apparatus, method and device based on named entity recognition |
US20140082014A1 (en) * | 2012-03-02 | 2014-03-20 | Logics Research Centre Sia | Abstract, structured data store querying technology |
US20140214868A1 (en) * | 2013-01-25 | 2014-07-31 | Wipro Limited | Methods for identifying unique entities across data sources and devices thereof |
CN109858018A (zh) * | 2018-12-25 | 2019-06-07 | Institute of Information Engineering, Chinese Academy of Sciences | Entity recognition method and system for threat intelligence |
CN112733541A (zh) * | 2021-01-06 | 2021-04-30 | Chongqing University of Posts and Telecommunications | Named entity recognition method using attention-based BERT-BiGRU-IDCNN-CRF |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200320153A1 (en) * | 2019-04-02 | 2020-10-08 | International Business Machines Corporation | Method for accessing data records of a master data management system |
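CN110807325B (zh) * | 2019-10-18 | 2023-05-26 | Tencent Technology (Shenzhen) Co., Ltd. | Predicate recognition method, apparatus and storage medium |
CN113434693B (zh) * | 2021-06-23 | 2023-02-21 | Industrial Internet Research Institute of Chongqing University of Posts and Telecommunications | Data integration method based on an intelligent data platform |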
- 2022-04-22: CN application CN202210430975.2A, patent CN114780528B (zh), status: active
- 2022-05-13: WO application PCT/CN2022/092575, publication WO2023201791A1 (fr), status: unknown
Also Published As
Publication number | Publication date |
---|---|
CN114780528B (zh) | 2024-07-09 |
CN114780528A (zh) | 2022-07-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22938026; Country of ref document: EP; Kind code of ref document: A1 |