WO2020135048A1 - Data fusion method and apparatus for knowledge graph - Google Patents


Info

Publication number
WO2020135048A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
entity
module
matching
attributes
Application number
PCT/CN2019/124552
Other languages
English (en)
Chinese (zh)
Inventor
刘涛
朱宏明
顾江
姜逸之
王晓文
周游
Original Assignee
颖投信息科技(上海)有限公司
Application filed by 颖投信息科技(上海)有限公司
Publication of WO2020135048A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Definitions

  • This application relates to the technical field of knowledge graphs, and in particular, to a data fusion method and device for knowledge graphs.
  • A knowledge graph is a large semantic network that describes the entities or concepts of the real world and the relationships between them. Its nodes represent entities or concepts, and its edges represent attributes or relationships.
  • An entity is something distinguishable and independent, such as a country, a company, or a person. Attributes are the inherent characteristics of an entity: for example, a country has attributes such as "population" and "area" (as shown in Figure 4), and a company has attributes such as "name" and "legal representative".
  • A relationship characterizes an association between one entity and another. For example, a company is registered in a country, and a person is employed by a company.
  • The nodes and edges of a knowledge graph are generally defined as triples (SPO, Subject-Property-Object) of two kinds: (entity 1-relation-entity 2) and (entity-attribute-attribute value). A knowledge graph can therefore be represented as a collection of triples, its data model can be represented as a graph (as shown in Figure 4), and a graph database is used to store and manage the data.
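The triple model described above can be illustrated with a minimal Python sketch that keeps a knowledge graph as a set of (subject, property, object) tuples and splits it into node attributes and edges. The entity IDs, property names, and the rule "an object that is a known entity ID makes the triple an edge" are illustrative assumptions, not details taken from the application.

```python
from collections import defaultdict

# A triple is (subject, property, object).
Triple = tuple[str, str, str]


def split_graph(triples: set[Triple], entity_ids: set[str]):
    """Split a set of SPO triples into node attributes and edges.

    Assumption: a triple whose object is a known entity ID is an
    (entity 1-relation-entity 2) edge; any other triple is an
    (entity-attribute-attribute value) pair attached to its subject node.
    """
    nodes: dict[str, dict[str, str]] = defaultdict(dict)
    edges: list[Triple] = []
    for s, p, o in sorted(triples):
        if o in entity_ids:
            edges.append((s, p, o))
        else:
            nodes[s][p] = o
    return dict(nodes), edges


# Hypothetical sample data mirroring the country/company example above.
triples = {
    ("country:1", "name", "Country A"),
    ("country:1", "area", "1000"),
    ("company:7", "name", "Company B"),
    ("company:7", "registered_in", "country:1"),
}
nodes, edges = split_graph(triples, {"country:1", "company:7"})
print(edges)  # [('company:7', 'registered_in', 'country:1')]
```

A graph database would store the same information natively; the set-of-tuples form is only the simplest representation that matches the "collection of triples" view.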
  • Existing data fusion solutions generally comprise three main steps: partition indexing, similarity calculation, and entity fusion. In a concrete implementation, a partitioning algorithm, a similarity matching algorithm, and an entity alignment algorithm are selected according to the characteristics of the data sources and the knowledge base, and then integrated into a complete system. When the scope of the data sources or the knowledge base changes, the data fusion system has to be rebuilt to meet the new requirements.
  • The present application provides a data fusion method and device for knowledge graphs to solve the problem that existing data fusion techniques cannot flexibly adapt to data fusion across different knowledge bases.
  • In the data fusion method for knowledge graphs disclosed in the present application, the system implementing the method includes a data platform configured with a unified access interface.
  • The method includes: processing data from different data sources, converting it into triple format, storing it on the data platform through the unified access interface, and receiving the graph data index information returned by the data platform; according to the graph data index information, dividing the entities stored on the data platform by attribute into one or more sub-partitions; performing similarity calculation on candidate entity pairs assigned to the same sub-partition and selecting the matching entity pairs that satisfy a preset similarity condition; and supplementing and/or replacing the attribute values of the matching entity pairs to generate a unified entity representation.
  • The method further includes: after data from multiple data sources has been converted into triple format and stored on the data platform, aligning the stored entities according to the actual meaning of their attributes.
  • Sub-partitions are formed either by equivalent partitioning on a globally unique partition key generated from entity attributes, or by a preset clustering model.
  • Performing similarity calculation on candidate entity pairs in the same sub-partition and selecting the matching entity pairs that satisfy the preset similarity condition specifically comprises: assigning different weights to the entity's own attributes and to the attributes of other entities related to it, and computing the overall similarity of each candidate entity pair as the weighted sum; if the overall similarity of a candidate entity pair in the same sub-partition exceeds a preset similarity threshold, that pair is taken as a matching entity pair.
  • Missing entity attribute values are supplemented either by retrieving them from the web with a crawler or by filling them in manually.
  • The graph data index information is the storage address of the triple-format graph data on the data platform, together with its metadata.
  • The data fusion device for knowledge graphs disclosed in this application includes a data platform, a data preprocessing module, an entity partitioning module, an entity matching module, and an entity fusion module, wherein: the data platform is configured with a unified access interface; the data preprocessing module is configured to process data from different data sources, convert it into triple format, store it on the data platform through the unified access interface, and receive the graph data index information returned by the data platform; the entity partitioning module is configured to divide the entities stored on the data platform by attribute into one or more sub-partitions according to the graph data index information output by the data preprocessing module; the entity matching module is configured to perform similarity calculation on candidate entity pairs that the entity partitioning module assigned to the same sub-partition and to select the matching entity pairs that satisfy a preset similarity condition; and the entity fusion module is configured to supplement and/or replace the attribute values of the matching entity pairs selected by the entity matching module to generate a unified entity representation.
  • The entity partitioning module includes an equivalent partitioning submodule and/or a clustering partitioning submodule: the equivalent partitioning submodule is configured to partition the entities stored on the data platform by a globally unique partition key generated from entity attributes, and the clustering partitioning submodule is configured to partition those entities based on a preset clustering model.
  • The entity matching module specifically includes a similarity calculation submodule and a comparison submodule: the similarity calculation submodule is configured to assign different weights to the entity's own attributes and to the attributes of other entities related to it, and to compute the overall similarity of each candidate entity pair as the weighted sum; the comparison submodule is configured to determine whether the overall similarity of a candidate entity pair in the same sub-partition exceeds a preset similarity threshold and, if so, to treat the pair as a matching entity pair.
  • The device further includes a data processing module and/or an attribute alignment module.
  • The data processing module is configured to process node entity data and edge entity data on the data platform through the unified access interface and to pass the processing result to the next module.
  • The attribute alignment module is configured to align, according to the actual meaning of their attributes, the entities stored on the data platform after data from multiple data sources has been processed by the data preprocessing module.
  • The present application also discloses a storage medium on which a program configured to execute the above method is recorded.
  • In the preferred embodiment of the present application, the stages have upstream and downstream dependencies in the pipeline, but they are constrained only by data format and are decoupled from one another through the unified interface provided by the data platform, so each stage can be developed independently.
  • The algorithm of each stage can be replaced flexibly.
  • New stages can be inserted between existing stages to accommodate custom requirements.
  • This application places no restrictions on the architecture of the data platform. For example, a Hadoop distributed file system or a cloud computing architecture may be used so that computing and storage resources can be expanded easily as the amount of data grows.
  • FIG. 1 is a schematic flowchart of a first embodiment of the data fusion method for knowledge graphs of this application.
  • FIG. 2 is a schematic flowchart of a second embodiment of the data fusion method for knowledge graphs of this application.
  • FIG. 3 is a schematic structural diagram of an embodiment of the data fusion device for knowledge graphs of this application.
  • Figure 4 is a schematic diagram of the graph data model of a knowledge graph.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features concerned.
  • A feature qualified by "first" or "second" may explicitly or implicitly include one or more such features.
  • "Plurality" means two or more, unless specifically defined otherwise.
  • The terms "including", "comprising" and similar terms should be understood as open-ended, i.e. "including/comprising but not limited to".
  • The term "based on" means "based at least in part on".
  • The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment".
  • Related definitions of other terms will be given in the description below.
  • The system implementing this method embodiment is provided with a data platform that supplies the operating environment and computing resources for each stage, and the stages interact with one another through the platform's unified access interface.
  • The data platform can be built on the Hadoop distributed file system, a cloud computing architecture (such as Amazon AWS EMR), or another architecture; this application is not limited in this respect.
  • The method embodiment includes the following stages:
  • Data preprocessing stage: data from multiple homogeneous or heterogeneous data sources (such as structured data A and unstructured data B) is processed into the same entity and attribute format (SPO format), which serves as the input to subsequent stages.
  • Data is extracted, cleaned, and transformed from each data source and stored on the data platform in a unified data format.
  • For a relational database data source, the required SPO data can be extracted by configuring the connection information, the entity types and entity tables, and the relationship types and relationship tables:
  • nodes: entity-attribute-attribute value
  • edges: entity-relationship-entity
  • For remote data, the data can be read and then parsed and stored.
  • For unstructured data, machine learning interfaces, web interfaces, and the like can be called to perform knowledge extraction; the result is saved as triples, and the address and metadata of the saved data are returned.
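The relational-source configuration described above might look roughly like the following sketch, which turns table rows into node triples and edge triples and returns graph data index information (an address plus metadata). The `SourceConfig` fields, the dict standing in for the data platform, and the `mem://` address scheme are all hypothetical; a real deployment would write to HDFS, cloud storage, or a graph database behind the unified access interface.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """Hypothetical mapping of one relational table to SPO triples."""
    entity_type: str                  # e.g. "Company"
    id_column: str                    # primary-key column of the entity table
    attribute_columns: list[str]      # columns emitted as entity-attribute-value
    # relation name -> (type of the related entity, foreign-key column)
    relation_columns: dict[str, tuple[str, str]] = field(default_factory=dict)


def rows_to_triples(cfg: SourceConfig, rows: list[dict]) -> list[tuple]:
    """Convert table rows into node triples and edge triples."""
    triples = []
    for row in rows:
        subject = f"{cfg.entity_type}:{row[cfg.id_column]}"
        for col in cfg.attribute_columns:  # nodes: entity-attribute-attribute value
            if row.get(col) is not None:
                triples.append((subject, col, row[col]))
        for rel, (target_type, col) in cfg.relation_columns.items():  # edges
            if row.get(col) is not None:
                triples.append((subject, rel, f"{target_type}:{row[col]}"))
    return triples


def store(platform: dict, name: str, triples: list[tuple]) -> dict:
    """Save triples on an in-memory stand-in for the data platform and
    return the graph data index information (address plus metadata)."""
    address = f"mem://{name}"
    platform[address] = triples
    return {"address": address, "metadata": {"triple_count": len(triples)}}
```

Returning only an address and metadata, rather than the triples themselves, is what lets later stages locate the data through the platform without depending on how the preprocessing stage produced it.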
  • Entity partitioning stage: entities from multiple data sources are divided by attribute into different sub-partitions (blocks) to reduce the number of candidate matching pairs.
  • Suppose data source S contains m entities and data source T contains n entities; then m × n entity pairs would have to be checked for a match.
  • For large data sources this scale is practically infeasible, so the number of pairs to be matched must be reduced.
  • Entity pairs from the two data sources that are unlikely to match can be placed in different partitions in advance. This greatly reduces the amount of data in each partition, and multiple partitions can be processed in parallel.
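The effect of partitioning on the number of comparisons can be written out as follows, assuming each entity falls into exactly one of K blocks and block k holds m_k entities from S and n_k from T. The sizes in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
\begin{aligned}
\text{pairs without blocking} &= m \cdot n,\\
\text{pairs with blocking} &= \sum_{k=1}^{K} m_k\, n_k, \qquad \sum_{k=1}^{K} m_k = m,\quad \sum_{k=1}^{K} n_k = n,\\
m = n = 10^6 &\;\Rightarrow\; m \cdot n = 10^{12},\\
K = 10^3,\ m_k = n_k = 10^3 &\;\Rightarrow\; \sum_{k=1}^{K} m_k\, n_k = 10^3 \cdot 10^6 = 10^9 .
\end{aligned}
```

With evenly sized blocks the work drops by a factor of K, and each block can be handled by a separate worker; skewed blocks reduce the gain, which is why the stopping criteria on partition size matter.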
  • The partitioning method of the entity partitioning stage (BlockingStage) can be extended with a custom partitioning algorithm, for example through the following interface form:
  • A globally unique block key can be generated from the current entity's partition and the attributes used for the next-level partition, splitting the data into next-level partitions. A partition is no longer divided once its number of possible matching entity pairs falls to a minimum or the total number of partitions reaches a maximum.
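The block-key scheme above can be sketched as level-by-level partitioning: each level appends one attribute-derived component to the parent's key, so keys stay globally unique, and a block stops splitting when its cross-source pair count is at or below a minimum or when the next level would exceed the partition budget. The key functions (`key_fns`), the thresholds `min_pairs` and `max_partitions`, and the record layout with a `"source"` field are assumptions for illustration; the application does not specify them.

```python
from collections import defaultdict
from typing import Callable

Entity = dict  # hypothetical record: {"source": "S" or "T", attribute: value, ...}


def candidate_pairs(block: list[Entity]) -> int:
    """Number of cross-source pairs (one entity from S, one from T) in a block."""
    s = sum(1 for e in block if e["source"] == "S")
    return s * (len(block) - s)


def partition(entities: list[Entity],
              key_fns: list[Callable[[Entity], str]],
              min_pairs: int = 1,
              max_partitions: int = 1000) -> dict[str, list[Entity]]:
    """Split blocks level by level using globally unique block keys."""
    blocks: dict[str, list[Entity]] = {"": entities}
    for key_fn in key_fns:
        next_blocks: dict[str, list[Entity]] = {}
        for parent_key, block in blocks.items():
            # Small enough already: keep the block as it is.
            if candidate_pairs(block) <= min_pairs:
                next_blocks[parent_key] = block
                continue
            children = defaultdict(list)
            for e in block:
                # Child key = parent key + attribute-derived component.
                children[f"{parent_key}/{key_fn(e)}"].append(e)
            next_blocks.update(children)
        if len(next_blocks) > max_partitions:
            break  # partition budget exceeded: keep the previous level
        blocks = next_blocks
    return blocks
```

For company records, a first key function might return a country code and a second the first letter of a normalized name; only entities sharing both components would then be compared.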
  • A clustering-based partitioning algorithm can be implemented with an already trained clustering model, using an interface of the following form:
  • The clustering model directly predicts the class to which the current entity belongs, so the number of partitions equals the number of classes in the clustering model. The partitions can, of course, be divided further on top of the clustering.
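A clustering-based partitioner needs only a trained model that maps an entity to a cluster label. In the sketch below, a nearest-centroid model with hand-set centroids and a caller-supplied `embed` function stand in for whatever trained model (for example, fitted k-means) and feature extraction are actually used; both are hypothetical. As the text notes, the number of partitions equals the number of clusters.

```python
import math


class NearestCentroidModel:
    """Stand-in for a pre-trained clustering model (e.g. fitted k-means)."""

    def __init__(self, centroids: list[list[float]]):
        self.centroids = centroids

    def predict(self, vector: list[float]) -> int:
        # Index of the closest centroid under Euclidean distance.
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(vector, self.centroids[i]))


def cluster_partition(entities: list[dict], embed, model) -> dict[int, list[dict]]:
    """Assign each entity to the partition of its predicted cluster."""
    blocks: dict[int, list[dict]] = {i: [] for i in range(len(model.centroids))}
    for e in entities:
        blocks[model.predict(embed(e))].append(e)
    return blocks
```

Any model exposing a `predict`-style call can be swapped in without changing `cluster_partition`, which matches the idea that the partitioning algorithm is replaceable.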
  • Entity matching stage: for candidate entity pairs in the same partition, different weights can be assigned to the entity's own attributes and to the attributes of its associated entities, and the overall similarity of each candidate pair is computed as the weighted sum; candidate pairs whose similarity exceeds a given threshold are selected as matching entity pairs.
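The weighted-sum matching described above can be sketched as follows. The per-attribute similarity uses Python's `difflib.SequenceMatcher.ratio()` as a convenient stand-in for whatever string measure is actually chosen; the weights, the 0.8 threshold, the normalization by total weight, and the convention of flattening related-entity attributes into keys such as `"registered_in.name"` are all assumptions.

```python
from difflib import SequenceMatcher


def attr_similarity(a, b) -> float:
    """Similarity in [0, 1] between two attribute values; missing values give 0."""
    if a is None or b is None:
        return 0.0
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def overall_similarity(e1: dict, e2: dict, weights: dict[str, float]) -> float:
    """Weighted sum of per-attribute similarities, normalized by the total weight.

    Keys can name the entity's own attributes ("name") or attributes of a
    related entity flattened into the record ("registered_in.name").
    """
    total = sum(weights.values())
    score = sum(w * attr_similarity(e1.get(k), e2.get(k)) for k, w in weights.items())
    return score / total if total else 0.0


def match_block(block_s: list[dict], block_t: list[dict],
                weights: dict[str, float], threshold: float = 0.8):
    """Candidate pairs within one block whose overall similarity exceeds the threshold."""
    return [(a, b) for a in block_s for b in block_t
            if overall_similarity(a, b, weights) > threshold]
```

Because `match_block` only ever sees one block, the quadratic loop runs over the small per-block sizes rather than the full m × n.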
  • This process also allows rules based on strong associations to be inserted that complete a match directly. For example, if two company records from different data sources are both listed companies with identical stock codes, they are matched immediately and the similarity calculation is skipped, reducing the computational cost of the matching stage.
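A rule-first matcher along these lines might look like the sketch below: each rule either decides the pair or abstains, and the similarity function is called only when no rule fires. The field names `listed` and `stock_code` and the three-valued rule convention are hypothetical.

```python
from typing import Callable, Optional

# A rule returns True (definite match), False (definite non-match) or None (no decision).
Rule = Callable[[dict, dict], Optional[bool]]


def same_listed_stock_code(a: dict, b: dict) -> Optional[bool]:
    """Strong-association rule: two listed companies with the same stock code match."""
    code = a.get("stock_code")
    if a.get("listed") and b.get("listed") and code and code == b.get("stock_code"):
        return True
    return None


def is_match(a: dict, b: dict, rules: list[Rule],
             similarity: Callable[[dict, dict], float], threshold: float) -> bool:
    """Apply rules first; fall back to the similarity threshold only if no rule decides."""
    for rule in rules:
        decision = rule(a, b)
        if decision is not None:
            return decision  # rule fired, so the similarity calculation is skipped
    return similarity(a, b) > threshold
```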
  • The results produced by the matching algorithm can be compared against a verification data set to check the algorithm's accuracy.
  • Repeating this comparison over several rounds improves accuracy step by step. For example, suppose two company entities are compared by the weighted sum of their name similarity and stock-symbol similarity. If company names are written in different languages in different data sources, name similarity will be low even for true matches, so the name weight should be lowered and the stock-symbol weight set relatively higher.
  • The entity matching algorithm of this application can thus be tuned over multiple iterations of parameter adjustment to improve the accuracy of the matching results.
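The iterative tuning described above can be sketched as a grid search that scores each weight and threshold combination against a verification set. Precision, recall, and F1 are used here as one reasonable accuracy measure; the `run_matching` callback, the grid values, and the choice of F1 as the objective are assumptions, not details from the application.

```python
from itertools import product


def evaluate(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> dict[str, float]:
    """Precision, recall and F1 of predicted match pairs against a verification set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def tune(run_matching, gold, weight_grid: dict[str, list[float]], thresholds: list[float]):
    """Try every weight/threshold combination and keep the one with the best F1.

    `run_matching(weights, threshold)` is a hypothetical callback that runs the
    matching stage and returns a set of (id_in_S, id_in_T) pairs.
    """
    best = None
    names = list(weight_grid)
    for values in product(*(weight_grid[n] for n in names)):
        weights = dict(zip(names, values))
        for t in thresholds:
            score = evaluate(run_matching(weights, t), gold)["f1"]
            if best is None or score > best[0]:
                best = (score, weights, t)
    return best
```

In the company example, such a search would be expected to favour a low name weight and a high stock-symbol weight whenever names are recorded in different languages.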
  • Alternatively, a pre-trained machine learning binary classification model can be used: the attribute similarity vector of the two entities is taken as input, and the model infers the probability that they are the same entity (label 1).
  • The final matched entity pairs are written to the result set.
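The classifier variant can be sketched with logistic regression as one possible binary classifier: a similarity vector is computed for the pair, and trained coefficients turn it into the probability of label 1. The exact-match features, the coefficients, the bias, and the 0.5 cutoff are placeholders; in practice they would come from the pre-trained model.

```python
import math


def similarity_vector(a: dict, b: dict, attributes: list[str]) -> list[float]:
    """Per-attribute features; here simply 1.0 for an exact match, else 0.0."""
    return [1.0 if a.get(k) is not None and a.get(k) == b.get(k) else 0.0
            for k in attributes]


def same_entity_probability(x: list[float], coef: list[float], bias: float) -> float:
    """Logistic-regression score P(label = 1 | x) with already-trained parameters."""
    z = bias + sum(c * v for c, v in zip(coef, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(a: dict, b: dict, attributes: list[str], coef: list[float],
             bias: float, cutoff: float = 0.5) -> bool:
    """Treat the pair as the same entity when the probability reaches the cutoff."""
    return same_entity_probability(similarity_vector(a, b, attributes), coef, bias) >= cutoff
```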
  • Entity fusion stage (MergeStage): records in different data sources that actually refer to the same entity are supplemented, replaced, and normalized according to the fusion algorithm, finally producing a unified entity representation.
  • Data fusion can be implemented by combining different business rules. For example, multiple aliases can be kept, and standardized formats can be applied to email addresses and postal addresses.
  • Missing attribute data can be filled in by crawlers or manually to build high-quality data that supports search and analysis over the knowledge graph.
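One possible fusion rule set for a matched pair is sketched below: values missing from the preferred record are supplemented from the other, the preferred record wins when both have a value, email and address values are normalized, and a differing name is kept as an alias. Which record is preferred, the normalization rules, and the `aliases` field are assumptions made for illustration.

```python
import re


def normalize(attr: str, value):
    """Hypothetical normalization rules for a few attribute types."""
    if attr == "email":
        return value.strip().lower()
    if attr == "address":
        return re.sub(r"\s+", " ", value).strip()
    return value


def fuse(primary: dict, secondary: dict) -> dict:
    """Merge a matched entity pair into one unified entity representation.

    - supplement: attributes missing in `primary` are taken from `secondary`
    - replace: when both have a value, `primary` (assumed more trusted) wins
    - aliases: a differing name is kept as an alias instead of being discarded
    """
    merged: dict = {}
    for attr in sorted(primary.keys() | secondary.keys()):
        value = primary.get(attr)
        if value in (None, ""):
            value = secondary.get(attr)  # supplement a missing value
        if value not in (None, ""):
            merged[attr] = normalize(attr, value)
    names = {primary.get("name"), secondary.get("name")} - {None, merged.get("name")}
    merged["aliases"] = sorted(names)
    return merged
```

Values still missing after fusion would be the ones left for a crawler or manual filling.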
  • Stages with other functions can also be arranged in the pipeline.
  • An interface of the following form can be used:
  • The data to be processed is passed in through the input configuration parameters; after processing, the result is written to the output and passed to the next stage, which extends the system's functions.
  • Through the above means, this application implements a general pipeline (Pipeline) for entity fusion in big data scenarios.
  • The pipeline is composed of multiple stages (Stage); each stage can be flexibly extended through configuration, and custom stages (CustomStage) can be added to the pipeline to adapt to different application scenarios.
  • The input configuration can specify the entity list, the relationship list, the data addresses, and the related metadata (a schema including table names, column names, etc.) from the different data sources that the stage needs.
  • The stage reads the input data, runs its algorithm, writes the results to the data platform, and emits all data addresses and metadata through its output. Each stage can therefore be run in series with others via its input and output, or run on its own by specifying input parameters.
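The stage and pipeline contract described above can be sketched as an abstract `Stage` whose `run` method takes an input configuration (data addresses plus metadata), writes its results to the platform, and returns an output configuration of the same shape, so stages chain in series or run alone. The class names, the dict-based platform, and the toy `MapStage` are hypothetical, not the application's actual interface.

```python
from abc import ABC, abstractmethod


class Stage(ABC):
    """A pipeline stage: read inputs from the platform, process, write outputs."""

    @abstractmethod
    def run(self, platform: dict, inputs: dict) -> dict:
        """`inputs` and the return value look like {"addresses": [...], "metadata": {...}}."""


class MapStage(Stage):
    """Toy stage applying `fn` to every record stored at each input address."""

    def __init__(self, name: str, fn):
        self.name, self.fn = name, fn

    def run(self, platform: dict, inputs: dict) -> dict:
        out = []
        for addr in inputs["addresses"]:
            new_addr = f"{addr}>{self.name}"
            platform[new_addr] = [self.fn(r) for r in platform[addr]]
            out.append(new_addr)
        return {"addresses": out,
                "metadata": {**inputs.get("metadata", {}), "last_stage": self.name}}


class Pipeline:
    def __init__(self, stages: list[Stage]):
        self.stages = list(stages)

    def insert(self, index: int, stage: Stage) -> None:
        """Insert a custom stage, e.g. attribute alignment after preprocessing."""
        self.stages.insert(index, stage)

    def run(self, platform: dict, inputs: dict) -> dict:
        for stage in self.stages:
            inputs = stage.run(platform, inputs)  # output of one stage feeds the next
        return inputs
```

Since each stage depends only on the shape of its input configuration, a stage can also be run on its own by calling `run` with hand-written addresses, which mirrors the "run individually by specifying input parameters" option.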
  • An attribute alignment stage (Attribute Matching) is added between the data preprocessing stage and the entity partitioning stage. It aligns the preprocessed entities from multiple data sources stored on the data platform according to the actual meaning of their attributes; for example, the "Address" field of data source A is aligned with the "Address" field of data source B, and aligned fields are treated as fields with the same meaning in the subsequent partitioning and matching stages.
  • the actual meaning of the entity attribute can be set manually, or can be realized by setting an attribute meaning comparison table in the system, which is not limited in this application.
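The attribute meaning comparison table mentioned above can be sketched as a lookup from (data source, field name) to a canonical attribute, applied to each triple's property. The table contents, including the differently named `Addr` field assumed for source B, are hypothetical; unknown fields pass through unchanged.

```python
# Hypothetical attribute-meaning table: (source, field) -> canonical attribute.
ATTRIBUTE_TABLE = {
    ("A", "Address"): "address",
    ("B", "Addr"): "address",
    ("B", "CompanyName"): "name",
}


def align(source: str, triples: list[tuple[str, str, object]],
          table: dict = ATTRIBUTE_TABLE) -> list[tuple]:
    """Rename each triple's property to its canonical meaning; unknown fields pass through."""
    return [(s, table.get((source, p), p), o) for s, p, o in triples]
```

After alignment, the partitioning and matching stages can compare `address` values directly, regardless of what each source originally called the field.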
  • The present application also discloses a storage medium on which a program for executing the above method is recorded.
  • The storage medium includes any mechanism configured to store or transfer information in a form readable by a machine (e.g., a computer).
  • Storage media include read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory media, and electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals).
  • Referring to FIG. 3, a structural block diagram of an embodiment of the data fusion device for knowledge graphs of the present application is shown, including a data platform 10, a data preprocessing module 11, an entity partitioning module 12, an entity matching module 13, and an entity fusion module 14, wherein:
  • The data platform 10 is configured with a unified access interface to provide computing and storage services for the other modules.
  • This application places no restrictions on the architecture of the data platform. To make it easy to expand computing and storage resources as the amount of data grows, a Hadoop distributed file system or a cloud computing architecture can be used.
  • The data preprocessing module 11 is configured to process data from different data sources, convert it into triple (SPO) format, store it on the data platform 10 through the unified access interface, and receive the graph data index information returned by the data platform 10.
  • The graph data index information may be the storage address of the triple-format graph data on the data platform 10, together with its metadata.
  • The entity partitioning module 12 is configured to divide, through the unified access interface and according to the graph data index information, the entities stored on the data platform 10 by attribute into one or more sub-partitions.
  • The entity partitioning module 12 may include an equivalent partitioning submodule that divides the entities stored on the data platform by a globally unique partition key generated from entity attributes, a clustering partitioning submodule that divides those entities based on a preset clustering model, and/or submodules implementing other partitioning methods.
  • The entity matching module 13 is configured to perform similarity calculation on candidate entity pairs assigned to the same sub-partition and to select the matching entity pairs that satisfy a preset similarity condition.
  • The entity fusion module 14 is configured to supplement and/or replace the attribute values of the matching entity pairs to generate a unified entity representation.
  • The functional modules of the device embodiment of the present application have upstream and downstream dependencies in the pipeline, but they are constrained only by data format and are decoupled from one another through the unified interface provided by the data platform, so each module can be developed independently.
  • The algorithm of each module can be replaced flexibly.
  • New modules can be inserted between existing modules to accommodate custom requirements.
  • For example, an attribute alignment module 15 may be inserted between the data preprocessing module 11 and the entity partitioning module 12 to align, according to the actual meaning of their attributes, the entities from multiple data sources that are stored on the data platform 10 after processing by the data preprocessing module 11. If the "Address" field of data source A is aligned with the "Address" field of data source B, the aligned fields are treated as fields with the same meaning in the subsequent partitioning and matching stages.
  • The entity matching module 13 may specifically include a similarity calculation submodule and a comparison submodule: the similarity calculation submodule assigns different weights to the entity's own attributes and to the attributes of other entities related to it, and computes the overall similarity of each candidate entity pair as the weighted sum; the comparison submodule determines whether the overall similarity of a candidate entity pair in the same sub-partition exceeds the preset similarity threshold and, if so, takes the pair as a matching entity pair.
  • The device may further include a data processing module that processes node entity data and edge entity data on the data platform through the unified access interface and passes the processing result to the next module.
  • The data processing module can be implemented in the following form:
  • The data to be processed is passed in through the input configuration parameters; after processing, the result is written to the output and passed to the functional module of the next stage, which extends the device's functions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a data fusion method and apparatus for a knowledge graph. A system for implementing the method includes a data platform configured with a unified access interface. The method comprises: processing data from different data sources, converting it into a subject-property-object format, storing it on the data platform through the unified access interface, and receiving graph data index information returned by the data platform; according to the graph data index information, dividing the entities stored on the data platform into one or more sub-blocks by attribute; performing similarity calculation on candidate entities assigned to the same sub-block, and selecting the matching entity pairs that satisfy a preset similarity condition; and supplementing and/or replacing the attribute values of the matching entity pairs to generate a unified entity representation. The method effectively solves the problem that existing data fusion techniques cannot flexibly adapt to different knowledge graphs.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811635696.X 2018-12-29
CN201811635696.XA CN109739939A (zh) 2018-12-29 2018-12-29 Data fusion method and device for knowledge graph

Publications (1)

Publication Number Publication Date
WO2020135048A1 (fr) 2020-07-02

Family

ID=66362378

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/124552 WO2020135048A1 (fr) 2018-12-29 2019-12-11 Data fusion method and apparatus for knowledge graph

Country Status (2)

Country Link
CN (1) CN109739939A (fr)
WO (1) WO2020135048A1 (fr)


Also Published As

Publication number Publication date
CN109739939A (zh) 2019-05-10


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19904173; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19904173; Country of ref document: EP; Kind code of ref document: A1)