CN115952862A

CN115952862A - Knowledge graph data fusion method and system

Info

Publication number: CN115952862A
Application number: CN202211605089.5A
Authority: CN
Inventors: 桂正科
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-12-14
Filing date: 2022-12-14
Publication date: 2023-04-11

Abstract

The embodiments of this specification disclose a knowledge graph data fusion method and system. The method comprises: acquiring ontology definition data of a fused knowledge graph, the ontology definition data comprising a target entity field, a target relationship description and a fusion graph operator; based on the ontology definition data of the fused knowledge graph, acquiring instance data corresponding to the target entity field and the target relationship description from two or more knowledge graphs respectively; processing the plurality of instance data through the fusion graph operator to obtain the fused knowledge graph, wherein at least a portion of the instance data of the fused knowledge graph carries source tags, each source tag indicating the knowledge graph from which the corresponding instance data came; and determining a value contribution of each knowledge graph based on the actual value generated by the fused knowledge graph and the source tags of the instance data.

Description

Knowledge graph data fusion method and system

Technical Field

The specification relates to the technical field of computers, in particular to a knowledge graph data fusion method and system.

Background

Different platforms or business domains may each hold their own data. As data management and data construction develop, there is a growing need to fuse and interconnect data across multiple platforms and business domains. A knowledge graph is data in graph form and can efficiently present the knowledge contained in the data. If knowledge is interconnected across multiple platforms and business domains through knowledge graphs, the efficiency of data fusion can be improved, along with business effectiveness and computational efficiency.

The specification provides a knowledge graph data fusion method and a knowledge graph data fusion system to promote knowledge fusion in multiple platforms and multiple service fields.

Disclosure of Invention

One aspect of the embodiments of this specification provides a knowledge graph data fusion method. The method comprises: acquiring ontology definition data of a fused knowledge graph, wherein the ontology definition data of the fused knowledge graph comprises a target entity field, a target relationship description and a fusion graph operator, the target entity field and the target relationship description are selected from ontology definition data of two or more knowledge graphs, the ontology definition data of each knowledge graph comprises entity fields for defining entities and relationship descriptions for defining relationships among the entities, and the fusion graph operator is used for performing fusion processing on the target entity field and/or the target relationship description; based on the ontology definition data of the fused knowledge graph, acquiring instance data corresponding to the target entity field and the target relationship description from the two or more knowledge graphs respectively; processing the plurality of instance data through the fusion graph operator to obtain the fused knowledge graph, wherein at least a portion of the instance data of the fused knowledge graph carries source tags, each source tag indicating the knowledge graph from which the corresponding instance data came; and determining a value contribution of each knowledge graph based on the actual value generated by the fused knowledge graph and the source tags of the instance data.

Another aspect of the embodiments of this specification provides a knowledge graph data fusion system. The system comprises: a first acquisition module, configured to acquire ontology definition data of a fused knowledge graph, wherein the ontology definition data of the fused knowledge graph comprises a target entity field, a target relationship description and a fusion graph operator, the target entity field and the target relationship description are selected from ontology definition data of two or more knowledge graphs, the ontology definition data of each knowledge graph comprises entity fields for defining entities and relationship descriptions for defining relationships among the entities, and the fusion graph operator is used for performing fusion processing on the target entity field and/or the target relationship description; a second acquisition module, configured to acquire, based on the ontology definition data of the fused knowledge graph, instance data corresponding to the target entity field and the target relationship description from the two or more knowledge graphs respectively; a third acquisition module, configured to process the plurality of instance data through the fusion graph operator to obtain the fused knowledge graph, wherein at least a portion of the instance data of the fused knowledge graph carries source tags, each source tag indicating the knowledge graph from which the corresponding instance data came; and a value contribution determining module, configured to determine the value contribution of each knowledge graph based on the actual value generated by the fused knowledge graph and the source tags of the instance data.

In some embodiments, the system further comprises an exception handling module configured to: determine abnormal instance data in the fused knowledge graph; and, based on the source tags, locate the source of the abnormal instance data so that the fused knowledge graph can be corrected.

In some embodiments, to modify the fused knowledge-graph, the exception handling module is further configured to send an exception prompt to a data provider of a source knowledge-graph, and to adjust ontology-defining data of the fused knowledge-graph and reacquire the fused knowledge-graph to exclude instance data of the source knowledge-graph from the fused knowledge-graph.

Another aspect of embodiments of the present specification provides a knowledge-graph data fusion apparatus comprising at least one storage medium and at least one processor, the at least one storage medium for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement a method of knowledge-graph data fusion.

Another aspect of embodiments of the present specification provides a computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform a method of knowledge-graph data fusion.

Drawings

The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:

FIG. 1 is an exemplary flow diagram of a method of knowledge-graph data fusion, shown in accordance with some embodiments of the present description;

FIG. 2 is an exemplary diagram of a fused knowledge graph, shown in accordance with some embodiments of the present description;

FIG. 3 is an exemplary flow diagram of locating abnormal instance data, shown in accordance with some embodiments of the present description;

FIG. 4 is an exemplary block diagram of a knowledge-graph data fusion system, shown in accordance with some embodiments of the present description.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.

It should be understood that "system", "device", "unit" and/or "module" as used herein are terms for distinguishing components, elements, parts, portions or assemblies at different levels. These terms may be replaced by other expressions that achieve the same purpose.

As used in this specification and the appended claims, the terms "a," "an," and/or "the" do not specifically denote the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.

Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.

A knowledge graph is a semantic network that reveals relationships between entities (also called objects). Nodes in the graph represent entities; nodes can be of various types, called node types, which indicate different kinds of entities. Edges in the graph represent relationships; edges can likewise be of various types, called edge types, which indicate different kinds of relationships. An entity may refer to something in the real world, such as a person, a place name, a concept, a medicine, a company, and so forth. Relationships express connections between different entities, e.g., Zhang San and Li Si are "friends", a social account has a login relationship with a mobile terminal, and so on.

The knowledge-graph may be a directed graph or an undirected graph, i.e., edges in the knowledge-graph may be directed or undirected. The directional edges may be unidirectional or bidirectional to indicate the directionality of the relationship. When the knowledge-graph is an undirected graph, an edge may indicate that a relationship has no directionality or that a relationship is bidirectional (e.g., a "friend" relationship). An edge pointing to a node may be referred to as an in-edge of the node, and an edge pointed out from a node (i.e., pointing to other nodes) may be referred to as an out-edge of the node.

An instance of a knowledge graph may be referred to as a data graph (or graph data; where no confusion arises, a data graph may also simply be called a knowledge graph). A data graph contains specific knowledge data (also called instance data, including node instance data and edge instance data), and each piece of knowledge may be represented as a triple containing two entities and the relationship between them. For example, a social network graph may contain "person" entities, such as Zhang San and Li Si, and "company" entities, such as company A and company B. The relationship between two persons may be "friend" or "colleague", and the relationship between a person and a company may be "currently employed at" or "formerly employed at". Relationships (edges) may have directionality: for example, a "friend" relationship may be bidirectional, whereas a "currently employed at" or "formerly employed at" relationship may be unidirectional.
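The triple representation and the in-edge/out-edge terminology can be illustrated with a minimal Python sketch. The `Triple` type, the `directed` flag and the relation names are illustrative assumptions, not structures defined in this specification.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One piece of knowledge: two entities and the relationship between them."""
    head: str
    relation: str
    tail: str
    directed: bool = True  # assumption: undirected edges are flagged explicitly


triples = [
    Triple("Zhang San", "friend", "Li Si", directed=False),
    Triple("Zhang San", "employed_at", "Company A"),
    Triple("Li Si", "formerly_employed_at", "Company B"),
]


def out_edges(node, triples):
    """Edges pointing out from `node`; an undirected edge counts in both directions."""
    return [t for t in triples
            if t.head == node or (not t.directed and t.tail == node)]


def in_edges(node, triples):
    """Edges pointing to `node`; an undirected edge counts in both directions."""
    return [t for t in triples
            if t.tail == node or (not t.directed and t.head == node)]
```

Treating an undirected edge as both an in-edge and an out-edge is one possible convention; a real graph store may model bidirectional relationships differently.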

The knowledge graph may correspond to ontology definition data, also called the schema of the knowledge graph. The ontology definition data of a knowledge graph is data that defines the entities included in the knowledge graph and the relationships between those entities, and it can represent the semantic information of the data instances of the knowledge graph. The ontology definition data can guide the collection of data instances and the construction, from those data instances, of the knowledge graph (also called an instance graph). Thus, in some embodiments, ontology definition data of a knowledge graph may include entity fields for defining entities. An entity field may be understood as an entity name or an entity representation, such as "company principal" or "user", and a value of the entity field may be an instance of that entity. An entity field may correspond to a plurality of attribute fields. An attribute field is an abstraction of entity description information, for example "address", "age" or "registered capital", and the value of an attribute field is a specific description of the corresponding entity instance, for example "No. 11 Jianshe Road", "28 years old" or "5 million". In some embodiments, the ontology definition data of the knowledge graph may include relationship descriptions for defining relationships between entities. A relationship description may be an abstraction of a relationship type between entities, such as "employment relationship", "parent-child relationship" or "device login relationship".
In some embodiments, the relationship description may further include relationship attributes that refine it. For example, "employment" may specifically be "temporary employment" or "formal employment", and a "parent-child relationship" may further include a "full funding relationship", a "partial funding relationship", and so forth. With the relationship descriptions, it can be determined during construction of the knowledge graph whether an edge exists between two entity instances.
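As a rough illustration, ontology definition data of this kind could be modeled as follows. This is a minimal sketch: the class names, field names and the `allows_edge` helper are hypothetical, and the specification does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class EntityField:
    """An entity definition plus the attribute fields that describe it."""
    name: str
    attribute_fields: list[str] = field(default_factory=list)


@dataclass
class RelationDescription:
    """An abstract relationship type between two entity fields."""
    name: str
    source_entity: str
    target_entity: str
    relation_attributes: list[str] = field(default_factory=list)


@dataclass
class Ontology:
    """Ontology definition data (schema) of one knowledge graph."""
    graph_id: str
    entity_fields: dict[str, EntityField]
    relation_descriptions: list[RelationDescription]

    def allows_edge(self, src_type: str, relation: str, dst_type: str) -> bool:
        """Whether the schema permits an edge of this type between two entity types."""
        return any(
            r.name == relation
            and r.source_entity == src_type
            and r.target_entity == dst_type
            for r in self.relation_descriptions
        )


# Hypothetical schema using the example fields mentioned in the text.
insurance = Ontology(
    graph_id="insurance",
    entity_fields={
        "company": EntityField("company", ["address", "registered_capital"]),
        "user": EntityField("user", ["age"]),
    },
    relation_descriptions=[
        RelationDescription("employment", "company", "user", ["temporary", "formal"]),
    ],
)
```

Here `allows_edge` plays the role described above: when the graph is built, the relationship descriptions decide whether an edge between two entity instances is admissible.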

Knowledge data used for constructing the knowledge graph can be from different platforms and different business fields. Different platforms and different service fields may store respective data, for example, each platform or each service field may record respective service data in the form of a knowledge graph or a data table. The fusion and the communication of knowledge data of different platforms and different business fields can improve the business effect, the business efficiency and the computational efficiency. The data fusion and communication of the multi-platform and multi-service fields can be realized by constructing a knowledge graph of multi-platform and multi-service knowledge data communication.

In some embodiments, the fused knowledge graph may be created by obtaining a data table from each platform or business domain and then building the graph from the obtained tables, e.g., through graph computation by a graph construction operator. (In a data table, data instances are recorded in two-dimensional form; the table includes fields and field values, i.e., the data instances corresponding to the fields.) A fused knowledge graph created this way combines knowledge data from different platforms and business domains and can therefore be applied to business more effectively, so that more business value is obtained from the multi-platform, multi-domain knowledge data. However, this way of constructing a fused knowledge graph considers only the fusion result and not the source of each attribute and relationship in a fused entity. As a result, the provider of a data table cannot track how the provided table is actually used and cannot learn or track the value it produces in actual business, which limits the depth and breadth of knowledge fusion and discourages data sharing. In addition, because the data source cannot be traced, abnormal data is difficult to locate when a fused entity exhibits a data anomaly.

In view of this, some embodiments of this specification provide a knowledge graph data fusion method and system. Based on the ontology definition data of the fused knowledge graph, instance data is obtained directly from the knowledge graph of each business party for fusion, and the source information of each piece of instance data is stored in the fused knowledge graph. This makes it possible to track, for each instance of a fused entity, the fusion source of each of its attributes and relationships. Based on the source information, the data contribution of each fusion source to the generation of the fused entities can be counted, thereby enabling value tracking of the data provided by each data source. Likewise, for a specific fused entity instance, the fusion sources of its attributes and relationships can be obtained directly, so that data problems can be investigated efficiently.
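Counting contributions from source information might look like the sketch below. The proportional split of the actual value by tag count is purely an assumption made for illustration, since this passage does not fix an allocation rule. The function name and the "graph.field" tag format are likewise hypothetical.

```python
from collections import Counter


def value_contribution(source_tags, actual_value):
    """Split `actual_value` across source graphs in proportion to tag counts.

    `source_tags` are strings such as "A.cityCode"; the part before the first
    dot is taken as the source knowledge graph identifier (an assumption).
    """
    counts = Counter(tag.split(".", 1)[0] for tag in source_tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {graph: actual_value * n / total for graph, n in counts.items()}


# Tags of the instance data that took part in producing the value.
tags_used = ["A.cityCode", "A.name", "B.kalabel", "A.address"]
shares = value_contribution(tags_used, 100.0)
```

Other allocation rules, such as weighting each attribute by its importance to the business task, would fit the same interface.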

The knowledge graph data fusion method and system provided by the specification can be applied to relevant scenes of multi-platform or multi-service field data processing, for example, can be applied to scenes of performing business task (such as determining fund risk of a certain natural person) calculation based on data of multiple service fields such as safety, insurance, payment and wealth. The technical solutions disclosed in the present specification are explained in detail by the description of the drawings below.

FIG. 1 is an exemplary flow diagram of a knowledge graph data fusion method according to some embodiments of this specification. In some embodiments, process 100 may be performed by a processing device. For example, process 100 may be stored in a storage device (such as an onboard storage unit of the processing device or an external storage device) in the form of a program or instructions that, when executed, implement process 100. Process 100 may include the following operations.

Step 102: acquire ontology definition data of the fused knowledge graph. In some embodiments, step 102 may be performed by the first acquisition module 410.

The ontology definition data of the fused knowledge graph may include target entity fields, target relationship descriptions and a fusion graph operator. A target entity field is an entity field that defines an entity included in the fused knowledge graph to be established; target entity fields may be selected from the ontology definition data of two or more knowledge graphs. A target relationship description is a relationship description that defines a relationship between entities of the fused knowledge graph to be established. A graph operator can be regarded as an algorithm or method built on the ontology definition data (including entity definitions and relationship descriptions) of a knowledge graph, and can also be regarded as part of the ontology definition data.

In some embodiments, the processing device may, according to actual business needs, select the required entity fields and relationship descriptions from the ontology definition data of the knowledge graphs of two or more platforms/business domains; the selected entity fields and relationship descriptions are called target entity fields and target relationship descriptions. For example, if the business objective is to determine the fund risk of a merchant, merchant-related entity fields such as merchant, commodity, policyholder and manager may be selected from the knowledge graph ontology definition data of the insurance business domain as target entity fields, with relationship descriptions such as belonging, managing and insuring as target relationship descriptions. Likewise, merchant-related entity fields such as merchant, commodity, payee and manager may be selected from the knowledge graph ontology definition data of the payment business domain as target entity fields, with relationship descriptions such as belonging, managing and paying as target relationship descriptions. In some embodiments, the relationship descriptions selected from the ontology definition data of a knowledge graph should relate to entity fields selected from the same data. In other words, every entity field involved in a selected relationship description is itself among the selected target entity fields.
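The consistency condition in the last two sentences can be checked mechanically. The following sketch assumes that a relationship description is a `(source_field, relation, target_field)` tuple; that representation and the function name are illustrative only.

```python
def unsupported_relations(target_entity_fields, target_relations):
    """Return selected relations that reference an entity field not selected.

    An empty result means the selection satisfies the condition that every
    entity field involved in a relationship description is a target entity field.
    """
    selected = set(target_entity_fields)
    return [
        rel for rel in target_relations
        if rel[0] not in selected or rel[2] not in selected
    ]


# Hypothetical selection from the payment business domain.
entity_fields = ["merchant", "commodity", "manager"]
relations = [
    ("commodity", "belongs_to", "merchant"),
    ("manager", "manages", "merchant"),
    ("payee", "pays", "merchant"),  # "payee" was not selected above
]
problems = unsupported_relations(entity_fields, relations)
```

In this example the "pays" relation is reported because "payee" is missing from the selected entity fields; adding "payee" to the selection would clear the report.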

It can be understood that the knowledge graphs of different platforms/business domains may have different ontology definition data, so the target entity fields and target relationship descriptions selected from them may differ, and the ontology definition data of these knowledge graphs is not interconnected (for example, the target entity fields are not associated with one another). One or more graph operators that perform fusion processing on the target entity fields and target relationship descriptions can fuse and associate the ontology definition data of the knowledge graphs of different platforms/business domains, yielding the ontology definition data used to construct the fused knowledge graph. Based on that ontology definition data, the data instances of the knowledge graphs of different platforms/business domains can be fused and/or interconnected. The fusion graph operator may include one or more graph operators.

The fusion graph operator may be one or more graph operators that fuse and/or interconnect the data corresponding to each target entity field and each target relationship description; it can be used to perform fusion processing on the target entity fields and/or the target relationship descriptions. For example, it may include an operator for fusing similar target entities into one entity, an operator for adding a relationship between two previously unrelated target entities, an operator for performing expression normalization on attribute information, and an operator for tagging the source of fused instance data and/or its attribute values. For more on the fusion graph operator, see the description of step 106.

In some embodiments, a graph operator may be used to find entity instances among a large number of data instances and to determine relationships between the entity instances based on the entity definitions or relationship descriptions. A graph operator may also be understood as a graph computation algorithm or method that performs data processing or computation for graph construction. It may be implemented in various ways, such as a data processing/computing unit, program code, or a machine learning model. In some embodiments, data is input to an operator, and the operator performs the corresponding processing/computation, converts the data, and outputs the converted data. The graph operator may be obtained in advance; for example, a user may determine the graph operator beforehand according to the requirements for creating the fused knowledge graph. The graph operator may be determined in any of various existing ways, which is not limited in this embodiment.
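The "data in, converted data out" view of an operator can be sketched as plain functions applied in sequence. The type alias, the two toy operators and `run_operators` are assumptions for illustration; the specification leaves the implementation form open.

```python
from typing import Callable

Instance = dict
# Assumption: an operator maps a list of instances to a new list of instances.
GraphOperator = Callable[[list[Instance]], list[Instance]]


def drop_unnamed(instances: list[Instance]) -> list[Instance]:
    """Toy operator: keep only instances that have a non-empty name."""
    return [inst for inst in instances if inst.get("name")]


def strip_names(instances: list[Instance]) -> list[Instance]:
    """Toy operator: trim surrounding whitespace from the name attribute."""
    return [{**inst, "name": inst["name"].strip()} for inst in instances]


def run_operators(instances: list[Instance],
                  operators: list[GraphOperator]) -> list[Instance]:
    """Apply the operators in order, feeding each output to the next operator."""
    for op in operators:
        instances = op(instances)
    return instances


raw = [{"name": " Acme "}, {"name": ""}, {"name": "Globex"}]
cleaned = run_operators(raw, [drop_unnamed, strip_names])
```

Because each operator returns new dictionaries rather than mutating its input, the original instance data stays available for tracing.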

In some embodiments, the ontology definition data of the fused knowledge graph may be set by the constructor of the fused knowledge graph according to business requirements. In some embodiments, it may also be set jointly by the providers of the individual knowledge graphs. In some embodiments, the processing device may obtain preset, stored ontology definition data of the fused knowledge graph by reading it from a storage device or a database, or through a relevant data interface.

Step 104: based on the ontology definition data of the fused knowledge graph, acquire instance data corresponding to the target entity field and the target relationship description from the two or more knowledge graphs respectively. In some embodiments, step 104 may be performed by the second acquisition module 420.

The instance data corresponding to the target entity field may refer to the entity instance corresponding to the target entity field and/or the attribute value thereof. Instance data corresponding to a target relationship description may refer to a relationship description between corresponding entity instances.

The target entity field and the target relationship description may indicate the knowledge graph from which the corresponding instance data is to be obtained. For example, when a target entity field is selected from the ontology definition data of knowledge graph A, the instance data corresponding to that entity field is obtained from knowledge graph A; when a target relationship description is selected from the ontology definition data of knowledge graph B, the corresponding instance data is obtained from knowledge graph B.

In some embodiments, the processing device may obtain the respective instance data from the knowledge-graph of the corresponding platform or business domain according to each target entity field and target relationship description in the ontology definition data of the converged knowledge-graph.
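Step 104 can be pictured as a lookup keyed by source graph and field. In this sketch each knowledge graph is assumed to be an in-memory dictionary from field name to instances; a real system would query a graph store instead, and all names here are hypothetical.

```python
def acquire_instances(targets, graphs):
    """Fetch instance data for each (graph_id, field_name) target.

    `graphs` maps a graph identifier to {field_name: [instance, ...]}.
    Each result records where it came from, so later steps can tag its source.
    """
    acquired = []
    for graph_id, field_name in targets:
        for instance in graphs[graph_id].get(field_name, []):
            acquired.append(
                {"graph": graph_id, "field": field_name, "data": instance}
            )
    return acquired


# Hypothetical source graphs of two business parties.
graphs = {
    "A": {"cityCode": [{"value": "330100"}], "company": [{"name": "Acme"}]},
    "B": {"kalabel": [{"value": "KA"}, {"value": "non-KA"}]},
}
targets = [("A", "cityCode"), ("B", "kalabel")]
acquired = acquire_instances(targets, graphs)
```

Only the fields named in the targets are read, so the unselected "company" field of graph A stays out of the result.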

Step 106: process the plurality of instance data through the fusion graph operator to obtain the fused knowledge graph. In some embodiments, step 106 may be performed by the third acquisition module 430.

In some embodiments, processing the plurality of instance data through the fusion graph operator may mean that the fusion graph operator performs computation and/or other operations on the plurality of instance data.

In some embodiments, processing the plurality of instance data through the fusion graph operator may include adding source tags to the instance data and/or its attribute values, performing expression normalization on the plurality of instance data, fusing the plurality of instance data, establishing a relationship description between two pieces of instance data based on at least one of their corresponding attribute fields, and the like.

The source tag may be used to track the source of instance data and/or its attribute values, e.g., which knowledge graph (or which business party) it comes from. The source tag may take various forms, for example a field-style tag, an added label, a color, a graphic, and the like. As an illustration of a field-style tag, assume the plurality of instance data comes from the knowledge graphs of two business parties: the target entity fields include the entity field cityCode of knowledge graph A of the first business party and the entity field kalabel of knowledge graph B of the second business party. When the instance data is processed by the fusion graph operator, a source tagging operator can add source tags to the instance data obtained under the cityCode field of knowledge graph A and under the kalabel field of knowledge graph B, and the tags may be written as "A.cityCode" and "B.kalabel". Here "A" and "B" are the numbers or identifiers of the source knowledge graphs, and "cityCode" and "kalabel" are the field names. In some embodiments, the field names may be omitted, or a business identifier may be used in place of the identifier of the source knowledge graph. As another example, instance data or attribute values in the fused knowledge graph may be marked with colors, with red indicating the knowledge graph of the first business party and blue indicating the knowledge graph of the second business party. In some embodiments, the tagging granularity may be a node or an edge, i.e., the tag indicates which knowledge graph the entire instance data of a node or edge comes from. In still other embodiments, the granularity may be refined to the attributes of a node or edge, i.e., the tag indicates which knowledge graph each attribute value of a node or edge comes from.
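The two tagging granularities can be sketched as follows. The `_source` key, the nested `{"value", "source"}` layout and the function names are assumptions; only the "A.cityCode" style of tag is taken from the text.

```python
def tag_instance(instance, graph_id, field_name):
    """Node/edge-level tag: one field-style tag such as "A.cityCode" per instance."""
    return {**instance, "_source": f"{graph_id}.{field_name}"}


def tag_attributes(instance, graph_id):
    """Attribute-level tag: record the source graph next to every attribute value."""
    return {
        attr: {"value": value, "source": graph_id}
        for attr, value in instance.items()
    }


node_tagged = tag_instance({"value": "330100"}, "A", "cityCode")
attr_tagged = tag_attributes({"name": "Acme", "address": "No. 11 Jianshe Road"}, "A")
```

Node-level tags are cheaper to store, while attribute-level tags are needed once a single fused node draws its attribute values from several source graphs.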

In some embodiments, the data expression standards of knowledge graphs may differ across platforms or business domains. For example, the formats of attribute fields may differ. Expression normalization may unify the data format of the instance data (for example, whether the instance value of an attribute field is a numeric value, a character string or a binary number), the data expression constraints (for example, whether a time-type attribute field holds a year-month-day date or a 24-hour clock time, or whether a monetary attribute field is expressed in dollars or in RMB), the data expression type (for example, whether the value of an attribute field is integer or floating-point data), and the like, so that the attribute values of entity fields from different platforms or business domains share a unified expression form or unit of measurement.
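A minimal sketch of expression normalization for two attribute kinds is given below. The accepted date formats, the ISO target format and the caller-supplied unit table are assumptions; the example deliberately uses a unit scale factor rather than a currency exchange rate, which would have to come from the business context.

```python
from datetime import datetime

# Assumption: source graphs write dates in one of these formats.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def normalize_date(value: str) -> str:
    """Convert a date string in any accepted format to ISO form (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize_amount(value, unit: str, factors: dict) -> float:
    """Convert an amount to the base unit using caller-supplied factors."""
    return round(float(value) * factors[unit], 2)


# Hypothetical unit table: amounts are unified to yuan as floating-point data.
FACTORS_TO_YUAN = {"yuan": 1.0, "ten_thousand_yuan": 10000.0}

date_a = normalize_date("2022/12/14")
date_b = normalize_date("20221214")
capital = normalize_amount("500", "ten_thousand_yuan", FACTORS_TO_YUAN)
```

After normalization, the same date written in different styles compares equal, and "500" expressed in units of ten thousand yuan becomes a float in yuan.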

In some embodiments, fusing the plurality of instance data may be based on fusing the ontology definition data of different knowledge graphs, for example by fusing two or more target entity fields so that the knowledge data of different platforms/business domains is fused and interconnected. In some embodiments, semantically similar or identical target entity fields may be fused. For example, suppose the ontology definition data of the fused knowledge graph includes a target entity field "cro.company" from the insurance business domain and a target entity field "company v2" from the payment business domain. "cro.company" and "company v2" may be fused to obtain a fused entity field, which may be represented by any one of the fused target entity fields, such as "cro.company" or "company v2", or by another entity field that expresses the semantics of the fused target entity fields. In some embodiments, after two or more target entity fields are fused into a fused entity field, the attribute fields and related relationship descriptions of the fused target entity fields are also adjusted to fit the fused entity field. Specifically, the attribute fields of the fused entity field may be the union of the attribute fields of the fused target entity fields, or a part of that union; for example, they may be all or part of the attribute fields of one of the fused target entity fields. The relationship descriptions related to the fused entity field may include the target relationship descriptions related to each of the fused target entity fields.
In some embodiments, the similarity between target entity fields may be calculated, and two or more target entity fields whose similarity satisfies a condition (e.g., the similarity exceeds a threshold or ranks in the top N) may be fused to obtain a fused entity field.

In some embodiments, the similarity between target entity fields may be calculated by a text similarity algorithm, for example the TF-IDF algorithm or a vector distance between texts (the distance may include, but is not limited to, cosine distance, Euclidean distance, Manhattan distance, Mahalanobis distance, Minkowski distance, etc.).
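One way to combine TF-IDF with a cosine measure is sketched below in plain Python. The whitespace tokenizer, the smoothed IDF formula and the field descriptions are assumptions chosen for illustration; the specification does not fix these details.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: token -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({
            # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
            tok: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Each target entity field is described by its name and attribute field names.
descriptions = [
    "company name address registered capital",  # e.g. "cro.company"
    "company name address legal person",        # e.g. "company v2"
    "city name population",                     # e.g. "City"
]
vecs = tfidf_vectors(descriptions)
sim_company_pair = cosine_similarity(vecs[0], vecs[1])
sim_company_city = cosine_similarity(vecs[0], vecs[2])
```

With these descriptions the two company fields score higher than the company/city pair, so a threshold placed between the two scores would select only the company fields for fusion.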

In some embodiments, the similarity of two target entity fields may be determined by a semantic similarity prediction model; e.g., the similarity between target entity fields may be calculated with a model such as BERT, Transformer or ESIM. In some embodiments, whether two or more target entity fields are similar or identical may also be determined based on the attribute fields corresponding to the target entity fields. Taking the BERT model as an example, the texts corresponding to two or more target entity fields (which may include the field names of the target entity fields and the names of the corresponding attribute fields) may be input into the BERT model. The model determines the text vectors of the target entity fields, calculates the semantic similarity between the text vectors, and outputs similarity scores, which may be used as the similarity between the target entity fields.

In some embodiments, the source tagging operator and the entity fusion operator may be used in combination, or an instruction implementing source tagging may be added to the entity fusion operator, so that when these operators are invoked to process instance data, the source of the data is clearly shown in the fused entity instance. As previously described, an entity field may correspond to one or more attribute fields. After fusion, if a node instance or an edge instance comes from a single knowledge graph, the source may be tagged on the whole instance. If a node instance has records in multiple source knowledge graphs and its attribute values come from different knowledge graphs, source tags may be applied to its attribute values individually, at a finer granularity.
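
The two tagging granularities can be illustrated with a short sketch. The `fuse_node` function and the record layout are hypothetical, and the conflict policy (the first knowledge graph to supply an attribute wins) is an assumption made only for this example; the description does not prescribe one.

```python
def fuse_node(records):
    """Fuse one node instance from per-graph records of the form {graph: {attribute: value}}.

    A node found in a single graph is tagged as a whole; a node found in several
    graphs gets one source tag per attribute value.
    """
    if len(records) == 1:
        graph, attrs = next(iter(records.items()))
        return {"source": graph, "attributes": dict(attrs)}
    fused = {}
    for graph, attrs in records.items():
        for key, value in attrs.items():
            # Assumed policy: keep the first value seen for each attribute.
            fused.setdefault(key, {"value": value, "source": graph})
    return {"attributes": fused}


single = fuse_node({"A": {"name": "Acme"}})
multi = fuse_node({
    "A": {"name": "Acme", "address": "Hangzhou"},
    "B": {"name": "Acme Ltd", "type": "insurer"},
})
print(single)
print(multi)
```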

In some embodiments, the processing device may establish a relationship description between two target entities based on at least one attribute field corresponding to the two target entity fields. An attribute field corresponding to an entity field may define further descriptive information for that entity field, such as name, address, or type. In some embodiments, the attribute fields corresponding to the target entity fields may be used to determine whether a new association exists between two previously unassociated target entities, so that a relationship description between the two target entities can be established. For example, the attribute fields corresponding to the target entity field "cro.company" from the insurance business field include "address", and the payment business field provides a target entity field "City"; a relationship description between "cro.company" and "City" may be established according to the attribute field "address" of "cro.company", for example a relationship describing the city in which the company is located. For another example, if the target entity field "commodity" from the manufacturing business field corresponds to the attribute field "commodity type", and the target entity field "business" from the sales business field corresponds to the attribute field "main operating range", a relationship description between "commodity" and "business", such as a sales relationship, may be established based on these two attribute fields.
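
At the instance level, the "address" example above might be realized roughly as follows. This is a hedged sketch: the `link_by_attribute` function, the relation name "located_in", the sample data, and the substring-matching rule are all assumptions introduced for illustration, not a matching method stated in the description.

```python
def link_by_attribute(companies, cities, attribute="address", relation="located_in"):
    """Create (company, relation, city) edges when a city name occurs in the company's attribute."""
    edges = []
    for company in companies:
        for city in cities:
            # Assumed rule: a plain substring match on the attribute value.
            if city["name"] in company.get(attribute, ""):
                edges.append((company["name"], relation, city["name"]))
    return edges


companies = [
    {"name": "Acme", "address": "1 Example Road, Hangzhou"},
    {"name": "Beta", "address": "2 Sample Street, Shanghai"},
]
cities = [{"name": "Hangzhou"}, {"name": "Beijing"}]
edges = link_by_attribute(companies, cities)
print(edges)  # [('Acme', 'located_in', 'Hangzhou')]
```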

Step 108, determining the value contribution of each knowledge-graph based on the actual value generated by the fusion knowledge-graph and the source tag of the instance data. In some embodiments, step 108 may be performed by value contribution determination module 440.

The actual value may refer to the business gain or business revenue that the fused knowledge graph brings in a business application. For example, in a recommendation service, news and commodities are recommended to users with reference to the knowledge data of the fused knowledge graph, and the resulting click rate and revenue may be the actual value generated by the fused knowledge graph. The knowledge data may be referenced by, for example, extracting recommendation rules from it or training a recommendation model on it.

In some embodiments, based on the source tags of the instance data, the processing device may determine how much of the instance data behind the actual value generated by the fused knowledge graph came from each business party's knowledge graph, and thereby determine the value contribution of each knowledge graph. For example, assume that the actual value generated by the fused knowledge graph is 100 recommendation clicks brought to the business party. Based on the source tag of each piece of instance data, it can be determined that 10 pieces of instance data come from knowledge graph A, 20 pieces come from knowledge graph B, and the remaining 20 pieces come from knowledge graph C. The value contribution of knowledge graph A is then 100 × 10/(10 + 20 + 20) = 20; similarly, the value contribution of knowledge graph B is 40 and that of knowledge graph C is 40.
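
The proportional split in this example can be reproduced with a few lines of code. The sketch assumes that value is shared strictly in proportion to the count of source-tagged instance data; the function name `value_contributions` is hypothetical.

```python
from collections import Counter


def value_contributions(actual_value, source_tags):
    """Split the actual value across knowledge graphs in proportion to their instance counts."""
    counts = Counter(source_tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {graph: actual_value * count / total for graph, count in counts.items()}


# 10 instances tagged A, 20 tagged B, 20 tagged C; actual value is 100 clicks.
tags = ["A"] * 10 + ["B"] * 20 + ["C"] * 20
contributions = value_contributions(100, tags)
print(contributions)  # {'A': 20.0, 'B': 40.0, 'C': 40.0}
```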

FIG. 2 is an exemplary schematic diagram of a fused knowledge graph according to some embodiments of this specification. Knowledge graph A, knowledge graph B, and knowledge graph C represent the knowledge graphs provided by multiple business parties. Based on the ontology definition data of the fused knowledge graph, the processing device may acquire the instance data corresponding to the target entity fields and target relation descriptions from knowledge graph A, knowledge graph B, and knowledge graph C, and then process the instance data through the fusion graph operator (for example, entity and attribute fusion, expression standardization, and adding source tags to the instance data) to obtain the fused knowledge graph. In the fused knowledge graph, the source of each piece of instance data is represented by the letter A, B, or C, corresponding to the name of the source knowledge graph. Finally, in the value analysis stage, the value contribution of each business party's knowledge graph is determined based on the sources of the instance data.

In some embodiments of the present description, when the knowledge graphs from multiple service parties are merged, the sources of the instance data corresponding to each knowledge graph are marked, so that the service parties can track the sources of the instances of the merged knowledge graph in the respective attributes and relationships, and based on the source information, the data contribution of the source in the generation of the merged entity can be determined, thereby realizing value tracking of the data of each service party, promoting data merging of multiple service parties, and facilitating improvement of the depth and breadth of knowledge merging.

FIG. 3 is an exemplary flow diagram of locating anomaly instance data, according to some embodiments of the present description. In some embodiments, flow 300 may be performed by a processing device. For example, the process 300 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 300. The flow 300 may include the following operations.

Step 302, determining abnormal instance data in the fusion knowledge graph.

In some embodiments, anomalous instance data may refer to instance data in the converged knowledge graph that has too low a value contribution or that does not match well with the use of the current business scenario. For example, an instance data may be considered anomalous when it is determined in a value contribution analysis that the value contribution of the instance data is significantly less than that of other instance data, e.g., the value contributions of other instance data are distributed between 100-1000, and the value contribution of one or some instance data is 0-10, significantly less than that of other instance data.

In some embodiments, the value contribution corresponding to abnormal instance data may be low because the instance data itself contains a data error, or because the data quality of the instance data does not meet the standard, for example because the instance data is not applicable to the current business scenario. For example, if the business scenario is fund risk control and the instance data provided by a certain business party relates to logistics, the logistics data is only loosely related to fund risk control and the business gain it brings may be limited; in this case the instance data may also be considered abnormal instance data.

In some embodiments, the processing device may determine anomalous instance data based on the value contribution of the instance data in the fused knowledge-graph, e.g., a value contribution threshold may be set, and the instance data may be considered anomalous instance data when it is analyzed that the value contribution of the instance data is below the value contribution threshold.

Step 304, locating the source of the abnormal instance data based on the source tag so as to correct the fused knowledge graph.

In some embodiments, the processing device may determine the source of the abnormal instance data based on the source tag. For example, assume that the source tags of the instance data in the fused knowledge graph are represented by the letters A, B, and C, where A indicates that the source is knowledge graph A, B indicates knowledge graph B, and C indicates knowledge graph C. When abnormal instance data is found, its source tag can be read, for example "abnormal instance data-A", and the source of the abnormal instance data can be located as knowledge graph A.
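
Steps 302 and 304 can be sketched together: instance data whose value contribution falls below a threshold is treated as abnormal, and its source tag is then read to locate the originating knowledge graph. The record layout, the function name `locate_anomaly_sources`, and the threshold of 50 are assumptions made for this illustration.

```python
def locate_anomaly_sources(instances, threshold):
    """Group the ids of below-threshold instances by the knowledge graph named in their source tag."""
    by_source = {}
    for inst in instances:
        if inst["value"] < threshold:
            by_source.setdefault(inst["source"], []).append(inst["id"])
    return by_source


instances = [
    {"id": "n1", "source": "A", "value": 5},
    {"id": "n2", "source": "B", "value": 420},
    {"id": "n3", "source": "C", "value": 850},
    {"id": "n4", "source": "A", "value": 0},
]
anomalies = locate_anomaly_sources(instances, threshold=50)
print(anomalies)  # {'A': ['n1', 'n4']}
```

The keys of the returned mapping identify the data providers to which an exception prompt could be sent.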

In some embodiments, correcting the fused knowledge graph may include sending an exception prompt to the data provider of the source knowledge graph, and adjusting the ontology definition data of the fused knowledge graph and regenerating the fused knowledge graph so as to exclude instance data of the source knowledge graph from the fused knowledge graph. For example, when abnormal instance data occurs, an exception prompt is sent to the data provider to inform it that the instance data is abnormal, so that the data provider can troubleshoot and correct the data.

In some embodiments, if the data provider troubleshoots the abnormal instance data and finds no problem with the instance data itself, while the value contribution of the instance data in the fused knowledge graph remains low, this may indicate that the instance data is not applicable to the current business scenario. In that case, the ontology definition data of the fused knowledge graph may be adjusted, for example by deleting the target entity field and/or the target relation description corresponding to the instance data from the ontology definition data, and the fused knowledge graph may be regenerated based on the adjusted ontology definition data, so that the instance data of the knowledge graph corresponding to the abnormal instance data is excluded from the fused knowledge graph.
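
The adjust-and-regenerate correction might look like the sketch below. It assumes, purely for illustration, that the ontology definition data lists its target entity fields as (knowledge graph, entity field) pairs and that regeneration can be approximated by filtering source-tagged instance data; the function name, the dictionary layout, and the "shipment" field are hypothetical.

```python
def exclude_and_regenerate(ontology, instances, graph, entity_field):
    """Drop one (graph, entity field) pair from the ontology and rebuild the instance set."""
    kept = [f for f in ontology["target_entity_fields"] if f != (graph, entity_field)]
    adjusted = {**ontology, "target_entity_fields": kept}  # the input ontology is not mutated
    allowed = set(kept)
    fused = [i for i in instances if (i["source"], i["entity_field"]) in allowed]
    return adjusted, fused


ontology = {
    "target_entity_fields": [("A", "shipment"), ("B", "company v2"), ("C", "cro.company")],
}
instances = [
    {"id": "n1", "source": "A", "entity_field": "shipment"},
    {"id": "n2", "source": "B", "entity_field": "company v2"},
    {"id": "n3", "source": "C", "entity_field": "cro.company"},
]
adjusted, fused = exclude_and_regenerate(ontology, instances, "A", "shipment")
print([i["id"] for i in fused])  # ['n2', 'n3']
```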

In this embodiment, the source of abnormal instance data in the fused knowledge graph can be located quickly, so that data problems can be investigated efficiently and the fused knowledge graph can be adjusted, which improves the actual value that subsequent use of the fused knowledge graph may bring.

It should be noted that the above description of the respective flows is only for illustration and description, and does not limit the applicable scope of the present specification. Various modifications and changes to the flow may occur to those skilled in the art, given the benefit of this disclosure. However, such modifications and variations are intended to be within the scope of the present description. For example, the present specification may be directed to variations on the process steps, such as the addition of pre-processing steps and storage steps.

FIG. 4 is an exemplary block diagram of a knowledge-graph data fusion system, shown in accordance with some embodiments of the present description. As shown in FIG. 4, system 400 may include a first obtaining module 410, a second obtaining module 420, a third obtaining module 430, and a value contribution determining module 440.

The first obtaining module 410 may be used to obtain ontology definition data of the fused knowledge-graph.

The ontology definition data of the fused knowledge graph comprises a target entity field, a target relation description, and a fusion graph operator. The target entity field and the target relation description are selected from the ontology definition data of two or more knowledge graphs; the ontology definition data of a knowledge graph comprises entity fields used for defining entities and relation descriptions used for defining relations among the entities; and the fusion graph operator is used for performing fusion processing on the target entity field and/or the target relation description.

The second obtaining module 420 may be configured to obtain, based on the ontology definition data of the fused knowledge-graph, the target entity field and the instance data corresponding to the target relationship description from the two or more knowledge-graphs respectively.

The third obtaining module 430 may be configured to process the multiple instance data through the fusion graph operator to obtain a fused knowledge graph.

Wherein at least a portion of the instance data of the fused knowledge-graph has source tags; the source tags indicate the knowledge-graph from which the corresponding instance data came.

Value contribution determination module 440 may be configured to determine a value contribution for each of the knowledge-graphs based on the actual value produced by the fused knowledge-graph and the source tags for the instance data.

For a detailed description of the modules of the system shown above, reference may be made to the flow chart portion of this specification, e.g., the descriptions associated with FIG. 1 and FIG. 3.

It should be understood that the system and its modules shown in FIG. 4 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).

It should be noted that the above description of the knowledge-graph data fusion system and its modules is for convenience only and should not limit the present disclosure to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. For example, in some embodiments, the first obtaining module 410, the second obtaining module 420, the third obtaining module 430, and the value contribution determining module 440 may be different modules in a system, or may be a module that implements the functions of two or more of the above modules. For example, each module may share one memory module, and each module may have its own memory module. Such variations are within the scope of the present disclosure.

The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) a source tag is added for each piece of instance data in the fused entity, which makes it easy to track the source of each instance of the fused entity in its respective attributes and relationships; (2) based on the source information, the data contribution of each fusion source to the generation of the fused entity can be counted, so that the value of each data source is tracked and data fusion among the business parties is promoted; (3) for the instance data of a specific fused entity, the fusion sources of its attributes and relationships can be obtained directly, so that data problems can be investigated efficiently. It is to be noted that different embodiments may produce different advantages, and in different embodiments, the advantages that may be produced may be any one or combination of the above, or any other advantages that may be obtained.

Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, though not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.

Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.

Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.

The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.

Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).

Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single embodiment disclosed above.

Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range in some embodiments of the specification are approximations, in specific embodiments, such numerical values are set forth as precisely as possible within the practical range.

For each patent, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this specification, the entire contents are hereby incorporated by reference into this specification. Excepted are application history documents that are inconsistent with or conflict with the contents of this specification, as well as documents (currently or later attached to this specification) that limit the broadest scope of the claims of this specification. If the descriptions, definitions, and/or uses of terms in the materials accompanying this specification are inconsistent with or conflict with the contents of this specification, the descriptions, definitions, and/or uses of terms in this specification shall control.

Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments described herein. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims

1. A method of knowledge-graph data fusion, the method comprising:

acquiring ontology definition data of the fusion knowledge graph; the ontology definition data of the fusion knowledge graph comprises a target entity field, a target relation description and a fusion graph operator, wherein the target entity field and the target relation description are selected from ontology definition data of more than two knowledge graphs, the ontology definition data of the knowledge graphs comprise entity fields for defining entities and relation descriptions for defining relations among the entities, and the fusion graph operator is used for carrying out fusion processing on the target entity field and/or the target relation description;

based on the ontology definition data of the fusion knowledge graph, acquiring the instance data corresponding to the target entity field and the target relation description from the more than two knowledge graphs respectively;

processing the multiple instance data through the fusion graph operator to obtain a fusion knowledge graph; wherein at least a portion of the instance data of the fused knowledge-graph has source tags; the source tags indicate a knowledge-graph from which the corresponding instance data came;

determining a value contribution for each of the knowledge-graphs based on actual values generated by the fused knowledge-graph and source tags for instance data.

2. The method of claim 1, the fusion graph operator being configured to identify a source tag for an instance data and/or attribute value thereof when processing the plurality of instance data.

3. The method of claim 2, wherein the entity field corresponds to one or more attribute fields, and the fused graph operator is configured to fuse two or more target entity fields to obtain a fused entity field; the attribute field corresponding to the fusion entity field is from at least one corresponding attribute field in the more than two target entity fields, and the relation description related to the fusion entity field comprises at least one related target relation description in the more than two target entity fields.

4. The method of claim 1, further comprising:

determining abnormal instance data in the fused knowledge graph;

based on the source marker, locating the source of the abnormal instance data so as to modify the fused knowledge-graph.

5. The method of claim 4, wherein modifying the fused knowledge-graph comprises sending an exception prompt to a data provider of a source knowledge-graph, and adjusting ontology-defining data of the fused knowledge-graph and reacquiring the fused knowledge-graph to exclude instance data of the source knowledge-graph from the fused knowledge-graph.

6. A knowledge-graph data fusion system, the system comprising:

the first acquisition module is used for acquiring ontology definition data of the fusion knowledge graph; the ontology definition data of the fusion knowledge graph comprises a target entity field, a target relation description and a fusion graph operator, wherein the target entity field and the target relation description are selected from ontology definition data of more than two knowledge graphs, the ontology definition data of the knowledge graphs comprise entity fields for defining entities and relation descriptions for defining relations among the entities, and the fusion graph operator is used for carrying out fusion processing on the target entity field and/or the target relation description;

a second obtaining module, configured to obtain, based on ontology definition data of the fusion knowledge graph, the target entity field and instance data corresponding to the target relationship description from the two or more knowledge graphs respectively;

the third acquisition module is used for processing the multiple instance data through the fusion graph operator to acquire a fusion knowledge graph; wherein at least a portion of the instance data of the fused knowledge-graph has source tags; the source tags indicate a knowledge-graph from which the corresponding instance data came;

and the value contribution determining module is used for determining the value contribution of each knowledge graph based on the actual value generated by the fusion knowledge graph and the source tags of the instance data.

7. The system of claim 6, the fused graph operator to identify its source tags for instance data and/or attribute values thereof when processing the plurality of instance data.

8. The system of claim 6, further comprising an exception handling module to:

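determining abnormal instance data in the fused knowledge-graph;

based on the source tags, locating the source of the abnormal instance data so as to modify the fused knowledge-graph.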

9. The system of claim 8, to modify the fused knowledge-graph, the exception handling module is further configured to send an exception prompt to a data provider of a source knowledge-graph, and to adjust the ontology-defining data of the fused knowledge-graph and re-acquire the fused knowledge-graph to exclude instance data of the source knowledge-graph from the fused knowledge-graph.

10. A knowledge-graph data fusion apparatus comprising at least one storage medium and at least one processor, the at least one storage medium storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of knowledge-graph data fusion of any of claims 1-5.