CN114547323A - Fine-grained knowledge graph fusion method for two-dimensional overlapped large sample data source

Info

Publication number: CN114547323A
Application number: CN202111646665.6A
Authority: CN
Inventors: 季白杨
Original assignee: Hangzhou Biwan Information Technology Co ltd
Current assignee: Hangzhou Biwan Information Technology Co ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-05-27

Abstract

The invention discloses a fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source, which comprises the following steps: s1, performing iterative entity alignment on attribute triples corresponding to a knowledge graph to obtain an entity pair set of the attribute triples, and performing multi-level classification on the entity pairs to obtain high-confidence entity pairs; taking the obtained high-confidence entity pair as a training data set of an embedded model, performing structured embedding of the embedded model by using a relation triple to obtain high-dimensional space vector representation of the entities and the relations, and setting weights for the attributes and the relations to obtain final similarity of the attributes and the relations; s2, screening entity attributes according to the obtained similarity to obtain the final similarity of the attributes; s3, automatically completing the knowledge characteristic fusion of high-confidence entity pairs and attributes based on a classifier model and an atomic expression algorithm obtained by machine learning training; and S4, completing the bidirectional supervision interactive data fusion based on knowledge characteristic fusion.

Description

Fine-grained knowledge graph fusion method for two-dimensional overlapped large sample data source

Technical Field

The invention relates to the technical field of big data processing, in particular to a fine-grained knowledge graph fusion method for a two-dimensional overlapped big sample data source.

Background

A knowledge graph (KG) contains a large number of facts, which in practical applications are usually expressed as triples <H, R, T>, where H and T represent the head and tail entities and R represents the relation between the two entities. For two knowledge graphs KG1 and KG2, the pre-fusion entity pair set is defined as Entity = {A_1, A_2, A_3, ..., A_n}, where each A_i (i = 1, 2, 3, ..., n) is a quadruple <ID, E1, E2, S>: ID is a unique identifier, E1 ∈ KG1, E2 ∈ KG2, and S represents the similarity between the two entities, with S in [0, 1]. In different knowledge graphs, equivalent entities carry different identifying names because of different constructors or language differences, yet semantic similarity exists between them.

In the prior art, knowledge fusion mostly adopts the following methods: 1) using the text information of entities and relations, mostly through simple string matching, which is easy to operate; 2) matching data by node similarity based on the data structure; 3) matching indirectly through a third-party data set; 4) processing data features with machine learning algorithms, including learning a fusion expression, training a classification model and the like. Alternatively, several algorithms are combined through an aggregation function to achieve a better result. However, these methods have the following defects:

1) the feature that the pre-fused knowledge graph has data overlapping in both the entity dimension and the attribute dimension is ignored.

2) Existing schemes treat the fusion and matching of a large number of entity pairs as the goal, rather than the final purpose of improving the quality of the knowledge graph. That is, they ignore attribute fusion of the entities as a key factor, so the resulting knowledge graph is large but not accurate. From the viewpoint of improving knowledge graph quality, the fusion of the ontology and the fusion of the attributes are two indispensable dimensions, and making them promote each other is both the difficulty and the key point of knowledge fusion.

3) Large-scale data processing is ideally suited to computers, but because of technical and other limitations, manual work is still needed, for example to segment data blocks, to label data, or to assist lightweight algorithms. This wastes a great deal of manpower and money.

Disclosure of Invention

Aiming at the problems that the quality of a knowledge graph obtained by the existing knowledge fusion method is low, a large amount of resources are consumed in the fusion process and the like, the invention combines linguistic information, spatial information and a machine learning algorithm, aims to solve one or more difficulties in the existing knowledge graph fusion to a considerable extent, and provides a fine-grained knowledge graph fusion method of a two-dimensional overlapped large-sample data source.

In order to achieve the purpose, the invention adopts the following technical scheme:

a fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source comprises the following steps:

s1, performing iterative entity alignment on attribute triples corresponding to a knowledge graph to obtain an entity pair set of the attribute triples, and performing multi-level classification on the entity pairs to obtain high-confidence entity pairs; taking the obtained high-confidence entity pair as a training data set of an embedded model, performing structured embedding of the embedded model by using a relation triple to obtain high-dimensional space vector representation of the entities and the relations, and setting weights for the attributes and the relations to obtain final similarity of the attributes and the relations;

s2, screening entity attributes according to the obtained similarity to obtain the final similarity of the attributes;

s3, automatically completing the knowledge characteristic fusion of high-confidence entity pairs and attributes based on a classifier model and an atomic expression algorithm obtained by machine learning training;

and S4, completing the bidirectional supervision interactive data fusion based on knowledge characteristic fusion.

Further, the step S1 specifically includes:

s11, entity alignment is carried out on the attribute triples on the basis of an iterative model, entity matching operation is carried out on the basis of attribute values corresponding to the attributes and the attributes to obtain an entity pair set, attribute similarity matching operation is carried out on the entity pair set to obtain an attribute pair set, and entity pairs with high confidence degrees are obtained;

s12, performing structured embedding on the obtained high-confidence entity pair serving as a training data set of an embedding model by using a relation triple, describing and modeling a global structure of a knowledge graph to be fused, and finally obtaining high-dimensional space vector representation of the entity and the relation;

and S13, fusing the attribute alignment and the relationship alignment based on different weights to obtain alignment results of two dimensions of the relationship and the attribute, and obtaining the total similarity of the attribute and the relationship by adopting a linear combination mode.

Further, the step S2 specifically includes:

s21, calculating the similarity between the attributes;

s22, calculating the similarity between adjacent entities;

s23, calculating the similarity of the attribute label set;

and S24, screening upper-layer concept paths of entity attributes in the knowledge graph to form path vectors, and calculating the final similarity of the attributes.

Further, the step S3 specifically includes:

s31, obtaining a classifier model by utilizing machine learning training, and processing entity fusion by utilizing a two-classification method;

s32, screening attributes by using an atomic expression;

and S33, combining and using the atomic expressions to complete the knowledge characteristic fusion of the high-confidence entity pairs and the attributes.

Further, the step S4 specifically includes:

S41, embedding the vectors of the triples based on the TransE algorithm and the PTransE algorithm to complete the training of a single knowledge graph;

and S42, remapping the high-dimensional space vectors of the processed entities and relations into a low-dimensional space, and constraining the entity vectors and the relation vectors respectively during the mapping, to complete the bidirectional supervised interactive data fusion.

Further, the step S11 specifically includes:

S111, setting a uniform weight for the common attributes during attribute alignment, and calculating the similarity between entities, expressed as:

[formula not reproduced in the source text]

wherein Sim_A(e₁, e₂) represents the similarity between entity e₁ and entity e₂; the two attribute symbols represent the k-th attribute common to both entities, taken from entity e₁ and from entity e₂ respectively; n represents the total number of attributes common to the two entities; and Sim_v represents the similarity between the two corresponding attribute values, expressed as:

[formula not reproduced in the source text]

wherein levenshteinSim represents the similarity calculated from the Levenshtein distance, and lcsSim represents the similarity calculated from the longest common substring of the two strings;

S112, searching for potentially aligned attribute pairs according to the aligned entity pairs, with the similarity of an attribute pair expressed as:

[formula not reproduced in the source text]

wherein the terms of the formula represent, respectively, the similarity of the attribute pair, the number of elements in the finite entity set, and the similarity between attribute values.
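The formulas of step S111 are not reproduced in this text, so the following is only a minimal sketch of the idea. It assumes that attribute names have already been matched (same dictionary key), that Sim_v is the equal-weight mean of levenshteinSim and lcsSim, and that both string similarities are normalized by the longer string. These three choices and all function names are illustrative assumptions, not the formulas of the invention.

```python
from typing import Dict


def levenshtein_sim(a: str, b: str) -> float:
    """1 - edit_distance / max_len, so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def lcs_sim(a: str, b: str) -> float:
    """Longest common substring length divided by the longer string length."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, cur[j])
        prev = cur
    return best / max(len(a), len(b))


def value_sim(v1: str, v2: str) -> float:
    # Assumption: equal-weight mean of the two string similarities.
    return 0.5 * (levenshtein_sim(v1, v2) + lcs_sim(v1, v2))


def entity_sim(e1: Dict[str, str], e2: Dict[str, str]) -> float:
    """Uniform-weight mean of value similarities over the n common attributes."""
    common = sorted(set(e1) & set(e2))
    if not common:
        return 0.0
    return sum(value_sim(e1[k], e2[k]) for k in common) / len(common)
```

Because every common attribute carries the same weight 1/n, a single well-matching attribute cannot dominate the score when several attributes are shared.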

Further, the total similarity of the attributes and the relationships obtained in step S13 is represented as:

Sim(E_i, E_j) = λ × simR(e_i, e_j) + (1 − λ) × simA(e_i, e_j)

wherein simR represents the similarity obtained from the relation triples; simA represents the similarity obtained from the attribute triples; λ represents a weight; and Sim(E_i, E_j) represents the total similarity.

Further, the similarity between the attributes is calculated in step S21, and is expressed as:

Sim_property = COS(Name_property1, Name_property2)

wherein Sim_property represents the similarity of the two attributes property1 and property2 at the attribute-name level; Name_property1 and Name_property2 represent the high-dimensional space vectors of the two attribute names.

Further, in step S22, the similarity between adjacent entities is calculated as:

Sim_entity = |entityList₁ ∩ entityList₂| / |entityList₁ ∪ entityList₂|

wherein Sim_entity represents the similarity of the adjacent entities of the two attributes; entityList₁ and entityList₂ represent the finite sets of entities adjacent to property1 and property2, respectively.

Further, in step S23, the similarity of the attribute label set is calculated as:

Sim_label = COS(label_property1, label_property2)

wherein Sim_label represents the similarity of the finite label sets of the attributes;

in step S24, the final similarity of the attributes is calculated and expressed as:

Sim_con = COS(concept_property1, concept_property2)

wherein Sim_con represents the upper-level concept similarity of property1 and property2; concept_property1 and concept_property2 represent the upper-level concept path labels of the two attributes, respectively.

Compared with the prior art, the invention has the beneficial effects that:

1) The method aims to improve the quality of the knowledge graph. It meets the need for fine-grained fusion by fusing financial-domain knowledge graphs at the attribute level: on top of the large number of entity pairs matched during entity fusion, it focuses on attribute fusion as the key factor, uses the close relation between entities and attributes to promote the fusion effect, and finally improves the quality and precision of the fused knowledge graph.

2) Manual intervention in the fusion process is reduced, a machine learning algorithm only needing a positive sample fusion expression is realized, a good fusion effect is achieved, and resource consumption is reduced.

Drawings

Fig. 1 is a flowchart of a fine-grained knowledge graph fusion method for a two-dimensional overlapped large sample data source according to an embodiment.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

The invention aims to provide a fine-grained knowledge graph fusion method of a two-dimensional overlapped large-sample data source, aiming at overcoming the defects of the prior art, reducing the workload and emphasizing on the utilization of the attributes of the entities in the knowledge graph, and ensuring the precision and quality of the fused knowledge graph by using the promotion relationship between the attributes and the entities.

Example one

The embodiment provides a fine-grained knowledge graph fusion method for a two-dimensional overlapped large sample data source, as shown in fig. 1, including:

Industry knowledge graphs are oriented to a specific vertical domain; they have stricter predefined data schemas, demand higher data accuracy, and emphasize "depth". Financial data is typical big data with the "4V" characteristics: Volume (large scale), Variety (multiple structures and dimensions), Value (high value) and Velocity (timeliness requirements). Further, the financial field is one of the industries most representative of big data. It covers a very wide range of categories; the major ones include banking, investment and insurance, and at a finer granularity it can be divided into currency, bonds, funds, trusts and other asset management products, factor markets, credit and loans, and so on. Applications of the knowledge graph in the financial field mainly include risk control, credit investigation, auditing, anti-fraud, data analysis and automated reporting. Knowledge graphs are constructed repeatedly by different institutions, organizations and individuals for different fields and different requirements. The data in these graphs falls into three categories: 1) Structured data: represented by e-government form data, which usually aggregates different information, such as name, occupation and income, around the ID of a person or an organization as an anchor point; it can later evolve into organizational forms such as basic databases, subject databases and thematic databases. 2) Unstructured data: represented by video, images, speech and text, most of which must be analyzed and converted into structured data before use. 3) Spatio-temporal data: represented by geographic information, IoT data and trajectory data.

This embodiment addresses the practical problem of how to organize, reconcile and automatically fuse heterogeneous knowledge graphs in the financial field, so as to improve the coverage and quality of knowledge, to overcome the low data quality or missing data of a single knowledge graph, and to achieve a better application effect. It organically combines the entity alignment algorithm and the attribute alignment algorithm of knowledge graph fusion. Finally, based on a positive and negative sample fusion expression learning algorithm, a fine-grained knowledge graph fusion method for heterogeneous two-dimensional overlapped large sample data sources in the financial field is designed and implemented.

In the step S1, performing iterative entity alignment on attribute triples corresponding to the knowledge graph to obtain an entity pair set of the attribute triples, and performing multi-level classification on the entity pairs to obtain high-confidence entity pairs; and taking the obtained high-confidence entity pair as a training data set of the embedded model, performing structured embedding of the embedded model by using the relation triple to obtain high-dimensional space vector representation of the entity and the relation, and setting weights for the attribute and the relation to obtain the final similarity of the attribute and the relation.

Iterative entity alignment is performed on the attribute triples based on a probability model. The entities are graded by similarity at multiple levels to form a hierarchical tree structure, and different thresholds are set on the levels of this tree to obtain high-confidence entity pairs. Because these entity pairs are of relatively high quality, they are used to train the embedding model, which yields high-dimensional vector representations of the low-dimensional data. A logistic regression model is then trained together with the high-dimensional vectors to form a unified similarity mapping between the high-dimensional and low-dimensional spaces, and the final similarity is obtained by setting weights. The method specifically comprises the following steps:

s11, based on attribute triple alignment, carrying out entity alignment on attribute triples based on an iterative model, carrying out entity matching operation based on attributes and attribute values corresponding to the attributes to obtain an entity pair set, carrying out attribute similarity matching operation on the entity pair set to obtain an attribute pair set, and repeatedly executing two steps until a new entity and attribute pair set cannot be generated to obtain an entity pair with high confidence level;

s111, setting uniform weight for the public attributes during attribute alignment, and calculating the similarity between entities;

Because attribute coverage is low and attribute names and attribute values are expressed in diverse ways, the attributes recorded for the same entity differ. Any common attribute is therefore particularly important during attribute alignment, so the common attributes of the two entities are given a uniform weight and the similarity of the two entities is calculated as:

[formula not reproduced in the source text]

wherein the compared terms are the k-th attributes common to entity e₁ and entity e₂, and the similarity between their attribute values is expressed as:

[formula not reproduced in the source text]

and S112, searching potential aligned attribute pairs according to the aligned entity pairs.

After step S111, potentially aligned attribute pairs can be found from the aligned entity pairs. Specifically, the set of aligned entity pairs is obtained, the subset of entity pairs containing a candidate attribute pair is found within it, and the similarity of the attribute names is measured over those entities. The calculation formula for an attribute pair within the entity pairs is:

[formula not reproduced in the source text]

wherein the terms of the formula represent, respectively, the similarity of the attribute pair, the number of elements in the finite entity set, and the similarity between attribute values.

Based on the mathematical model, the interactive alignment of the attributes and the entities is carried out according to the following algorithm iteration, wherein the algorithm is as follows:
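The algorithm listing itself is not reproduced in this text. As a rough illustration of the loop described in S11 (match entities on aligned attributes, then match attributes on aligned entities, and repeat until neither set grows), here is a minimal sketch. The thresholds, the `align` function, the use of `difflib.SequenceMatcher` as the value similarity, and the averaging over aligned entity pairs are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, Set, Tuple

Entity = Dict[str, str]   # attribute name -> attribute value
KG = Dict[str, Entity]    # entity id -> entity
Pair = Tuple[str, str]


def value_sim(v1: str, v2: str) -> float:
    # Stand-in string similarity (assumption, not the patent's formula).
    return SequenceMatcher(None, v1, v2).ratio()


def align(kg1: KG, kg2: KG, seed_attrs: Set[Pair],
          ent_thr: float = 0.8, attr_thr: float = 0.8):
    attr_pairs: Set[Pair] = set(seed_attrs)
    ent_pairs: Set[Pair] = set()
    while True:
        # Step 1: match entities on the currently aligned attribute pairs.
        new_ents: Set[Pair] = set()
        for (i1, e1), (i2, e2) in product(kg1.items(), kg2.items()):
            shared = [(a, b) for a, b in attr_pairs if a in e1 and b in e2]
            if shared:
                score = sum(value_sim(e1[a], e2[b]) for a, b in shared) / len(shared)
                if score >= ent_thr:
                    new_ents.add((i1, i2))
        # Step 2: match attributes on the currently aligned entity pairs,
        # averaging value similarity over the number of aligned entity pairs.
        aligned = ent_pairs | new_ents
        sums: Dict[Pair, float] = {}
        for i1, i2 in aligned:
            for a, b in product(kg1[i1], kg2[i2]):
                sums[(a, b)] = sums.get((a, b), 0.0) + value_sim(kg1[i1][a], kg2[i2][b])
        new_attrs = {p for p, total in sums.items() if total / len(aligned) >= attr_thr}
        # Stop when neither the entity pair set nor the attribute pair set grows.
        if new_ents <= ent_pairs and new_attrs <= attr_pairs:
            return ent_pairs, attr_pairs
        ent_pairs |= new_ents
        attr_pairs |= new_attrs
```

Both sets only ever grow and are bounded by the Cartesian products of the inputs, so the loop terminates.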

s12, based on a relational embedding alignment method, performing structured embedding on the obtained high-confidence entity pair serving as a training data set of an embedded model by using a relational triple, describing and modeling a global structure of a knowledge graph to be fused, and finally obtaining high-dimensional space vector representation of the entity and the relation;

The structure embedding model optimizes a margin-based loss function so that positive samples score lower than negative samples. The formula is as follows:

O_SE = Σ_{tr ∈ Tr} Σ_{tr' ∈ Tr'} ( f(tr) − α × f(tr') )

wherein f(tr) = ||h + r − t|| represents the score function; Tr and Tr' represent the finite sets of positive and negative sample triples; and α, which lies in [0, 1], is a hyperparameter for weighting the positive and negative samples. The embedding process yields high-dimensional entity vectors for the two knowledge graphs, from which the similarity is obtained by cosine distance.
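To make the objective concrete, the following minimal sketch evaluates O_SE for small hand-written vectors. It assumes an L2 norm for the score function and represents each triple as a tuple of three vectors; the names `score` and `structure_objective` and the default value of `alpha` are illustrative only.

```python
import math
from typing import Iterable, Sequence, Tuple

Vec = Sequence[float]
Triple = Tuple[Vec, Vec, Vec]  # (head, relation, tail) embedding vectors


def score(h: Vec, r: Vec, t: Vec) -> float:
    """f(tr) = ||h + r - t||, using the L2 norm (an assumption)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def structure_objective(pos: Iterable[Triple], neg: Iterable[Triple],
                        alpha: float = 0.1) -> float:
    """Sum of f(tr) - alpha * f(tr') over all positive/negative triple pairs."""
    neg = list(neg)
    total = 0.0
    for h, r, t in pos:
        for h2, r2, t2 in neg:
            total += score(h, r, t) - alpha * score(h2, r2, t2)
    return total
```

Minimizing this quantity pulls h + r toward t for positive triples, while the α-weighted term rewards large distances for negative triples.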

S13, fusing the attribute alignment and the relationship alignment based on different weights to obtain the alignment result of two dimensions of the relationship and the attribute, and obtaining the total similarity of the attribute and the relationship by adopting a linear combination mode, wherein the similarity is expressed as follows:

Sim(E_i, E_j) = λ × simR(e_i, e_j) + (1 − λ) × simA(e_i, e_j)

wherein Sim(E_i, E_j) represents the total similarity; simR represents the similarity obtained from the relation triples; simA represents the similarity obtained from the attribute triples; and λ represents a weight learned by a regression model. More specifically, the importance of attributes and relations differs between data sets, and the quality of relations and attributes differs between knowledge graphs: if the relations in a knowledge graph are of high quality, relation-based alignment has higher confidence, whereas in a sparse knowledge graph, attribute-based alignment has higher confidence.
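As a small worked illustration of the linear combination, the sketch below computes the total similarity and fits λ from labelled samples. The text only says that λ is learned by a regression model, so the closed-form least-squares fit used here is an assumption, as are the function names.

```python
from typing import Iterable, Tuple


def total_similarity(sim_r: float, sim_a: float, lam: float) -> float:
    """Sim = lam * simR + (1 - lam) * simA, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * sim_r + (1.0 - lam) * sim_a


def fit_lambda(samples: Iterable[Tuple[float, float, float]]) -> float:
    """Least-squares lam for (simR, simA, target) samples, clipped to [0, 1].

    Assumption: ordinary least squares stands in for the regression model,
    which the text does not specify.
    """
    num = den = 0.0
    for sim_r, sim_a, target in samples:
        diff = sim_r - sim_a
        num += diff * (target - sim_a)
        den += diff * diff
    if den == 0.0:
        return 0.5  # simR and simA never differ, so any lam fits equally well
    return min(1.0, max(0.0, num / den))
```

A larger fitted λ indicates that the relation structure of the graphs is the more reliable signal; a smaller one favors the attribute-based score, as in sparse graphs.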

In step S2, the entity attributes are filtered according to the obtained similarity, and the final similarity of the attributes is obtained.

Attributes with a common meaning in the knowledge base are screened according to the similarity of the entities: a threshold standard is determined, and an attribute function is designed to screen the entity attributes automatically, using part of the information attached to the entity attributes, including the upper-level and lower-level concepts, labels and attribute values of the entity. The similarity obtained from this information is combined with the attribute-name similarity to obtain the final similarity of the attributes, and a pruning operation reduces the redundancy of entity attributes in the fused knowledge graph. Finally, the two graphs are processed interactively and promote each other; the input is the two graphs, and the output is the fused graph, with only the aligned entity pairs output. The method specifically comprises the following steps:

s21, calculating the similarity between the attributes;

The similarity of attribute names and the semantic information of attributes are particularly important. This embodiment expects the similarity calculation to reach the specific information carried at the semantic level of different attributes, rather than simple character-level matching. The similarity is therefore calculated with the open-source AILab Chinese word vector library according to the following formula:

Sim_property = COS(Name_property1, Name_property2)

wherein Sim_property represents the similarity of the two attributes property1 and property2 at the attribute-name level; Name_property1 and Name_property2 represent the high-dimensional space vectors of the two attribute names, and the result is the cosine of the two attribute-name vectors.
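The attribute-name similarity can be illustrated with a toy example. The sketch below uses a tiny hand-written table of word vectors in place of a real word vector library, and averages word vectors to embed a multi-word attribute name; the table values, the averaging step and the function names are all assumptions for demonstration.

```python
import math
from typing import Dict, List, Sequence

# Toy word vectors (hypothetical values standing in for a real library).
WORD_VECTORS: Dict[str, List[float]] = {
    "registered": [0.9, 0.1, 0.0],
    "capital": [0.2, 0.9, 0.1],
    "funds": [0.3, 0.8, 0.2],
    "address": [0.0, 0.1, 0.9],
}


def name_vector(name: str) -> List[float]:
    """Average the vectors of the known words in an attribute name."""
    vecs = [WORD_VECTORS[w] for w in name.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def property_name_similarity(name1: str, name2: str) -> float:
    """Sim_property = COS(Name_property1, Name_property2)."""
    return cosine(name_vector(name1), name_vector(name2))
```

With these toy vectors, "registered capital" scores much closer to "registered funds" than to "address", even though a character-level comparison of "capital" and "funds" would find little overlap.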

S22, calculating the similarity between adjacent entities;

Besides the similarity of the entities themselves, this embodiment notes that the adjacency relations of entities can also improve the quality of knowledge fusion. It is assumed here that if the similarity of the entities adjacent to two attributes reaches a certain threshold, the attribute pair can be considered similar. For the entities adjacent to property1 and property2 above, the formula is as follows:

Sim_entity = |entityList₁ ∩ entityList₂| / |entityList₁ ∪ entityList₂|

wherein Sim_entity represents the similarity of the adjacent entities of the two attributes; entityList₁ and entityList₂ represent the finite sets of entities adjacent to property1 and property2, respectively.
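This ratio of intersection size to union size is the Jaccard index of the two neighbor sets. A minimal sketch follows, with the function name and the convention of returning 0.0 for two empty sets chosen for the example:

```python
from typing import Iterable


def neighbor_similarity(entity_list_1: Iterable[str], entity_list_2: Iterable[str]) -> float:
    """|entityList1 ∩ entityList2| / |entityList1 ∪ entityList2|."""
    s1, s2 = set(entity_list_1), set(entity_list_2)
    union = s1 | s2
    # Two attributes without any adjacent entities are treated as dissimilar.
    return len(s1 & s2) / len(union) if union else 0.0
```

For example, neighbor sets {A, B, C} and {B, C, D} share two of four distinct entities, which gives a similarity of 0.5.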

S23, calculating the similarity of the attribute label set;

Regarding the similarity of related entity labels: in search engines and encyclopedias such as Wikipedia, Baidu and Saigao, entry labels summarize the features of an entity. For example, a search for "Ren Zhengfei" returns labels such as "president" and "CEO". Such labels tend to be quite representative.

On this basis, from the perspective of improving the quality of the knowledge graph, the label vector of property1, label_property1 = (x₁, x₂, ..., x_n), and the label vector of property2, label_property2 = (y₁, y₂, ..., y_n), are constructed, and the similarity of the finite attribute label sets is calculated as follows:

Sim_label = COS(label_property1, label_property2)

wherein Sim_label represents the similarity of the finite label sets of the attributes;
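The text does not say how the label vectors are built. One simple possibility, shown here purely as an assumption, is to encode each label set as a 0/1 vector over the joint label vocabulary and then take the cosine; the function names are illustrative.

```python
import math
from typing import Iterable, List, Tuple


def label_vectors(labels1: Iterable[str], labels2: Iterable[str]) -> Tuple[List[float], List[float]]:
    """Encode two label sets as 0/1 vectors over their joint vocabulary."""
    s1, s2 = set(labels1), set(labels2)
    vocab = sorted(s1 | s2)
    return [float(w in s1) for w in vocab], [float(w in s2) for w in vocab]


def label_similarity(labels1: Iterable[str], labels2: Iterable[str]) -> float:
    """Sim_label = COS(label_property1, label_property2)."""
    x, y = label_vectors(labels1, labels2)
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

The same encoding could be applied to upper-level concept paths by treating each concept on the path as one vocabulary entry.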

S24, calculating the similarity of the upper-level concepts of the associated entities. A hierarchical concept tree exists in the knowledge graph. Taking "person" as the most general root node, the tree branches into "political field", "economic field", "entertainment industry" and so on; the economic field can be further divided into "real estate", "automobile industry" and so on, until the concept hierarchy tree is formed. On this basis, the upper-level concept paths of the entity attributes in the two knowledge graphs are extracted to form path vectors, and the similarity is calculated as follows:

Sim_con = COS(concept_property1, concept_property2)

In step S3, knowledge feature fusion of high-confidence entity pairs and attributes is automatically completed based on the classifier model and the atomic expression algorithm obtained by machine learning training.

Because negative samples are usually not recorded during knowledge fusion, data features are extracted automatically with a machine learning algorithm, so that knowledge can be fused well while manual intervention is reduced. The method specifically comprises the following steps:

the classification function is formulated as follows:

and according to the high-quality entity pairs and the attribute set obtained in the step S1 and the step S2, determining attributes which meet the standard and can be subjected to similarity calculation, utilizing an atomic expression to enable the F-measure of each pair of attributes to reach the maximum, and completing the creation of an expression tree based on AND operation.

S32, screening attributes by using an atomic expression;

An atomic expression presupposes that a suitable metric function has been determined, so that a suitable threshold can be configured for the functions taking part in the similarity calculation. After the operations of steps S1 and S2, the attribute pairs still need to be screened to further improve the precision and quality of the fused knowledge graph. The first principle of attribute screening is how generally representative an attribute is, and the screening formula is as follows:

cover(p) = |{s : s is a subject entity containing attribute p}| / |{s : s is a subject entity}|

wherein cover(p) represents the generality of attribute p; the numerator is the number of subject entities containing attribute p, and the denominator is the number of all subject entities. KG1 and KG2 are screened in this way to obtain the finite attribute sets P1 and P2, respectively. Then, optionally, according to a custom function M, a Cartesian product operation is performed on P1 and P2 to find the function M_{p1,p2} and the corresponding threshold θ that maximize the F-measure of the attribute pair (p1, p2), which yields the atomic expression set, as follows:

Considering the defect that atomic expressions only utilize local information of attributes, atomic expressions are combined for use, and the formula is expressed as:

wherein φ(E) represents an operator symbol; ∪, ∩ and \ denote the OR, AND and DIFF operators, respectively.
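The screening and combination steps of S32 and S33 can be sketched as follows. The formulas for the atomic expression set and its combination are not reproduced in this text, so this example makes several assumptions: an atomic expression is represented by the set of entity pairs it accepts, the threshold θ is searched over the observed scores, the F-measure is the balanced F1 against known positive pairs, and all function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def coverage(prop: str, entities: Dict[str, Dict[str, str]]) -> float:
    """cover(p): share of subject entities that carry attribute p."""
    if not entities:
        return 0.0
    return sum(prop in attrs for attrs in entities.values()) / len(entities)


def f_measure(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Balanced F-measure of predicted pairs against known positive pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: Dict[Pair, float], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (theta, F) for the cut-off theta that maximizes the F-measure."""
    best = (0.0, 0.0)
    for theta in sorted(set(scores.values())):
        predicted = {p for p, s in scores.items() if s >= theta}
        f = f_measure(predicted, gold)
        if f > best[1]:
            best = (theta, f)
    return best


def combine(op: str, left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """Combine two atomic expressions, each given as its set of accepted pairs."""
    ops = {"OR": left | right, "AND": left & right, "DIFF": left - right}
    return ops[op]
```

Representing each atomic expression by its accepted pairs keeps the combination step to plain set algebra, so an AND-based expression tree reduces to repeated intersections.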

In step S4, bi-directional supervised interactive data fusion is completed based on knowledge feature fusion.

A bidirectional supervised interactive data fusion algorithm is implemented. Based on the assumption that the knowledge graphs to be fused fit each other to a considerable degree, interactive supervised training is carried out on them, and the quality and quantity of the knowledge graphs are enhanced during cyclic fusion. The algorithm emphasizes the structural information among entities, reduces the weight of linguistic similarity during fusion, converts low-dimensional character-level similarity into similarity between high-dimensional space vectors, and realizes cross-domain structural fusion. Specifically:

Vector embedding of RDF triples is realized based on the TransE and PTransE models, and the loss function of TransE is defined as follows:

L(h, r, t) = [γ + E(h, r, t) − E(h', r', t')]₊

wherein L(h, r, t) represents the loss function; γ represents the margin (interval value); E(h, r, t) represents the vector embedding term of the triple (h, r, t); (h', r', t') is an erroneous triple; and [x]₊ denotes max(0, x).
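A minimal numeric sketch of this hinge loss for one positive and one erroneous triple is given below. It assumes that E is the L2 distance ||h + r − t||, in line with the score function of step S12; the function names and the default margin are illustrative.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]
Triple = Tuple[Vec, Vec, Vec]  # (head, relation, tail) embedding vectors


def energy(h: Vec, r: Vec, t: Vec) -> float:
    """E(h, r, t), taken here as ||h + r - t|| (an assumption)."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def transe_loss(pos: Triple, neg: Triple, gamma: float = 1.0) -> float:
    """[gamma + E(pos) - E(neg)]_+ : zero once neg is worse than pos by gamma."""
    return max(0.0, gamma + energy(*pos) - energy(*neg))
```

The loss vanishes once the erroneous triple is at least γ farther from satisfying h + r = t than the correct triple, so only violating pairs contribute gradients during training.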

The TransE algorithm flow is as follows:

Unlike other models, which are complex and hard to tailor because too many parameters are set for the training triples, TransE is suitable for knowledge graphs with a large data set and simple data content. Because the processing here necessarily involves large-scale data with complex content, the PTransE model is adopted, with the following algorithm flow:

Step S41 mainly completes the training of a single knowledge graph. The essence of fusion is to remap the high-dimensional space vectors of the processed entities and relations into a low-dimensional space, so this part completes the bidirectional supervised training of the two knowledge graphs based on the pre-fused entity pair information and constrains the respective vectors in the process. The pseudocode of the algorithm is as follows:

S43, knowledge representation learning and supervised learning form a reciprocating iterative process. For an entity pair E1 and E2 from the two networks, a threshold θ is determined; if E(E1, E2) < θ, the two are considered similar entities, and (E1, E2) is called a standard entity pair. Through bidirectional supervision, the standard entity pairs help find more entity pairs during the iteration, which proceeds as follows:
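The iteration listing is not reproduced in this text. The sketch below only illustrates the threshold rule E(E1, E2) < θ inside a loop that alternates a supervision step with pair discovery. The supervision step is a toy stand-in (each standard pair is moved to its midpoint), E is taken as the Euclidean distance between entity vectors, and a mutual-nearest-neighbor check plays the role of the bidirectional constraint; all of these are assumptions, as are the function names.

```python
import math
from typing import Dict, List, Set, Tuple

Vec = List[float]
Pair = Tuple[str, str]


def dist(u: Vec, v: Vec) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pull_together(emb1: Dict[str, Vec], emb2: Dict[str, Vec], pairs: Set[Pair]) -> None:
    """Toy supervision step: move each standard pair to its midpoint."""
    for e1, e2 in pairs:
        mid = [(a + b) / 2 for a, b in zip(emb1[e1], emb2[e2])]
        emb1[e1], emb2[e2] = list(mid), list(mid)


def discover_pairs(emb1: Dict[str, Vec], emb2: Dict[str, Vec], seeds: Set[Pair],
                   theta: float, max_rounds: int = 10) -> Set[Pair]:
    pairs = set(seeds)
    for _ in range(max_rounds):
        pull_together(emb1, emb2, pairs)
        free1 = emb1.keys() - {a for a, _ in pairs}
        free2 = emb2.keys() - {b for _, b in pairs}
        if not free1 or not free2:
            break
        new: Set[Pair] = set()
        for e1 in free1:
            e2 = min(free2, key=lambda c: dist(emb1[e1], emb2[c]))
            # Bidirectional check: e1 must also be the nearest free entity to e2.
            back = min(free1, key=lambda c: dist(emb1[c], emb2[e2]))
            if back == e1 and dist(emb1[e1], emb2[e2]) < theta:
                new.add((e1, e2))
        if not new:
            break  # no further standard entity pairs were found
        pairs |= new
    return pairs
```

In a full system the toy `pull_together` step would be replaced by retraining the embeddings under the constraint of the current standard pairs, which is what allows later rounds to surface pairs that were initially too far apart.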

compared with the prior art, the beneficial effect of this embodiment:

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims

1. A fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source is characterized by comprising the following steps:

2. The method for fusing the fine-grained knowledge graph of a two-dimensional overlapped large sample data source according to claim 1, wherein the step S1 specifically comprises:

3. The method according to claim 2, wherein the step S2 specifically includes:

s21, calculating the similarity between the attributes;

s22, calculating the similarity between adjacent entities;

s23, calculating the similarity of the attribute label set;

4. The method according to claim 3, wherein the step S3 specifically includes:

s32, screening attributes by using an atomic expression;

5. The method according to claim 4, wherein the step S4 specifically includes:

6. The method according to claim 2, wherein the step S11 specifically includes:

[formula not reproduced in the source text]

wherein the compared terms represent the k-th attributes common to entity e₁ and entity e₂, and the similarity between their attribute values is expressed as:

[formula not reproduced in the source text]

wherein the terms of the formula represent, respectively, the similarity of the attribute pair, the number of elements in the finite entity set, and the similarity between attribute values.

7. The fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source according to claim 6, wherein the total similarity of attributes and relations obtained in step S13 is expressed as:

Sim(E_i, E_j) = λ × simR(e_i, e_j) + (1 − λ) × simA(e_i, e_j)

wherein simR represents the similarity obtained from the relation triples; simA represents the similarity obtained from the attribute triples; λ represents a weight; and Sim(E_i, E_j) represents the total similarity.

8. The fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source of claim 3, wherein the similarity between the attributes is calculated in step S21 as:

Sim_property = COS(Name_property1, Name_property2)

9. The fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source of claim 8, wherein the similarity between adjacent entities is calculated in step S22 as:

Sim_entity = |entityList₁ ∩ entityList₂| / |entityList₁ ∪ entityList₂|

10. The fine-grained knowledge graph fusion method of a two-dimensional overlapped large sample data source of claim 9, wherein the similarity of the attribute label sets is calculated in step S23 as:

Sim_label = COS(label_property1, label_property2)

wherein Sim_label represents the similarity of the finite label sets of the attributes;

in step S24, the final similarity of the attributes is calculated, and is expressed as:

Sim_con＝COS(concept_property1，concept_property2)