CN117725555A - Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium - Google Patents

Publication number
CN117725555A
Authority
CN
China
Prior art keywords
knowledge
node
nodes
tree
trees
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410176275.4A
Other languages
Chinese (zh)
Inventor
罗歆昱
陈崇雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DMAI Guangzhou Co Ltd
Original Assignee
DMAI Guangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DMAI Guangzhou Co Ltd filed Critical DMAI Guangzhou Co Ltd
Priority to CN202410176275.4A priority Critical patent/CN117725555A/en
Publication of CN117725555A publication Critical patent/CN117725555A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides an association fusion method and device for multi-source knowledge trees, electronic equipment and a storage medium. The method comprises the following steps: processing a plurality of knowledge trees to determine different knowledge tree clusters; performing node association degree calculation on any two knowledge nodes from different knowledge trees in a knowledge tree cluster, based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster; integrating the attributes of the plurality of associated knowledge nodes in each associated knowledge node list and determining a combined knowledge node for each associated knowledge node list, wherein an intermediate knowledge node is determined from the associated knowledge node list based on a weight maximization method and is taken as the combined knowledge node; and updating the attributes and relationships of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree, thereby providing a more comprehensive knowledge tree.

Description

Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of knowledge tree fusion, in particular to a method, a device, electronic equipment and a storage medium for associated fusion of multi-source knowledge trees.
Background
In recent years, with the rapid development of information technology and big data analysis, the demand for knowledge management and knowledge fusion has become increasingly urgent. In many fields, such as academic research, enterprise management and decision support, the association and fusion of knowledge is critical to acquiring comprehensive information and gaining deep insight into problems. Traditional knowledge fusion methods rely mainly on manual integration and analysis; they are limited by time, resources and subjective factors, are inefficient, and are prone to introducing subjective bias. Thus, there is a need for an automated and reliable method for knowledge association and fusion. A common knowledge representation model is the knowledge tree, which organizes knowledge elements by means of nodes and edges to form a structured knowledge representation. The nodes of a knowledge tree may represent concepts, entities and the like in a field, while the edges represent relationships between the nodes. However, existing knowledge tree fusion methods mainly have the following problems: 1) Node association between knowledge trees requires a large number of computation steps, resulting in low computational efficiency. 2) Existing methods rely only on similarities between nodes to associate knowledge trees shallowly, while ignoring deeper associations between nodes, which limits the ability to fully understand and analyze the knowledge trees. Therefore, how to fuse knowledge trees is a technical problem that cannot be overlooked.
Disclosure of Invention
In view of this, an object of the present application is to provide an association fusion method and device for multi-source knowledge trees, electronic equipment and a storage medium. By merging identical or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed and the accuracy and reliability of the knowledge are improved. Moreover, different knowledge trees may contain descriptions of different aspects of the same concept or topic; by performing association fusion on related knowledge nodes in the different knowledge trees, the views and information of different knowledge sources can be integrated and displayed, thereby providing a more comprehensive knowledge tree.
The embodiment of the application provides an association fusion method for multi-source knowledge trees, which comprises the following steps:
carrying out knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters;
performing node association calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters;
integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node;
and updating the attributes and relationships of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree.
In one possible implementation manner, the knowledge clustering processing is performed on the plurality of knowledge trees to determine different knowledge tree clusters, including:
extracting structural features and content features of each knowledge tree, and determining knowledge feature information of each knowledge tree;
calculating knowledge characteristic information of any two knowledge trees based on a calculation formula of cosine similarity, and determining similarity between any two knowledge trees;
and clustering similar knowledge trees together based on the similarity between any two knowledge trees by using a clustering algorithm, and determining different knowledge tree clusters.
In one possible implementation manner, the calculating node association degree of any two knowledge nodes between different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters includes:
calculating character characteristics of any two knowledge nodes among different knowledge trees, and determining editing distance similarity and longest public subsequence similarity between any two knowledge nodes;
calculating semantic features of any two knowledge nodes among different knowledge trees, and determining semantic similarity between any two knowledge nodes;
weighting the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes to determine the similarity between any two knowledge nodes;
and carrying out node association degree calculation based on the structural characteristics of a plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster.
In one possible implementation manner, the calculating node association degree based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster includes:
if both of the two knowledge nodes are leaf knowledge nodes, determining the association degree of the two knowledge nodes based on the similarity of the two knowledge nodes and the similarity of their parent knowledge nodes;
if either of the two knowledge nodes is not a leaf knowledge node, determining the similarity of the two knowledge nodes as the association degree of the two knowledge nodes;
if the association degree of the two knowledge nodes is greater than or equal to a preset association degree threshold value, the two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes having association relations form an associated knowledge node list.
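The leaf and non-leaf rule and the threshold test above can be sketched in Python as follows. This is only an illustrative sketch: the text does not say how the node similarity and the parent similarity are combined, so the weighted mean with `alpha`, the threshold value 0.8, and the grouping of associated pairs into lists by connected components are all assumptions, and the function names are hypothetical.

```python
def association_degree(sim, parent_sim, a_is_leaf, b_is_leaf, alpha=0.5):
    """Association degree of two knowledge nodes from different trees.

    Assumption: for two leaf nodes, node similarity and parent similarity
    are combined as a weighted mean (the source does not give the formula).
    """
    if a_is_leaf and b_is_leaf:
        return alpha * sim + (1 - alpha) * parent_sim
    # At least one node is not a leaf: the similarity itself is used.
    return sim


def associated_lists(nodes, degree, threshold=0.8):
    """Group nodes whose association degree reaches the threshold.

    nodes:  list of node identifiers
    degree: {(a, b): association degree} for the pairs that were compared
    Assumption: lists are the connected components of the "associated" relation.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in degree.items():
        if d >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    # Only groups with at least two nodes are associated knowledge node lists.
    return [g for g in groups.values() if len(g) > 1]
```

With this grouping, a node that is associated with only one member of a list still joins that list; a stricter rule (for example, requiring association with every member) would be an equally plausible reading of the text.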
In one possible implementation manner, the weight-based maximization method determines an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list, and takes the intermediate knowledge node as a combined knowledge node, including:
and aiming at any associated knowledge node list, carrying out association degree average value calculation on each associated knowledge node in the associated knowledge node list based on a weight maximization method, taking the associated knowledge node corresponding to the largest association degree average value as an intermediate knowledge node of the associated knowledge node list, and taking the intermediate knowledge node as the combined knowledge node of the associated knowledge node list.
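A minimal Python sketch of this selection step is given below. It assumes that the association degrees between the nodes of one list are available as a dictionary keyed by node pairs and that a missing pair counts as 0.0; both the data layout and the name `pick_intermediate_node` are hypothetical.

```python
def pick_intermediate_node(nodes, degree):
    """Return the node with the largest mean association degree to the others.

    nodes:  identifiers of the nodes in one associated knowledge node list
    degree: {(a, b): association degree}; pairs may be stored in either order
    """
    def deg(a, b):
        # Assumption: a pair that was never scored contributes 0.0.
        return degree.get((a, b), degree.get((b, a), 0.0))

    def mean_degree(node):
        others = [n for n in nodes if n != node]
        if not others:
            return 0.0
        return sum(deg(node, o) for o in others) / len(others)

    return max(nodes, key=mean_degree)
```

For example, with degrees A-B = 0.9, B-C = 0.8 and A-C = 0.6, the mean values are 0.75 for A, 0.85 for B and 0.70 for C, so B would be taken as the intermediate (combined) knowledge node.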
In one possible implementation, the association fusion method further includes determining the intermediate knowledge node by:
inputting the associated knowledge node list into a natural language model, and processing a plurality of associated knowledge nodes in the associated knowledge node list to generate one intermediate knowledge node.
In one possible implementation manner, the updating the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree includes:
fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list, determining attribute information of the combined knowledge nodes, and updating the attribute information of the combined knowledge nodes;
and controlling the merged knowledge node to inherit the parent-child node relations of each of the other knowledge nodes in the associated knowledge node list, and generating the merged knowledge tree based on the plurality of merged knowledge nodes and the other knowledge nodes in the knowledge tree cluster.
In one possible implementation manner, the fusing the attributes of the multiple associated knowledge nodes in the associated knowledge node list to determine attribute information of the merged knowledge node includes:
and fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list based on any one of a numerical value average value fusion method, a relevance fusion method, a character string fusion method and a list splicing method to determine the attribute information of the merged knowledge nodes.
The embodiment of the application also provides an association fusion device for multi-source knowledge trees, which comprises:
the knowledge tree clustering module is used for carrying out knowledge clustering processing on the plurality of knowledge trees and determining different knowledge tree clusters;
the association module is used for carrying out node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters;
the fusion module is used for integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node;
and the generating module is used for updating the attribute and the relationship of each combined knowledge node in the knowledge tree cluster to generate a fusion knowledge tree.
The embodiment of the application also provides electronic equipment, which comprises: the system comprises a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory are communicated through the bus when the electronic device runs, and the machine-readable instructions are executed by the processor to execute the steps of the associated fusion method of the multi-source knowledge tree.
Embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the association fusion method of a multi-source knowledge tree as described above.
The association fusion method and device for multi-source knowledge trees, the electronic equipment and the storage medium provided by the embodiment of the application comprise the following steps: carrying out knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters; performing node association degree calculation on any two knowledge nodes from different knowledge trees in a knowledge tree cluster, based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster; integrating the attributes of the plurality of associated knowledge nodes in each associated knowledge node list and determining a combined knowledge node for each associated knowledge node list, wherein an intermediate knowledge node is determined from the associated knowledge node list based on a weight maximization method and is taken as the combined knowledge node; and updating the attributes and relationships of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree. By merging identical or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed and the accuracy and reliability of the knowledge are improved. Moreover, different knowledge trees may contain descriptions of different aspects of the same concept or topic; by performing association fusion on related knowledge nodes in the different knowledge trees, the views and information of different knowledge sources can be integrated and displayed, so that a more comprehensive knowledge tree is provided.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an association fusion method for multi-source knowledge trees according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an association fusion method for multi-source knowledge trees according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an association fusion device for multi-source knowledge trees according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment that a person skilled in the art would obtain without making any inventive effort is within the scope of protection of the present application.
First, application scenarios applicable to the present application will be described. The method and the device can be applied to the technical field of knowledge tree fusion.
Based on this, the embodiment of the application provides an association fusion method for multi-source knowledge trees. By merging identical or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed and the accuracy and reliability of the knowledge are improved. Different knowledge trees may contain descriptions of different aspects of the same concept or topic; by performing association fusion on related knowledge nodes in the different knowledge trees, the views and information of different knowledge sources can be integrated and displayed, so as to provide a more comprehensive knowledge tree.
Referring to fig. 1, fig. 1 is a flowchart of a method for association fusion of multiple source knowledge trees according to an embodiment of the present application. As shown in fig. 1, the association fusion method provided in the embodiment of the present application includes:
S101: Carrying out knowledge clustering processing on the plurality of knowledge trees to determine different knowledge tree clusters.
In this step, knowledge clustering processing is carried out on a plurality of knowledge trees from different sources, and different knowledge tree clusters are determined.
Knowledge tree clustering groups a large number of knowledge trees into a small number of knowledge tree clusters. Similar knowledge trees, which share similar subjects and contents, are clustered together, so that each resulting knowledge tree cluster has high internal consistency and correlation. In the subsequent association and fusion calculation, the knowledge trees belonging to the same cluster are associated and fused together, which improves the efficiency and accuracy of association and fusion.
In a specific embodiment, the performing knowledge clustering processing on the plurality of knowledge trees to determine different knowledge tree clusters includes:
A: Extracting the structural features and the content features of each knowledge tree, and determining knowledge feature information of each knowledge tree.
Here, feature extraction is performed on the structural features and the content features of each knowledge tree, and the knowledge feature information of each knowledge tree is determined.
Features describing a knowledge tree can generally be divided into two categories: structural features and content features. The structural features include information such as the height, depth, number of branches and number of nodes of the knowledge tree, and can be used to evaluate the structural similarity of trees; the content features include information carried by the knowledge nodes, such as attributes, values or tags, and can be used to evaluate the content similarity of knowledge trees. The knowledge tree clustering of this scheme aims to group knowledge trees with similar subjects and contents into one cluster. To obtain a better representation of each knowledge tree, the scheme fuses the structural features and the content features into the knowledge feature information. First, each knowledge tree is treated as a document and each knowledge node as a phrase in that document, and a knowledge tree dictionary is constructed. When word frequencies are counted, structural features are introduced: nodes at different levels are given different weights, and nodes closer to the root node are weighted more heavily (because the root node generally represents the subject of the knowledge tree). Finally, the representation of the knowledge tree is obtained with a document vector representation method from natural language processing.
The specific steps are as follows:
1) Weighting the knowledge nodes: N knowledge trees are obtained. For each knowledge tree (tree_a), let its height be h_a, and traverse it by breadth-first search to obtain a knowledge node list node_a, in which each knowledge node has a name attribute (its text name) and a level attribute (its level, counted from the root node and increasing downward). The word frequency weight is then calculated: for each knowledge node in node_a, a weight (word frequency weight) attribute is set, with weight = log2(h_a - level + 1).
2) Data preprocessing: for each knowledge node in the knowledge node list, irrelevant punctuation marks, stop words, numbers and the like are removed, and word segmentation is performed.
3) Constructing the knowledge tree dictionary: a dictionary is constructed from the preprocessed knowledge tree data. All text data in each knowledge tree is traversed, the words that appear are added to the vocabulary, and repeated words are removed. Each word is associated with a unique index, forming a dictionary for the subsequent vector representation.
4) Calculating the knowledge tree representation: for each knowledge tree, the vocabulary, the text data and the node word frequency weights are used to calculate a WTF-IDF (weighted word frequency-inverse document frequency) vector representation. First, the weighted word frequency (WTF) is calculated, that is, the weighted frequency of occurrence of each word in the text is counted. For a word w, when the knowledge tree nodes are traversed, its weighted frequency at a knowledge node node_a equals the word frequency weight of that knowledge node multiplied by the frequency of occurrence of w in node_a. The weighted frequency of w is then the sum of the weighted frequencies of w over all knowledge nodes of the knowledge tree. To reduce the impact of high-frequency words on similarity, the logarithm of the weighted word frequency is used.
Next, the inverse document frequency (IDF) is calculated: for each word, the number of knowledge trees containing the word is counted, the total number of knowledge trees is divided by this number, and the logarithm of the quotient is taken. Finally, the WTF and the IDF are multiplied to obtain the WTF-IDF vector. The complete formulas are as follows:
① WTF(w, node_a) = TF_weight(node_a) × freq(w, node_a);
② WTF(w) = sum(WTF(w, node_a) for all nodes in the knowledge tree);
③ WTF_IDF(w) = log(WTF(w)) × log(N / DF(w));
Where WTF(w, node_a) represents the weighted word frequency of word w in knowledge node node_a, TF_weight(node_a) represents the word frequency weight of knowledge node node_a, freq(w, node_a) represents the frequency of word w in node_a, WTF(w) represents the weighted word frequency of word w in the knowledge tree, sum represents the summation operation, WTF_IDF(w) represents the component of the WTF-IDF vector for word w, log represents the natural logarithm, N represents the total number of knowledge trees, and DF(w) represents the number of knowledge trees containing word w.
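Formulas ① to ③ can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not the implementation of the application: a tree is given as a hypothetical dictionary mapping node names to child names, the root is assumed to be at level 1, h_a is taken as the deepest level, preprocessing is reduced to lower-casing and whitespace splitting, and words whose WTF is 0 are skipped because log(0) is undefined.

```python
import math
from collections import Counter, deque


def node_weights(tree, root):
    """Breadth-first traversal; returns {node name: log2(h_a - level + 1)}."""
    levels = {}
    queue = deque([(root, 1)])  # assumption: the root node is at level 1
    while queue:
        name, level = queue.popleft()
        levels[name] = level
        for child in tree.get(name, []):
            queue.append((child, level + 1))
    height = max(levels.values())  # assumption: h_a is the deepest level
    return {n: math.log2(height - lvl + 1) for n, lvl in levels.items()}


def weighted_tf(tree, root):
    """Formulas 1 and 2: WTF(w) summed over all nodes of one tree."""
    wtf = Counter()
    for name, weight in node_weights(tree, root).items():
        # Assumption: whitespace tokenisation stands in for word segmentation.
        for word, freq in Counter(name.lower().split()).items():
            wtf[word] += weight * freq
    return wtf


def wtf_idf(trees):
    """Formula 3 for every tree; trees is a list of (tree_dict, root) pairs."""
    wtfs = [weighted_tf(tree, root) for tree, root in trees]
    n = len(trees)
    df = Counter(word for wtf in wtfs for word in wtf)  # DF(w)
    vectors = []
    for wtf in wtfs:
        vec = {}
        for word, value in wtf.items():
            if value > 0:  # assumption: skip words with WTF = 0
                vec[word] = math.log(value) * math.log(n / df[word])
        vectors.append(vec)
    return vectors
```

Note that with these level conventions the deepest nodes receive a weight of log2(1) = 0, and a word that occurs in every tree receives an IDF of log(1) = 0, so both contribute nothing to the vector.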
B: Calculating the knowledge feature information of any two knowledge trees based on the cosine similarity formula, and determining the similarity between any two knowledge trees.
Here, the knowledge feature information of any two knowledge trees is compared using the cosine similarity formula, and the similarity between the two knowledge trees is determined.
Specifically, after the knowledge feature information of two knowledge trees is acquired, the similarity between them can be calculated. Cosine similarity can be used to measure how similar two vectors are. An N × N similarity matrix is constructed, where N represents the number of knowledge trees; the element at position [i, j] of the matrix represents the similarity between the i-th knowledge tree and the j-th knowledge tree.
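As an illustration, the similarity matrix can be built as in the following Python sketch. It assumes that each knowledge tree vector is stored sparsely as a dictionary from word to value, and that an all-zero vector is treated as having similarity 0 to every vector; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {word: value}."""
    dot = sum(value * v.get(word, 0.0) for word, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """N x N matrix whose [i][j] entry compares tree i with tree j."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
```

Because cosine similarity is symmetric, only the upper triangle actually needs to be computed; the full double loop is kept here for readability.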
C: Clustering similar knowledge trees together with a clustering algorithm, based on the similarity between any two knowledge trees, and determining different knowledge tree clusters.
Here, the similarity matrix is input into a clustering algorithm, and similar knowledge trees are clustered together according to the pairwise similarities between the knowledge trees, so that different knowledge tree clusters are determined.
Each knowledge tree is treated as a data point, and the pairwise similarity plays the role of the distance between data points. With the similarity matrix of the knowledge trees as input data, a clustering algorithm groups similar knowledge trees together to form different clusters. Optional clustering algorithms include k-means clustering, hierarchical clustering, density clustering and the like. The specific steps are as follows: randomly select an unvisited knowledge tree; judge whether the neighborhood of the knowledge tree meets the requirements of the specified radius (a similarity threshold, set to 0.85) and the minimum number of data points in the neighborhood (set to 1); if so, assign the knowledge tree and the knowledge trees in its neighborhood to one cluster and mark them as visited; if the number of knowledge trees in the neighborhood is insufficient, mark the knowledge tree as a noise knowledge tree; repeat these steps for the unvisited knowledge trees until all knowledge trees have been visited. After the clustering is completed, each cluster is taken as a knowledge tree cluster, which contains a group of densely associated knowledge trees.
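The neighborhood-based steps above resemble density clustering in the style of DBSCAN, and can be sketched in Python as follows. Several points are assumptions rather than statements of the application: trees are visited in index order instead of randomly so that the result is reproducible, a tree is not counted as its own neighbor, and a cluster is expanded through the neighbors of its neighbors as in standard DBSCAN, which the text does not spell out.

```python
from collections import deque

NOISE = -1


def cluster_trees(sim, threshold=0.85, min_points=1):
    """Density clustering on a similarity matrix.

    sim[i][j] is the similarity of trees i and j; two trees are neighbors
    when their similarity is at least `threshold`.
    Returns one label per tree: a cluster id (0, 1, ...) or NOISE.
    """
    n = len(sim)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Assumption: a tree does not count as its own neighbor.
        return [j for j in range(n) if j != i and sim[i][j] >= threshold]

    cluster_id = 0
    for i in range(n):  # assumption: index order instead of random choice
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise tree reached from a dense tree
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            more = neighbors(j)
            if len(more) >= min_points:
                queue.extend(more)  # assumed DBSCAN-style expansion
        cluster_id += 1
    return labels
```

With a minimum of 1 neighbor, any two trees whose similarity reaches 0.85 end up in the same cluster, and only trees with no sufficiently similar partner are labelled as noise.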
Through this step, a large number of knowledge trees are clustered into a smaller number of knowledge tree clusters based on content similarity, and the subsequent knowledge association and fusion take place within each individual knowledge tree cluster. The significance of this step lies mainly in the following two aspects. Improving the accuracy of association and fusion: knowledge tree clustering groups similar knowledge trees together; these trees share similar topics and contents, so each knowledge tree cluster has high internal consistency and correlation. In the subsequent association and fusion calculation, knowledge trees from the same cluster are associated and fused together, which improves the accuracy of association and fusion. Relieving the computational load: knowledge tree clustering groups a large number of fine-grained knowledge trees into fewer clusters, which reduces the scale and complexity of the association and fusion calculation and thus the computational load. Especially for large-scale knowledge tree fusion tasks, clustering decomposes the problem into a plurality of small-scale clusters that lend themselves to parallel computation, improving the overall computational efficiency.
S102: Performing node association degree calculation on any two knowledge nodes from different knowledge trees in a knowledge tree cluster, based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster.
In this step, based on the character features, semantic features and structural features of the plurality of knowledge nodes in each knowledge tree cluster, node association degree calculation is performed on any two knowledge nodes from different knowledge trees in the knowledge tree cluster, and a plurality of associated knowledge node lists in the knowledge tree cluster are determined.
The character features are the name character strings of the knowledge nodes, the semantic features are semantic vectors obtained by passing the character features of the knowledge nodes through a language model, and the structural features are the parent knowledge nodes and leaf knowledge nodes of the knowledge nodes.
Each associated knowledge node list is formed by a plurality of associated knowledge nodes.
In one possible implementation manner, the calculating node association degree of any two knowledge nodes between different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters includes:
s1021: and calculating character characteristics of any two knowledge nodes among different knowledge trees, and determining the editing distance similarity and the longest common subsequence similarity between any two knowledge nodes.
Here, character features of any two knowledge nodes between different knowledge trees are calculated, and the editing distance similarity and the longest common subsequence similarity between any two knowledge nodes are determined.
S1022: and calculating semantic features of any two knowledge nodes among different knowledge trees, and determining semantic similarity between any two knowledge nodes.
Here, the semantic features of any two knowledge nodes among different knowledge trees are calculated, and the semantic similarity between any two knowledge nodes is determined.
S1023: and carrying out weighting processing on the edit distance similarity between any two knowledge nodes, the longest common subsequence similarity and the semantic similarity to determine the similarity between any two knowledge nodes.
And carrying out weighting processing on the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes to determine the similarity between any two knowledge nodes.
Specifically, from the character features of knowledge node A1 and knowledge node B1, the edit distance similarity (A1, B1) and the longest common subsequence similarity lcs_distance(A1, B1) are obtained; the cosine similarity between the semantic features of knowledge node A1 and knowledge node B1 is calculated to obtain the semantic similarity; the similarity of the two knowledge nodes is then the weighted value of the edit distance similarity, the longest common subsequence similarity and the semantic similarity.
S1024: and carrying out node association degree calculation based on the structural characteristics of a plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster.
Here, the node association degree calculation is performed according to the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and a plurality of associated knowledge node lists in the knowledge tree cluster are determined.
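The similarity calculation of S1021 to S1023 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the normalization of the edit distance and of the longest common subsequence length by the longer string, the function names, and the default weights (0.25, 0.25, 0.5) are all assumptions, and the semantic vectors are assumed to have been produced beforehand by some language model.

```python
import math

def edit_distance_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / longer length (normalization is an assumption)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))

def lcs_similarity(a: str, b: str) -> float:
    """Longest common subsequence length / longer length (assumed normalization)."""
    if not a and not b:
        return 1.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))

def cosine_similarity(u, v) -> float:
    """Cosine similarity of two semantic vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def node_similarity(name_a, name_b, vec_a, vec_b, weights=(0.25, 0.25, 0.5)) -> float:
    """Weighted combination of the three similarities (weights are hypothetical)."""
    w_edit, w_lcs, w_sem = weights
    return (w_edit * edit_distance_similarity(name_a, name_b)
            + w_lcs * lcs_similarity(name_a, name_b)
            + w_sem * cosine_similarity(vec_a, vec_b))
```

Keeping the three weights summing to 1 keeps the combined similarity in the range of its components, which makes it directly comparable with a fixed threshold.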
In a specific embodiment, the calculating the node association degree based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster includes:
a: if any two knowledge nodes are leaf knowledge nodes, determining the association degree of any two knowledge nodes based on the similarity of any two knowledge nodes and the similarity of the father knowledge nodes of any two knowledge nodes.
Here, if any two knowledge nodes are leaf knowledge nodes, determining the association degree of any two knowledge nodes according to the similarity of any two knowledge nodes and the similarity of the father knowledge nodes of any two knowledge nodes.
b: if any one of the two knowledge nodes is not a leaf knowledge node, determining the similarity of the any two knowledge nodes as the association degree of the any two knowledge nodes.
Here, if any one of the two knowledge nodes is not a leaf knowledge node, the similarity of the any two knowledge nodes is determined as the association of the any two knowledge nodes.
For example, let correlation(A1, B1) denote the association degree between knowledge node A1 and knowledge node B1. For knowledge node A1, the parent knowledge node is a1_father and the child knowledge nodes are a1_children; for knowledge node B1, the parent knowledge node is b1_father and the child knowledge nodes are b1_children. The association degree of leaf knowledge nodes is a weighted sum of the similarity of the knowledge nodes themselves and the similarity of their parent knowledge nodes, whereas the association degree of intermediate knowledge nodes considers only the similarity of the knowledge nodes themselves. That is: if a1_children is empty and b1_children is empty, correlation(A1, B1) = sim(A1, B1) + sim(a1_father, b1_father)/2; if a1_children is not empty or b1_children is not empty, correlation(A1, B1) = sim(A1, B1).
c: if the association degree of any two knowledge nodes is greater than or equal to a preset association degree threshold value, the any two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes with association relations form an associated knowledge node list.
Here, if the association degree of any two knowledge nodes is greater than or equal to the preset association degree threshold, the two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes having association relationships form an associated knowledge node list.
For example, the association degree threshold is set to correlation_threshold = 0.85, each knowledge node of a knowledge tree is traversed, and its association degree with the knowledge nodes of the other knowledge trees is calculated. This step yields n associated knowledge node lists, each containing m associated knowledge nodes. An example of an associated knowledge node list is [A1, B1, C1, D1], where A1 and B1 are associated knowledge nodes, B1 and C1 are associated knowledge nodes, and A1 and D1 are associated knowledge nodes.
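Steps a to c can be sketched as follows. Several points are assumptions made only for illustration: the `Node` class and the function names are hypothetical; the leaf case is computed as the equal-weight mean of the node similarity and the parent similarity, which keeps the value comparable with the 0.85 threshold (the grouping of the printed formula is not unambiguous); and the associated knowledge node lists are built as connected components with a union-find structure, which reproduces the [A1, B1, C1, D1] example but is not stated in the text.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Node:
    name: str
    tree: str                      # identifier of the knowledge tree the node belongs to
    parent: Optional[str] = None   # name of the parent knowledge node, if any
    has_children: bool = False     # False means the node is a leaf

def correlation(a: Node, b: Node, sim: Callable[[str, str], float]) -> float:
    """Association degree of two nodes from different trees."""
    own = sim(a.name, b.name)
    both_leaves = not a.has_children and not b.has_children
    if both_leaves and a.parent is not None and b.parent is not None:
        # Leaf case: combine own similarity with parent similarity (equal weights assumed).
        return (own + sim(a.parent, b.parent)) / 2
    # At least one node is not a leaf: only the nodes' own similarity counts.
    return own

def associated_lists(nodes: List[Node], sim, threshold: float = 0.85) -> List[List[str]]:
    """Group nodes linked by association degree >= threshold into lists."""
    root = list(range(len(nodes)))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for i, j in combinations(range(len(nodes)), 2):
        if nodes[i].tree == nodes[j].tree:
            continue                  # only compare nodes of different knowledge trees
        if correlation(nodes[i], nodes[j], sim) >= threshold:
            root[find(i)] = find(j)

    groups = {}
    for i, node in enumerate(nodes):
        groups.setdefault(find(i), []).append(node.name)
    return [g for g in groups.values() if len(g) > 1]
```

With this grouping, A1 and C1 end up in the same list through B1 even though they are not directly associated, which matches the example list above.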
In the scheme, knowledge nodes of different knowledge trees in the same knowledge tree cluster are associated, and a knowledge node pair with correlation is generated by establishing the association between the knowledge nodes. Through knowledge association, similar or related knowledge nodes can be accurately found, invalid or wrong fusion is avoided, and the accuracy of a subsequent knowledge fusion result is improved.
S103: Integrate the attributes of the plurality of associated knowledge nodes in each associated knowledge node list and determine a merged knowledge node for each associated knowledge node list; based on a weight maximization method, determine an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list, and take the intermediate knowledge node as the merged knowledge node.
In the step, the attributes of a plurality of associated knowledge nodes in each associated knowledge node list are integrated, the combined knowledge node of each associated knowledge node list is determined, the intermediate knowledge node of the associated knowledge node list is determined from the associated knowledge node list according to a weight maximization method, and the intermediate knowledge node is used as the combined knowledge node.
The attributes of the plurality of knowledge nodes in each associated knowledge node list are integrated into one merged knowledge node, and a unique identifier is assigned to the merged knowledge node. For example, the associated knowledge node list [cloud recording (SaaS) product introduction, 1. Cloud recording-product introduction, cloud recording product introduction] is initialized as a new merged knowledge node Q.
In a specific embodiment, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on the weight maximization method, and taking the intermediate knowledge node as the merged knowledge node, includes:
And aiming at any associated knowledge node list, carrying out association degree average value calculation on each associated knowledge node in the associated knowledge node list based on a weight maximization method, taking the associated knowledge node corresponding to the largest association degree average value as an intermediate knowledge node of the associated knowledge node list, and taking the intermediate knowledge node as the combined knowledge node of the associated knowledge node list.
Here, for each associated knowledge node list, performing association degree average calculation on each associated knowledge node in the associated knowledge node list according to a weight maximization method, using the associated knowledge node corresponding to the largest association degree average as an intermediate knowledge node of the associated knowledge node list, and using the intermediate knowledge node as a combined knowledge node of the associated knowledge node list.
Specifically, the list of associated knowledge node pairs is cor_pair_list = [(node_a1, node_b1, weight(A1, B1)), ..., (node_n, node_m, weight(N, M))], where weight(A1, B1) is the association degree between associated knowledge node node_a1 and associated knowledge node node_b1. For each associated knowledge node in cor_pair_list, the average value avg_weight of its association weights is calculated, that is, the sum of the association degrees of the associated knowledge node pairs in cor_pair_list that contain the node, divided by the number of times the node appears: avg_weight = Σ(weight(A1, B1))/count, where Σ(weight(A1, B1)) is the sum of all association weights in cor_pair_list that involve the node, and count is the number of times the node appears in cor_pair_list. For example, let cor_pair_list = [(cloud recording (SaaS) product introduction, 1. Cloud recording-product introduction, 0.89), (cloud recording (SaaS) product introduction, cloud recording product introduction, 0.92), (1. Cloud recording-product introduction, cloud recording product introduction, 0.91)] and cor_list = [cloud recording (SaaS) product introduction, 1. Cloud recording-product introduction, cloud recording product introduction]. The average association degree avg_weight of each associated knowledge node in the associated knowledge node list is calculated; the value for "cloud recording product introduction", avg_weight = (0.92 + 0.91)/2 = 0.915, is the highest, so "cloud recording product introduction" is selected as the intermediate knowledge node.
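The average-weight selection described above can be sketched as follows, using the three pairs from the example. The function name `select_intermediate_node` is hypothetical; the variable names `cor_pair_list` and `avg_weight` follow the text.

```python
from collections import defaultdict

def select_intermediate_node(cor_pair_list):
    """Return the node with the largest average association degree, plus all averages."""
    total = defaultdict(float)   # sum of association degrees per node
    count = defaultdict(int)     # number of pairs in which the node appears
    for node_a, node_b, weight in cor_pair_list:
        for node in (node_a, node_b):
            total[node] += weight
            count[node] += 1
    avg_weight = {node: total[node] / count[node] for node in total}
    best = max(avg_weight, key=avg_weight.get)
    return best, avg_weight

cor_pair_list = [
    ("cloud recording (SaaS) product introduction", "1. Cloud recording-product introduction", 0.89),
    ("cloud recording (SaaS) product introduction", "cloud recording product introduction", 0.92),
    ("1. Cloud recording-product introduction", "cloud recording product introduction", 0.91),
]
best, avg_weight = select_intermediate_node(cor_pair_list)
print(best)  # cloud recording product introduction
```

The other two nodes average (0.89 + 0.92)/2 = 0.905 and (0.89 + 0.91)/2 = 0.90, so the node with 0.915 is chosen.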
In one possible implementation, the association fusion method further includes determining the intermediate knowledge node by:
inputting the associated knowledge node list into a natural language model, and processing a plurality of associated knowledge nodes in the associated knowledge node list to generate one intermediate knowledge node.
Here, the associated knowledge node list and the association degrees among its knowledge nodes are taken as input, and a natural language generation method is used to generate a new knowledge node from the associated knowledge node list, which serves as the representation of the merged knowledge node.
In this scheme, knowledge fusion refers to merging associated knowledge nodes into a more comprehensive and consistent knowledge tree. The task is accomplished mainly by merging and integrating the associated knowledge nodes: the merged knowledge node is first initialized, attribute fusion and intermediate knowledge node selection are then performed, and finally the selected intermediate knowledge node is used as the representation of the merged knowledge node, and the attributes and relations of the merged knowledge node are updated.
S104: and updating the attribute and the relationship of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree.
In the step, each merged knowledge node in the knowledge tree cluster is subjected to attribute and relationship updating to generate a merged knowledge tree. The method and the system realize the integration of the associated knowledge nodes among different knowledge trees, generate a global knowledge tree and keep the relation and attribute characteristics of each associated knowledge point.
In one possible implementation manner, the updating the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree includes:
(1): and fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list, determining the attribute information of the combined knowledge nodes, and updating the attribute information of the combined knowledge nodes.
Here, the attributes of the plurality of associated knowledge nodes in the associated knowledge node list are fused, and the attribute information of the merged knowledge node is determined and updated.
In one possible implementation manner, the fusing the attributes of the multiple associated knowledge nodes in the associated knowledge node list to determine attribute information of the merged knowledge node includes:
And fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list based on any one of a numerical value average value fusion method, a relevance fusion method, a character string fusion method and a list splicing method to determine the attribute information of the merged knowledge nodes.
Here, the attributes of the plurality of associated knowledge nodes in the associated knowledge node list are fused according to any one of the numerical value average value fusion method, the relevance fusion method, the character string fusion method and the list splicing method, to determine the attribute information of the merged knowledge node.
Numerical value average value fusion takes the mean of numerical attributes; relevance fusion takes a weighted average of the attributes of the different knowledge nodes according to the weights of the associated knowledge nodes; character string fusion combines text attributes into one new text attribute; and list splicing concatenates list attributes into one larger list.
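The four fusion methods can be illustrated with one small dispatcher. This is only a sketch under stated assumptions: the function name `fuse_attribute`, the choice of method by value type, the "; " separator and the removal of repeated strings are all hypothetical details that the text does not specify.

```python
from numbers import Number

def fuse_attribute(values, weights=None):
    """Fuse one attribute collected from all nodes of an associated knowledge node list."""
    if not values:
        raise ValueError("no attribute values to fuse")
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        if weights is None:
            # numerical value average value fusion
            return sum(values) / len(values)
        # relevance fusion: average weighted by the nodes' association weights
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)
    if all(isinstance(v, str) for v in values):
        # character string fusion: join distinct strings, keeping first-seen order
        return "; ".join(dict.fromkeys(values))
    if all(isinstance(v, list) for v in values):
        # list splicing: concatenate into one larger list
        return [item for v in values for item in v]
    raise TypeError("attribute values must share one supported type")
```

For instance, `fuse_attribute([1.0, 3.0], weights=[1, 3])` weights the second node three times as heavily as the first.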
(2): and controlling the merged knowledge node to inherit the father-son node relation of each other knowledge node in the associated knowledge node list, and generating the merged knowledge tree based on a plurality of the merged knowledge nodes and other knowledge nodes in the knowledge tree cluster.
Here, the merging knowledge node is controlled to inherit the parent-child node relation of each other knowledge node in the associated knowledge node list, and a merged knowledge tree is generated according to the merging knowledge nodes and the other knowledge nodes in the knowledge tree cluster.
And taking the finally selected intermediate knowledge node as a representation of the combined knowledge node, and updating the attribute and the relation of the combined knowledge node.
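The inheritance of parent-child relations in step (2) can be sketched on an edge-set representation. The representation of the knowledge tree cluster as a set of (parent, child) name pairs and the function name `inherit_relations` are assumptions made for illustration.

```python
def inherit_relations(edges, group, merged):
    """Redirect every (parent, child) edge that touches a node in `group` to `merged`.

    edges:  set of (parent_name, child_name) pairs from all trees in the cluster
    group:  names of the nodes in one associated knowledge node list
    merged: name (identifier) of the merged knowledge node
    """
    members = set(group)

    def rename(name):
        return merged if name in members else name

    new_edges = set()
    for parent, child in edges:
        p, c = rename(parent), rename(child)
        if p != c:                 # drop self-loops created by merging a parent with its child
            new_edges.add((p, c))  # the set removes edges that became duplicates
    return new_edges

edges = {("RootA", "A1"), ("A1", "a_leaf"), ("RootB", "B1"), ("B1", "b_leaf")}
fused = inherit_relations(edges, ["A1", "B1"], "Q")
```

In this example Q inherits the parents RootA and RootB and the children a_leaf and b_leaf; if RootA and RootB are themselves merged by a later call, Q is left with a single parent.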
Through the above algorithm and steps, the associated knowledge nodes among different knowledge trees can be fused to generate a global knowledge tree while retaining the relations and attribute characteristics of each associated knowledge node. Merging identical or related knowledge nodes from different knowledge trees removes duplicate and redundant knowledge and improves the accuracy of the knowledge. In addition, association fusion can further improve the credibility of the knowledge by weighting or averaging the common characteristics of nodes from different knowledge trees. Different knowledge trees may describe different aspects or angles of the same concept or topic; by performing association fusion on the related nodes of these knowledge trees, the viewpoints and information of the different knowledge sources can be integrated and presented together, providing a more complete and comprehensive view of the knowledge.
In a specific embodiment, the scheme can be applied to the communication field, that is, the field covering telecommunication, network, wireless communication and related technologies and applications, which is widely used in daily life and work. A knowledge tree here is a hierarchically structured knowledge graph compiled from various materials, which helps learners better understand and master the related knowledge. In knowledge training for the communication field, the sources of knowledge trees mainly include Word documents, PDF documents, PPT screen-recording videos and the like. These materials may be organized in different ways and formats, but all contain knowledge about the communication field. The purpose of association fusion of knowledge trees is to integrate and share knowledge so that trainees can better understand knowledge in the communication field. By associating nodes across different knowledge trees, links between the knowledge trees are established and knowledge is connected and supplemented. In the knowledge fusion process, the associated nodes can be merged and reorganized into a more complete and comprehensive knowledge tree, so that learners can fully understand and master knowledge in the communication field. The significance of knowledge tree association fusion lies in promoting knowledge transfer and learning effectiveness. By fusing the associations among different knowledge trees, duplicated and omitted knowledge can be avoided, repeated reading and searching across different materials is reduced, and learning efficiency is improved. Association fusion also helps learners understand knowledge in the communication field as a whole and form a more complete knowledge system, which facilitates the application of knowledge and innovation.
Further, referring to fig. 2, fig. 2 is a schematic diagram of the association fusion method for a multi-source knowledge tree according to an embodiment of the present application. As shown in fig. 2, a multi-source knowledge tree set is obtained; knowledge tree clustering is performed on the set to obtain knowledge tree clusters; knowledge association is performed on the plurality of knowledge nodes in each knowledge tree cluster to obtain the associated knowledge node lists of the cluster; and knowledge fusion is performed on the plurality of associated knowledge nodes in each associated knowledge node list to obtain a fused knowledge tree. First, clustering a large number of knowledge trees into a small number of knowledge tree clusters reduces the computational complexity and improves computational efficiency; clustering also groups knowledge trees from similar fields together, which reduces knowledge confusion across different sources and fields and improves association accuracy. Second, the knowledge association stage considers not only the similarity between nodes but also the deeper associations between them: by taking the context information and semantic similarity of the nodes into account, the knowledge trees can be understood and analyzed more fully, so that their rich information is fully used and the accuracy and reliability of association are improved. Finally, in the knowledge fusion stage, the associated nodes are merged and fused to obtain the final fused knowledge tree.
An embodiment of the present application provides an association fusion method for a multi-source knowledge tree, the method comprising: performing knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters; based on the character features, semantic features and structural features of the plurality of knowledge nodes in each knowledge tree cluster, calculating the node association degree for any two knowledge nodes belonging to different knowledge trees in the knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster; integrating the attributes of the plurality of associated knowledge nodes in each associated knowledge node list, determining a merged knowledge node for each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the merged knowledge node; and updating the attributes and relations of each merged knowledge node in the knowledge tree cluster to generate a fused knowledge tree. Merging identical or related knowledge nodes from different knowledge trees removes duplicate and redundant knowledge and improves the accuracy and credibility of the knowledge. Moreover, because different knowledge trees may describe different aspects of the same concept or topic, performing association fusion on the related knowledge nodes of different knowledge trees integrates and presents the viewpoints and information of the different knowledge sources, thereby providing a more complete and comprehensive knowledge tree.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for a multi-source knowledge tree association fusion method according to an embodiment of the present application. As shown in fig. 3, the association fusion apparatus 300 of the multi-source knowledge tree includes:
the knowledge tree clustering module 310 is configured to perform knowledge clustering on the multiple knowledge trees to determine different knowledge tree clusters;
the association module 320 is configured to perform node association calculation on any two knowledge nodes between different knowledge trees in each knowledge tree cluster based on character features, semantic features and structural features of the plurality of knowledge nodes in each knowledge tree cluster, and determine a plurality of associated knowledge node lists in the knowledge tree cluster;
the fusion module 330 is configured to integrate attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determine a merged knowledge node of each associated knowledge node list, determine an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and use the intermediate knowledge node as the merged knowledge node;
and the generating module 340 is configured to update the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster, so as to generate a merged knowledge tree.
Further, when the knowledge tree clustering module 310 is configured to perform knowledge clustering processing on the plurality of knowledge trees to determine different knowledge tree clusters, the knowledge tree clustering module 310 is specifically configured to:
extracting structural features and content features of each knowledge tree, and determining knowledge feature information of each knowledge tree;
calculating knowledge characteristic information of any two knowledge trees based on a calculation formula of cosine similarity, and determining similarity between any two knowledge trees;
and clustering similar knowledge trees together based on the similarity between any two knowledge trees by using a clustering algorithm, and determining different knowledge tree clusters.
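The clustering performed by the knowledge tree clustering module can be sketched as follows. The text names cosine similarity but does not name the clustering algorithm or the feature encoding, so everything else here is an assumption: each knowledge tree is taken to be already encoded as a numeric feature vector, and a simple greedy threshold grouping with a hypothetical threshold of 0.8 stands in for the unspecified clustering algorithm.

```python
import math

def cosine(u, v) -> float:
    """Cosine similarity between two knowledge feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def cluster_trees(features: dict, threshold: float = 0.8) -> list:
    """Greedy grouping: a tree joins the first cluster whose members are all similar to it."""
    clusters = []
    for tree_id, vec in features.items():
        for cluster in clusters:
            if all(cosine(vec, features[member]) >= threshold for member in cluster):
                cluster.append(tree_id)
                break
        else:
            clusters.append([tree_id])   # no suitable cluster found: start a new one
    return clusters

features = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0]}
clusters = cluster_trees(features)
```

Here t1 and t2 point in nearly the same direction and fall into one cluster, while t3 is orthogonal to t1 and forms its own cluster. The result of a greedy grouping depends on the input order, which is one reason a production system might prefer a standard clustering algorithm.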
Further, when the association module 320 is configured to calculate the node association degree of any two knowledge nodes between different knowledge trees in each knowledge tree cluster based on the character features, the semantic features and the structural features of the plurality of knowledge nodes in each knowledge tree cluster, and determine a plurality of associated knowledge node lists in the knowledge tree cluster, the association module 320 is specifically configured to:
calculating character features of any two knowledge nodes among different knowledge trees, and determining the edit distance similarity and the longest common subsequence similarity between any two knowledge nodes;
Calculating semantic features of any two knowledge nodes among different knowledge trees, and determining semantic similarity among any two knowledge nodes;
weighting the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes to determine the similarity between any two knowledge nodes;
and carrying out node association degree calculation based on the structural characteristics of a plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster.
Further, when the association module 320 is configured to calculate the node association degree based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determine a plurality of associated knowledge node lists in the knowledge tree cluster, the association module 320 is specifically configured to:
if any two knowledge nodes are leaf knowledge nodes, determining the association degree of any two knowledge nodes based on the similarity of any two knowledge nodes and the similarity of the father knowledge nodes of any two knowledge nodes;
if any one of the two knowledge nodes is not a leaf knowledge node, determining the similarity of the any two knowledge nodes as the association degree of the any two knowledge nodes;
If the association degree of any two knowledge nodes is greater than or equal to a preset association degree threshold value, the any two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes with association relations form an associated knowledge node list.
Further, when the fusing module 330 is configured to determine an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list by using the weight-based maximization method, and take the intermediate knowledge node as a merged knowledge node, the fusing module 330 is specifically configured to:
and aiming at any associated knowledge node list, carrying out association degree average value calculation on each associated knowledge node in the associated knowledge node list based on a weight maximization method, taking the associated knowledge node corresponding to the largest association degree average value as an intermediate knowledge node of the associated knowledge node list, and taking the intermediate knowledge node as the combined knowledge node of the associated knowledge node list.
Further, the fusion module 330 determines the intermediate knowledge node by:
inputting the associated knowledge node list into a natural language model, and processing a plurality of associated knowledge nodes in the associated knowledge node list to generate one intermediate knowledge node.
Further, when the generating module 340 is configured to update the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree, the generating module 340 is specifically configured to:
fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list, determining attribute information of the combined knowledge nodes, and updating the attribute information of the combined knowledge nodes;
and controlling the merged knowledge node to inherit the father-son node relation of each other knowledge node in the associated knowledge node list, and generating the merged knowledge tree based on a plurality of the merged knowledge nodes and other knowledge nodes in the knowledge tree cluster.
Further, when the generating module 340 is configured to fuse the attributes of the plurality of associated knowledge nodes in the associated knowledge node list and determine attribute information of the merged knowledge node, the generating module 340 is specifically configured to:
and fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list based on any one of a numerical value average value fusion method, a relevance fusion method, a character string fusion method and a list splicing method to determine the attribute information of the merged knowledge nodes.
An embodiment of the present application provides an association fusion device for a multi-source knowledge tree, the association fusion device comprising: a knowledge tree clustering module, configured to perform knowledge clustering processing on a plurality of knowledge trees and determine different knowledge tree clusters; an association module, configured to calculate, based on the character features, semantic features and structural features of the plurality of knowledge nodes in each knowledge tree cluster, the node association degree for any two knowledge nodes belonging to different knowledge trees in the knowledge tree cluster, and to determine a plurality of associated knowledge node lists in the knowledge tree cluster; a fusion module, configured to integrate the attributes of the plurality of associated knowledge nodes in each associated knowledge node list, determine a merged knowledge node for each associated knowledge node list, determine an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and take the intermediate knowledge node as the merged knowledge node; and a generating module, configured to update the attributes and relations of each merged knowledge node in the knowledge tree cluster to generate a fused knowledge tree. Merging identical or related knowledge nodes from different knowledge trees removes duplicate and redundant knowledge and improves the accuracy and credibility of the knowledge. Moreover, because different knowledge trees may describe different aspects of the same concept or topic, performing association fusion on the related knowledge nodes of different knowledge trees integrates and presents the viewpoints and information of the different knowledge sources, thereby providing a more complete and comprehensive knowledge tree.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.
The memory 420 stores machine-readable instructions executable by the processor 410, when the electronic device 400 is running, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the association fusion method of the multi-source knowledge tree in the method embodiment shown in fig. 1 can be executed, and the specific implementation is referred to the method embodiment and will not be described herein.
The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the association fusion method of the multi-source knowledge tree in the method embodiment shown in fig. 1 may be executed, and a specific implementation manner may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable non-volatile computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Finally, it should be noted that the foregoing embodiments are merely specific embodiments of the present application and are not intended to limit its scope; the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that any person skilled in the art may, within the technical scope disclosed in the present application, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. An association fusion method for a multi-source knowledge tree, characterized by comprising the following steps:
performing knowledge clustering on a plurality of knowledge trees to determine different knowledge tree clusters;
performing, based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster;
integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list and determining a merged knowledge node of each associated knowledge node list, wherein an intermediate knowledge node of the associated knowledge node list is determined from the associated knowledge node list based on a weight maximization method, and the intermediate knowledge node is taken as the merged knowledge node;
and updating the attributes and relationships of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree.
2. The association fusion method of claim 1, wherein the performing knowledge clustering on the plurality of knowledge trees to determine different knowledge tree clusters comprises:
extracting structural features and content features of each knowledge tree, and determining knowledge feature information of each knowledge tree;
computing, based on the cosine similarity formula, the similarity between the knowledge feature information of any two knowledge trees, and determining the similarity between the two knowledge trees;
and clustering similar knowledge trees together with a clustering algorithm, based on the similarity between any two knowledge trees, to determine the different knowledge tree clusters.
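The clustering step of claim 2 can be illustrated with a minimal Python sketch. The claim fixes neither the feature vectors nor the clustering algorithm, so everything concrete here is an assumption made for illustration: each tree is represented by a short numeric vector, trees are grouped by single-link threshold clustering (a union-find over pairs whose cosine similarity reaches the threshold), and the names `cluster_trees`, `features` and the threshold 0.8 are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def cluster_trees(features, threshold=0.8):
    """Group trees whose pairwise similarity reaches the threshold.

    Assumption: single-link threshold clustering stands in for the
    unspecified "clustering algorithm" of the claim.
    """
    names = list(features)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical knowledge feature vectors for three knowledge trees.
features = {
    "math_a": [1.0, 0.9, 0.0],
    "math_b": [0.9, 1.0, 0.1],
    "history": [0.0, 0.1, 1.0],
}
clusters = cluster_trees(features)
print(clusters)
```

With these example vectors the two mathematics trees fall into one cluster and the history tree into another. Any other clustering algorithm that consumes pairwise similarities could be substituted.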
3. The association fusion method according to claim 1, wherein the performing node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree cluster based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster, comprises:
comparing the character features of any two knowledge nodes among different knowledge trees, and determining the edit distance similarity and the longest common subsequence similarity between the two knowledge nodes;
comparing the semantic features of any two knowledge nodes among different knowledge trees, and determining the semantic similarity between the two knowledge nodes;
weighting the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes to determine the similarity between the two knowledge nodes;
and carrying out node association degree calculation based on the structural characteristics of a plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster.
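The weighted node similarity of claim 3 can be sketched in Python as follows. The claim does not state how the edit distance or the longest common subsequence is normalized, nor the weights, so the normalization by the longer string length and the weights (0.3, 0.3, 0.4) are assumptions. The semantic similarity is passed in as a number, on the assumption that it is computed elsewhere (for example from text embeddings).

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    # Assumed normalization: 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def lcs_similarity(a, b):
    # Assumed normalization: LCS length / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


def node_similarity(name_a, name_b, semantic_sim, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of the three similarities; the weights are hypothetical."""
    w_edit, w_lcs, w_sem = weights
    return (w_edit * edit_similarity(name_a, name_b)
            + w_lcs * lcs_similarity(name_a, name_b)
            + w_sem * semantic_sim)


print(node_similarity("fraction", "fractions", semantic_sim=0.9))
```

Keeping the weights as a parameter makes it easy to tune how much character-level overlap counts relative to meaning.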
4. The association fusion method of claim 3, wherein the calculating the node association based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, determining a plurality of associated knowledge node lists in the knowledge tree cluster, comprises:
if both of the two knowledge nodes are leaf knowledge nodes, determining the association degree of the two knowledge nodes based on the similarity of the two knowledge nodes and the similarity of their parent knowledge nodes;
if either of the two knowledge nodes is not a leaf knowledge node, determining the similarity of the two knowledge nodes as their association degree;
if the association degree of the two knowledge nodes is greater than or equal to a preset association degree threshold, the two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes having association relations form an associated knowledge node list.
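The branching rule of claim 4 can be written as a short Python sketch. The claim does not say how the node similarity and the parent similarity are combined for leaf nodes, nor how associated pairs are gathered into lists, so two assumptions are made: a weighted average with a hypothetical factor `alpha`, and grouping by connected components of the "associated" relation. The node labels and the threshold 0.75 are illustrative.

```python
def association_degree(sim, parent_sim, both_leaves, alpha=0.7):
    """Association degree of two nodes from different trees.

    Assumption: for two leaf nodes the node similarity and the parent
    similarity are mixed as a weighted average with factor alpha.
    """
    if both_leaves:
        return alpha * sim + (1.0 - alpha) * parent_sim
    return sim  # at least one non-leaf node: use the similarity itself


def build_association_lists(degrees, threshold=0.75):
    """Group nodes linked by an association degree >= threshold.

    degrees maps (node_a, node_b) pairs to association degrees.
    Assumption: lists are the connected components of the thresholded graph.
    """
    adjacency = {}
    for (a, b), degree in degrees.items():
        if degree >= threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    seen, lists = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        lists.append(sorted(group))
    return lists


degrees = {
    ("t1:fractions", "t2:fraction"): association_degree(0.9, 0.8, True),
    ("t2:fraction", "t3:fractional numbers"): association_degree(0.8, 0.0, False),
    ("t1:fractions", "t3:geometry"): association_degree(0.2, 0.1, True),
}
lists = build_association_lists(degrees)
print(lists)
```

In this example the first two pairs pass the threshold and share the node `t2:fraction`, so all three nodes end up in one associated knowledge node list, while `t3:geometry` is left out.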
5. The association fusion method according to claim 1, wherein the determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the merged knowledge node, comprises:
for any associated knowledge node list, calculating, based on the weight maximization method, the average association degree of each associated knowledge node in the associated knowledge node list, taking the associated knowledge node with the largest average association degree as the intermediate knowledge node of the associated knowledge node list, and taking the intermediate knowledge node as the merged knowledge node of the associated knowledge node list.
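The selection rule of claim 5 amounts to picking the node whose mean association degree with the other nodes in its list is largest. A minimal Python sketch follows; the function name `pick_intermediate_node`, the node labels and the pairwise degrees are hypothetical, and averaging over the other nodes of the list (excluding the node itself) is an assumption about how the average is taken.

```python
def pick_intermediate_node(nodes, degree):
    """Return the node with the largest mean association degree to the others.

    degree(a, b) is any callable returning the association degree of a pair.
    """
    def mean_degree(node):
        others = [other for other in nodes if other != node]
        if not others:
            return 0.0  # a single-node list trivially selects that node
        return sum(degree(node, other) for other in others) / len(others)

    return max(nodes, key=mean_degree)


# Hypothetical symmetric association degrees within one associated list.
pair_degree = {
    frozenset(("fraction", "fractions")): 0.95,
    frozenset(("fraction", "fractional number")): 0.85,
    frozenset(("fractions", "fractional number")): 0.80,
}
nodes = ["fractions", "fraction", "fractional number"]
chosen = pick_intermediate_node(
    nodes, lambda a, b: pair_degree[frozenset((a, b))]
)
print(chosen)
```

Here the mean degrees are 0.90 for "fraction", 0.875 for "fractions" and 0.825 for "fractional number", so "fraction" is selected as the intermediate knowledge node.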
6. The association fusion method of claim 1, further comprising determining the intermediate knowledge node by:
inputting the associated knowledge node list into a natural language model, which processes the plurality of associated knowledge nodes in the associated knowledge node list to generate one intermediate knowledge node.
7. The association fusion method of claim 1, wherein said updating the attributes and relationships of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree comprises:
fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list, determining attribute information of the merged knowledge node, and updating the attribute information of the merged knowledge node;
and controlling the merged knowledge node to inherit the parent-child node relationships of each of the other knowledge nodes in the associated knowledge node list, and generating the merged knowledge tree based on a plurality of the merged knowledge nodes and the other knowledge nodes in the knowledge tree cluster.
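The relationship inheritance of claim 7 can be pictured as rewriting edges. The sketch below assumes, purely for illustration, that each tree is stored as a list of (parent, child) edges and that a hypothetical mapping `merged_into` sends every associated node to its merged node; self-loops and duplicate edges created by the rewrite are dropped.

```python
def inherit_relationships(edges, merged_into):
    """Redirect parent-child edges of associated nodes to their merged node.

    edges: iterable of (parent, child) pairs from all trees in a cluster.
    merged_into: dict mapping an associated node to its merged node.
    """
    result = []
    for parent, child in edges:
        new_parent = merged_into.get(parent, parent)
        new_child = merged_into.get(child, child)
        # Skip edges that collapse onto one node or repeat an existing edge.
        if new_parent != new_child and (new_parent, new_child) not in result:
            result.append((new_parent, new_child))
    return result


edges = [
    ("t1:math", "t1:fractions"),
    ("t1:fractions", "t1:addition"),
    ("t2:arithmetic", "t2:fraction"),
    ("t2:fraction", "t2:subtraction"),
]
merged_into = {"t1:fractions": "fraction", "t2:fraction": "fraction"}
merged_edges = inherit_relationships(edges, merged_into)
print(merged_edges)
```

In this example the merged node "fraction" keeps the children of both source nodes but also ends up with two parents; how such a case is resolved into a single tree is not covered by this sketch.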
8. The association fusion method of claim 7, wherein the fusing the attributes of the plurality of associated knowledge nodes in the associated knowledge node list, determining the attribute information of the merged knowledge node, comprises:
fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list based on any one of a numerical averaging fusion method, a relevance fusion method, a character string fusion method and a list concatenation method, to determine the attribute information of the merged knowledge node.
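The four attribute fusion options named in claim 8 can be sketched in Python as below. The claim only names the methods, so their exact behaviour here is an assumption: relevance fusion is taken to mean keeping the value from the node with the highest relevance score, and character string fusion is taken to mean joining distinct strings with a separator. All function names are hypothetical.

```python
def fuse_numeric_mean(values):
    """Numerical averaging: the fused value is the arithmetic mean."""
    return sum(values) / len(values)


def fuse_by_relevance(values, relevances):
    """Assumed relevance fusion: keep the value with the highest relevance."""
    best = max(range(len(values)), key=lambda i: relevances[i])
    return values[best]


def fuse_strings(values, sep="; "):
    """Assumed string fusion: join distinct strings, keeping first-seen order."""
    return sep.join(dict.fromkeys(values))


def fuse_lists(values):
    """List concatenation: append the lists one after another."""
    merged = []
    for items in values:
        merged.extend(items)
    return merged


print(fuse_numeric_mean([2, 4]))
print(fuse_by_relevance(["grade 3", "grade 4"], [0.2, 0.9]))
print(fuse_strings(["parts of a whole", "a/b notation", "parts of a whole"]))
print(fuse_lists([["ex1", "ex2"], ["ex3"]]))
```

Which method applies would typically depend on the attribute type: numbers for averaging, free text for string fusion, and collections for concatenation.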
9. An association fusion device for a multi-source knowledge tree, characterized in that the association fusion device comprises:
a knowledge tree clustering module, configured to perform knowledge clustering on a plurality of knowledge trees and determine different knowledge tree clusters;
an association module, configured to perform, based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree cluster, and determine a plurality of associated knowledge node lists in the knowledge tree cluster;
a fusion module, configured to integrate the attributes of a plurality of associated knowledge nodes in each associated knowledge node list and determine a merged knowledge node of each associated knowledge node list, wherein an intermediate knowledge node of the associated knowledge node list is determined from the associated knowledge node list based on a weight maximization method, and the intermediate knowledge node is taken as the merged knowledge node;
and a generating module, configured to update the attributes and relationships of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree.
10. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine-readable instructions executable by said processor, said processor and said memory communicating via said bus when the electronic device is running, said machine-readable instructions, when executed by said processor, performing the steps of the association fusion method of a multi-source knowledge tree according to any one of claims 1 to 8.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the association fusion method of a multi-source knowledge tree according to any of claims 1 to 8.
CN202410176275.4A 2024-02-08 2024-02-08 Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium Pending CN117725555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410176275.4A CN117725555A (en) 2024-02-08 2024-02-08 Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117725555A true CN117725555A (en) 2024-03-19

Family

ID=90200141


Country Status (1)

Country Link
CN (1) CN117725555A (en)





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination