CN117725555A

CN117725555A - Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium

Info

Publication number: CN117725555A
Application number: CN202410176275.4A
Authority: CN
Inventors: 罗歆昱; 陈崇雨
Original assignee: DMAI Guangzhou Co Ltd
Current assignee: DMAI Guangzhou Co Ltd
Priority date: 2024-02-08
Filing date: 2024-02-08
Publication date: 2024-03-19

Abstract

The application provides a method, a device, electronic equipment and a storage medium for association fusion of multi-source knowledge trees. The method comprises the following steps: performing knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters; performing node association degree calculation on any two knowledge nodes from different knowledge trees in each knowledge tree cluster, based on character features, semantic features and structural features of the knowledge nodes in that cluster, and determining a plurality of associated knowledge node lists in the knowledge tree cluster; integrating the attributes of the associated knowledge nodes in each associated knowledge node list and determining a merged knowledge node for each list, wherein an intermediate knowledge node is determined from the associated knowledge node list based on a weight maximization method and is taken as the merged knowledge node; and updating the attributes and relationships of each merged knowledge node in the knowledge tree cluster to generate a fused knowledge tree, thereby providing a more comprehensive and complete knowledge tree.

Description

Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of knowledge tree fusion, in particular to a method, a device, electronic equipment and a storage medium for associated fusion of multi-source knowledge trees.

Background

In recent years, with the rapid development of information technology and big data analysis, the demand for knowledge management and knowledge fusion has become increasingly urgent. In many fields, such as academic research, enterprise management and decision support, the association and fusion of knowledge is critical to acquiring comprehensive information and gaining deep insight into problems. Traditional knowledge fusion methods rely mainly on manual integration and analysis; they are limited by time, resources and subjective factors, are inefficient, and are prone to introducing subjective bias. An automated and reliable method for knowledge association and fusion is therefore needed. A common knowledge representation model is the knowledge tree, which organizes knowledge elements by means of nodes and edges to form a structured knowledge representation. Nodes of the knowledge tree may represent concepts, entities and the like in a field, while edges represent relationships between the nodes. However, existing knowledge tree fusion methods mainly have the following problems: 1) node association between knowledge trees requires a large number of computation steps, resulting in low computational efficiency; 2) existing methods rely only on similarities between nodes to associate knowledge trees shallowly and ignore deeper associations between nodes, which limits the ability to fully understand and analyze the knowledge trees. Therefore, how to fuse knowledge trees is a technical problem that should not be underestimated.

Disclosure of Invention

In view of this, an object of the present application is to provide a method, an apparatus, an electronic device and a storage medium for association fusion of multi-source knowledge trees. By merging knowledge nodes that are the same or related in different knowledge trees, duplicate and redundant knowledge is removed and the accuracy and reliability of the knowledge are improved. Because different knowledge trees may contain descriptions of different aspects of the same concept or topic, performing association fusion on related knowledge nodes in different knowledge trees makes it possible to integrate and present the viewpoints and information of different knowledge sources, thereby providing a more comprehensive and complete knowledge tree.

The embodiment of the application provides a correlation fusion method of a multi-source knowledge tree, which comprises the following steps:

carrying out knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters;

performing node association calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters;

integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node;

and updating the attribute and the relationship of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree.

In one possible implementation manner, the knowledge clustering processing is performed on the plurality of knowledge trees to determine different knowledge tree clusters, including:

extracting structural features and content features of each knowledge tree, and determining knowledge feature information of each knowledge tree;

calculating knowledge characteristic information of any two knowledge trees based on a calculation formula of cosine similarity, and determining similarity between any two knowledge trees;

and clustering similar knowledge trees together based on the similarity between any two knowledge trees by using a clustering algorithm, and determining different knowledge tree clusters.

In one possible implementation manner, the calculating node association degree of any two knowledge nodes between different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters includes:

calculating character characteristics of any two knowledge nodes among different knowledge trees, and determining editing distance similarity and longest public subsequence similarity between any two knowledge nodes;

calculating semantic features of any two knowledge nodes among different knowledge trees, and determining semantic similarity between the two knowledge nodes;

weighting the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes to determine the similarity between any two knowledge nodes;

and carrying out node association degree calculation based on the structural characteristics of a plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster.

In one possible implementation manner, the calculating node association degree based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster includes:

if both of the two knowledge nodes are leaf knowledge nodes, determining the association degree of the two knowledge nodes based on the similarity of the two knowledge nodes and the similarity of their parent knowledge nodes;

if either of the two knowledge nodes is not a leaf knowledge node, determining the similarity of the two knowledge nodes as the association degree of the two knowledge nodes;

if the association degree of the two knowledge nodes is greater than or equal to a preset association degree threshold, the two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes having association relations form an associated knowledge node list.

In one possible implementation manner, the weight-based maximization method determines an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list, and takes the intermediate knowledge node as a combined knowledge node, including:

for any associated knowledge node list, calculating, based on the weight maximization method, the average association degree of each associated knowledge node in the associated knowledge node list, taking the associated knowledge node with the largest average association degree as the intermediate knowledge node of the associated knowledge node list, and taking the intermediate knowledge node as the merged knowledge node of the associated knowledge node list.
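The selection rule above can be illustrated with a minimal Python sketch. The function name `pick_intermediate`, the dictionary layout of `assoc`, and the choice to treat a missing pair as association degree 0.0 are all assumptions made for illustration; the application only states that the node with the largest average association degree is chosen.

```python
def pick_intermediate(nodes, assoc):
    """Return the node whose mean association degree to the others is largest.

    nodes: list of node identifiers in one associated knowledge node list.
    assoc: dict mapping frozenset({a, b}) -> association degree (hypothetical layout).
    """
    def mean_association(node):
        others = [m for m in nodes if m != node]
        # Assumption: a pair without a recorded score counts as 0.0.
        total = sum(assoc.get(frozenset((node, m)), 0.0) for m in others)
        return total / len(others)

    return max(nodes, key=mean_association)


example_nodes = ["A", "B", "C"]
example_assoc = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.80,
    frozenset(("B", "C")): 0.95,
}
chosen = pick_intermediate(example_nodes, example_assoc)
```

In this example the mean association degrees are 0.85 for A, 0.925 for B and 0.875 for C, so B would serve as the intermediate (merged) knowledge node.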

In one possible implementation, the association fusion method further includes determining the intermediate knowledge node by:

inputting the associated knowledge node list into a natural language model, and processing a plurality of associated knowledge nodes in the associated knowledge node list to generate one intermediate knowledge node.

In one possible implementation manner, the updating the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree includes:

fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list, determining attribute information of the combined knowledge nodes, and updating the attribute information of the combined knowledge nodes;

and controlling the merged knowledge node to inherit the father-son node relation of each other knowledge node in the associated knowledge node list, and generating the merged knowledge tree based on a plurality of the merged knowledge nodes and other knowledge nodes in the knowledge tree cluster.

In one possible implementation manner, the fusing the attributes of the multiple associated knowledge nodes in the associated knowledge node list to determine attribute information of the merged knowledge node includes:

and fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list based on any one of a numerical value average value fusion method, a relevance fusion method, a character string fusion method and a list splicing method to determine the attribute information of the merged knowledge nodes.
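As a rough illustration of three of the fusion methods named above (numerical average, character string fusion and list splicing), the following Python sketch dispatches on the attribute type. The helper names, the `"; "` separator and the removal of duplicate strings are assumptions, and the relevance fusion method is not covered because the text does not define it here.

```python
from numbers import Number


def fuse_attribute(values):
    """Fuse the values of one attribute collected from several associated nodes."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return sum(values) / len(values)            # numerical average fusion
    if all(isinstance(v, str) for v in values):
        # String fusion: join distinct strings in first-seen order (assumed rule).
        return "; ".join(dict.fromkeys(values))
    if all(isinstance(v, list) for v in values):
        return [item for v in values for item in v]  # list splicing
    raise TypeError("mixed or unsupported attribute types")


def fuse_nodes(nodes):
    """nodes: list of attribute dicts, one per associated knowledge node."""
    keys = dict.fromkeys(k for node in nodes for k in node)
    return {k: fuse_attribute([n[k] for n in nodes if k in n]) for k in keys}


merged = fuse_nodes([
    {"difficulty": 2, "tags": ["x"]},
    {"difficulty": 4, "tags": ["y"], "desc": "d"},
])
```

Here `merged` holds the averaged `difficulty`, the spliced `tags` list and the single `desc` string.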

The embodiment of the application also provides a correlation fusion device of the multi-source knowledge tree, which comprises:

the knowledge tree clustering module is used for carrying out knowledge clustering processing on the plurality of knowledge trees and determining different knowledge tree clusters;

the association module is used for carrying out node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters;

the fusion module is used for integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node;

and the generating module is used for updating the attribute and the relationship of each combined knowledge node in the knowledge tree cluster to generate a fusion knowledge tree.

The embodiment of the application also provides electronic equipment, which comprises: the system comprises a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory are communicated through the bus when the electronic device runs, and the machine-readable instructions are executed by the processor to execute the steps of the associated fusion method of the multi-source knowledge tree.

Embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the association fusion method of a multi-source knowledge tree as described above.

The method, the device, the electronic equipment and the storage medium for association fusion of multi-source knowledge trees provided by the embodiments of the application operate as follows: knowledge clustering processing is performed on a plurality of knowledge trees to determine different knowledge tree clusters; node association degree calculation is performed on any two knowledge nodes from different knowledge trees in each knowledge tree cluster, based on character features, semantic features and structural features of the knowledge nodes in that cluster, and a plurality of associated knowledge node lists in the knowledge tree cluster are determined; the attributes of the associated knowledge nodes in each associated knowledge node list are integrated and a merged knowledge node is determined for each list, wherein an intermediate knowledge node is determined from the associated knowledge node list based on a weight maximization method and is taken as the merged knowledge node; and the attributes and relationships of each merged knowledge node in the knowledge tree cluster are updated to generate a fused knowledge tree. By merging identical or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed and the accuracy and reliability of the knowledge are improved. Because different knowledge trees may contain descriptions of different aspects of the same concept or topic, performing association fusion on related knowledge nodes in different knowledge trees makes it possible to integrate and present the viewpoints and information of different knowledge sources, so that a more comprehensive and complete knowledge tree is provided.

In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for associative fusion of multiple source knowledge trees according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a method for associative fusion of multiple source knowledge trees according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of an apparatus for association fusion of multi-source knowledge trees according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment that a person skilled in the art would obtain without making any inventive effort is within the scope of protection of the present application.

First, application scenarios applicable to the present application will be described. The method and the device can be applied to the technical field of knowledge tree fusion.

Based on this, the embodiments of the application provide a multi-source knowledge tree association fusion method. By merging identical or related knowledge nodes in different knowledge trees, the method removes duplicate and redundant knowledge and improves the accuracy and reliability of the knowledge. Because different knowledge trees may contain descriptions of different aspects of the same concept or topic, performing association fusion on related knowledge nodes in different knowledge trees makes it possible to integrate and present the viewpoints and information of different knowledge sources, so as to provide a more comprehensive and complete knowledge tree.

Referring to fig. 1, fig. 1 is a flowchart of a method for association fusion of multiple source knowledge trees according to an embodiment of the present application. As shown in fig. 1, the association fusion method provided in the embodiment of the present application includes:

S101: Perform knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters.

In the step, knowledge clustering processing is carried out on a plurality of knowledge trees from different sources, and different knowledge tree clusters are determined.

The knowledge tree clusters are used for clustering a large number of knowledge trees into a small number of tree clusters, similar knowledge trees are clustered together, the similar knowledge trees have similar subjects and contents, and the clustered knowledge tree clusters have higher internal consistency and correlation. In the subsequent association and fusion calculation, a plurality of knowledge trees belonging to the same cluster are associated and fused together, so that the efficiency and accuracy of association and fusion can be improved.

In a specific embodiment, the performing knowledge clustering processing on the plurality of knowledge trees to determine different knowledge tree clusters includes:

A: Extract the structural features and the content features of each knowledge tree, and determine the knowledge feature information of each knowledge tree.

Here, feature extraction is performed on the structural features and the content features of each knowledge tree, and the knowledge feature information of each knowledge tree is determined.

Among other things, features describing knowledge trees can be generally divided into two categories: structural features and content features. The structural features comprise information such as the height, depth, branch number, node number and the like of the knowledge tree, and can be used for evaluating the structural similarity of the tree; the content features include information on the knowledge nodes such as attributes, values, or tags, which can be used to evaluate the content similarity of the knowledge tree. The knowledge tree clustering of the scheme aims at clustering knowledge trees with similar subject and content into a cluster, and in order to better acquire the representation of the knowledge tree, the scheme fuses the structural features and the content features to obtain knowledge feature information. First, each knowledge tree is treated as a document, each knowledge node is treated as a phrase in the document, and a knowledge tree dictionary is constructed. In the statistics of word frequency, structural features are introduced, nodes of different levels are given different weights, and the nodes closer to the root node are weighted more heavily (because the root node can generally represent the subject of a knowledge tree). Finally, the representation of the knowledge tree is obtained by using a document vector representation method in a natural language processing technology.

The specific steps are as follows:

1) Weighting the knowledge nodes: N knowledge trees are obtained. For each knowledge tree (tree_a), the tree height is denoted h_a, and a knowledge node list node_a is obtained by breadth-first traversal. Each knowledge node in node_a has a name (text name) attribute and a level attribute (the level is counted from the root node and increases with depth). The word frequency weight is then calculated: a weight (word frequency weight) attribute is set for each knowledge node in node_a, with weight = log2(h_a - level + 1).

2) Data preprocessing: for each knowledge node in the knowledge node list, irrelevant punctuation marks, stop words, numbers and the like are removed, and word segmentation is performed.

3) Constructing the knowledge tree dictionary: a dictionary is constructed from the preprocessed knowledge tree data. All text data in each knowledge tree is traversed, the words that appear are added to the vocabulary, and repeated words are removed. Each word is associated with a unique index to form a dictionary for the subsequent vector representation.

4) Calculating the knowledge tree representation: for each knowledge tree, the vocabulary, the text data and the node word frequency weights are used to calculate a WTF-IDF (weighted word frequency-inverse document frequency) vector representation. First, the weighted word frequency (WTF) is calculated by counting the weighted frequency of occurrence of each word in the text. For a word w, when the knowledge tree nodes are traversed, its weighted frequency at a knowledge node node_a equals the word frequency weight of that knowledge node multiplied by the frequency of occurrence of w in node_a. The weighted frequency of w is then the sum of the weighted frequencies of w over all knowledge nodes of the knowledge tree. To reduce the impact of high-frequency words on the similarity, the logarithm of the weighted word frequency is used. Next, the inverse document frequency (IDF) is calculated: for each word, the number of knowledge trees containing the word is counted, the total number of knowledge trees is divided by this number, and the logarithm of the quotient is taken. Finally, the WTF and the IDF are multiplied to obtain the WTF-IDF vector. The complete formulas are as follows:

① WTF(w, node_a) = TF_weight(node_a) × freq(w, node_a);

② WTF(w) = sum(WTF(w, node_a) for all nodes node_a in the knowledge tree);

③ WTF_IDF(w) = log(WTF(w)) × log(N / DF(w));

Here, WTF(w, node_a) represents the weighted word frequency of word w in knowledge node node_a, TF_weight(node_a) represents the word frequency weight of knowledge node node_a, freq(w, node_a) represents the frequency of word w in node_a, WTF(w) represents the weighted word frequency of word w in the knowledge tree, sum represents the summation operation, WTF_IDF(w) represents the component of the WTF-IDF vector corresponding to word w, log represents the natural logarithm, N represents the total number of knowledge trees, and DF(w) represents the number of knowledge trees containing word w.
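The following Python sketch shows one way formulas ① to ③ could be evaluated. The input layout (a dict with `height` and a list of `(level, tokens)` pairs per tree) and the function names are hypothetical. Two further assumptions are made: node levels and tree heights are taken as given, since the text does not fix whether the root is level 0 or level 1, and a word with zero weighted frequency is mapped to 0.0 because the logarithm is undefined there.

```python
import math
from collections import Counter, defaultdict


def node_weight(tree_height, level):
    """Word frequency weight of a node: log2(h_a - level + 1)."""
    return math.log2(tree_height - level + 1)


def weighted_tf(tree):
    """Formulas (1) and (2): weighted word frequency of every word in one tree."""
    wtf = defaultdict(float)
    for level, tokens in tree["nodes"]:
        weight = node_weight(tree["height"], level)
        for word, freq in Counter(tokens).items():
            wtf[word] += weight * freq
    return dict(wtf)


def wtf_idf_vectors(trees):
    """Formula (3): one WTF-IDF vector per tree over a shared, sorted vocabulary."""
    wtfs = [weighted_tf(t) for t in trees]
    vocab = sorted({w for wtf in wtfs for w in wtf})
    n = len(trees)
    df = {w: sum(1 for wtf in wtfs if w in wtf) for w in vocab}
    vectors = []
    for wtf in wtfs:
        vec = []
        for w in vocab:
            value = wtf.get(w, 0.0)
            # Assumption: absent or zero-weight words contribute 0.0.
            vec.append(math.log(value) * math.log(n / df[w]) if value > 0 else 0.0)
        vectors.append(vec)
    return vocab, vectors


tree_1 = {"height": 3, "nodes": [(0, ["math"]), (2, ["algebra", "algebra"])]}
tree_2 = {"height": 3, "nodes": [(0, ["math"]), (2, ["geometry"])]}
vocab, vectors = wtf_idf_vectors([tree_1, tree_2])
```

Note that, taken literally, formula ③ gives 0 for any word whose weighted frequency is exactly 1 and for any word that occurs in every tree, which is why only the `algebra` component is non-zero in this small example.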

B: Calculate the knowledge feature information of any two knowledge trees based on the cosine similarity formula, and determine the similarity between the two knowledge trees.

Here, the knowledge characteristic information of any two knowledge trees is calculated through a calculation formula of cosine similarity, and the similarity between any two knowledge trees is determined.

Specifically, after the knowledge feature information of two knowledge trees is acquired, the similarity between them can be calculated. Cosine similarity can be used to measure the degree of similarity between two vectors. An N × N similarity matrix is constructed, where N represents the number of knowledge trees. The element at position [i, j] of the matrix represents the similarity calculation result of the i-th knowledge tree and the j-th knowledge tree.
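A minimal Python sketch of the similarity matrix is given below. The function names are illustrative, and returning 0.0 when either vector is all zeros is an assumption, since cosine similarity is undefined in that case.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """N x N matrix whose [i][j] entry is the similarity of tree i and tree j."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


matrix = similarity_matrix([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
```

The matrix is symmetric, so in practice only the upper triangle needs to be computed.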

C: Use a clustering algorithm to cluster similar knowledge trees together based on the similarity between any two knowledge trees, and determine different knowledge tree clusters.

Here, the similarity matrix is input into a clustering algorithm, and similar knowledge trees are clustered together according to the similarity between a plurality of any two knowledge trees, so that different knowledge tree clusters are determined.

Each knowledge tree represents a data point, and the pairwise similarity between knowledge trees plays the role of the distance between data points. Taking the similarity matrix of the knowledge trees as input data, a clustering algorithm clusters similar knowledge trees together to form different clusters. Optional clustering algorithms include k-means clustering, hierarchical clustering, density clustering and the like. The specific steps are as follows: randomly select an unvisited knowledge tree; judge whether the neighborhood of the knowledge tree satisfies the specified radius (a similarity threshold, set to 0.85) and the minimum number of data points in the neighborhood (set to 1); if so, group the knowledge tree and the knowledge trees in its neighborhood into one cluster and mark them as visited; if the number of knowledge trees in the neighborhood is insufficient, mark the knowledge tree as a noise knowledge tree; repeat the above steps for the unvisited knowledge trees until all knowledge trees have been visited. After the clustering is completed, each cluster is taken as a knowledge tree cluster containing a group of densely associated knowledge trees.
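The steps above resemble density-based clustering, and a DBSCAN-style sketch over a precomputed similarity matrix might look as follows. Several points are assumptions rather than statements of the application: the neighborhood excludes the tree itself, clusters are expanded transitively through neighbors (standard DBSCAN behavior that the abbreviated steps do not spell out), and trees are visited in index order instead of randomly so that the result is reproducible.

```python
def density_cluster(sim, threshold=0.85, min_points=1):
    """Return one label per tree: a cluster id >= 0, or -1 for a noise tree."""
    n = len(sim)

    def neighbours(i):
        # Trees whose similarity to tree i reaches the threshold (tree i excluded).
        return [j for j in range(n) if j != i and sim[i][j] >= threshold]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = -1  # too few neighbours: noise knowledge tree
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise tree joins as a border tree
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbours(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # keep expanding the cluster
        cluster_id += 1
    return labels


example_sim = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.9, 0.1],
    [0.5, 0.9, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
labels = density_cluster(example_sim)
```

In this example trees 0, 1 and 2 end up in one cluster because tree 1 links the other two, while tree 3 has no neighbor above the threshold and is labeled as noise.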

Through this step, a large number of knowledge trees are clustered into a smaller number of knowledge tree clusters based on content similarity. Subsequent knowledge association and fusion take place within each individual knowledge tree cluster. The significance of this step lies mainly in the following two aspects. 1) Improving the accuracy of association and fusion: knowledge tree clustering aggregates similar knowledge trees, which share similar topics and contents, so the resulting knowledge tree clusters have higher internal consistency and correlation. In the subsequent association and fusion calculation, knowledge trees from the same cluster are associated and fused together, which improves the accuracy of association and fusion. 2) Relieving the computational pressure: knowledge tree clustering groups a large number of fine-grained knowledge trees into fewer clusters, which reduces the scale and complexity of the association and fusion calculation and thereby reduces the computational pressure. Especially for large-scale knowledge tree fusion tasks, clustering decomposes the problem into a plurality of small-scale clusters, which lends itself to parallel calculation and improves the overall computational efficiency.

S102: Perform node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determine a plurality of associated knowledge node lists in the knowledge tree clusters.

In this step, based on the character features, semantic features and structural features of the plurality of knowledge nodes in each knowledge tree cluster, node association degree calculation is performed on any two knowledge nodes among different knowledge trees in the knowledge tree cluster, and a plurality of associated knowledge node lists in the knowledge tree cluster are determined.

The character feature is the name string of a knowledge node; the semantic feature is a semantic vector obtained by passing the character feature of the knowledge node through a language model; and the structural feature consists of the parent knowledge node and the child knowledge nodes of the knowledge node.

Wherein the associated knowledge node list is formed by a plurality of associated knowledge nodes.

S1021: Calculate character features of any two knowledge nodes among different knowledge trees, and determine the edit distance similarity and the longest common subsequence similarity between the two knowledge nodes.

Here, character features of any two knowledge nodes between different knowledge trees are calculated, and the editing distance similarity and the longest common subsequence similarity between any two knowledge nodes are determined.

S1022: Calculate semantic features of any two knowledge nodes among different knowledge trees, and determine the semantic similarity between the two knowledge nodes.

Here, the semantic features of any two knowledge nodes among different knowledge trees are calculated, and the semantic similarity between any two knowledge nodes is determined.

S1023: Weight the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes to determine the similarity between the two knowledge nodes.

Here, the edit distance similarity, the longest common subsequence similarity and the semantic similarity between any two knowledge nodes are weighted to determine the similarity between the two knowledge nodes.

Specifically, from the character features of knowledge node A1 and the character features of knowledge node B1, the edit distance similarity of (A1, B1) and the longest common subsequence similarity lcs_distance(A1, B1) are obtained. The cosine similarity between the semantic features of knowledge node A1 and the semantic features of knowledge node B1 is calculated to obtain the semantic similarity. The knowledge node similarity is then the weighted value of the edit distance similarity, the longest common subsequence similarity and the semantic similarity.
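A minimal Python sketch of this weighted node similarity follows. The normalization of both character measures by the longer name length, the weights `(0.25, 0.25, 0.5)` and all function names are assumptions chosen for illustration; the semantic vectors are passed in directly because the language model that produces them is not specified here.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def lcs_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def node_similarity(name_a, name_b, vec_a, vec_b, weights=(0.25, 0.25, 0.5)):
    """Weighted sum of the two character similarities and the semantic similarity."""
    w_edit, w_lcs, w_sem = weights  # hypothetical weights, not taken from the source
    return (w_edit * edit_similarity(name_a, name_b)
            + w_lcs * lcs_similarity(name_a, name_b)
            + w_sem * cosine(vec_a, vec_b))


score = node_similarity("abc", "abc", [1.0, 0.0], [1.0, 0.0])
```

With weights that sum to 1 and each component in [0, 1], the combined similarity also stays in [0, 1] for non-negative semantic similarity, which makes a fixed threshold easier to interpret.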

S1024: Perform node association degree calculation based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determine a plurality of associated knowledge node lists in the knowledge tree cluster.

Here, the node association degree calculation is performed according to the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and a plurality of associated knowledge node lists in the knowledge tree cluster are determined.

In a specific embodiment, the calculating the node association degree based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determining a plurality of associated knowledge node lists in the knowledge tree cluster includes:

a: if any two knowledge nodes are leaf knowledge nodes, determining the association degree of any two knowledge nodes based on the similarity of any two knowledge nodes and the similarity of the father knowledge nodes of any two knowledge nodes.

Here, if any two knowledge nodes are leaf knowledge nodes, determining the association degree of any two knowledge nodes according to the similarity of any two knowledge nodes and the similarity of the father knowledge nodes of any two knowledge nodes.

b: if any one of the two knowledge nodes is not a leaf knowledge node, determining the similarity of the any two knowledge nodes as the association degree of the any two knowledge nodes.

Here, if any one of the two knowledge nodes is not a leaf knowledge node, the similarity of the any two knowledge nodes is determined as the association of the any two knowledge nodes.

For example, let correlation (A1, B1) denote the association degree between knowledge node A1 and knowledge node B1. The parent knowledge node of knowledge node A1 is a1_father and its child knowledge nodes are a1_children; the parent knowledge node of knowledge node B1 is b1_father and its child knowledge nodes are b1_children. The association degree of a leaf knowledge node is a weighted sum of the similarity of the knowledge node itself and the similarity of its parent knowledge node, whereas the association degree of an intermediate knowledge node considers only the similarity of the knowledge node itself. That is: if a1_children is empty and b1_children is empty, correlation (A1, B1) = sim (A1, B1) + sim (a1_father, b1_father)/2; if a1_children is not empty or b1_children is not empty, correlation (A1, B1) = sim (A1, B1).
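The leaf and non-leaf cases can be sketched as below. The `KnowledgeNode` class and the `sim` callback are hypothetical helpers introduced only for illustration. The printed formula leaves open whether the division by 2 applies to the whole sum; this sketch assumes the equal-weight average of the two similarities, so that the association degree stays in [0, 1] and remains comparable with a threshold such as 0.85. It also assumes that a leaf without a parent falls back to its own similarity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class KnowledgeNode:
    """Hypothetical minimal tree node: a name, a parent and children."""
    name: str
    parent: Optional["KnowledgeNode"] = field(default=None, repr=False)
    children: List["KnowledgeNode"] = field(default_factory=list, repr=False)

    def is_leaf(self) -> bool:
        return not self.children


def correlation(a: KnowledgeNode,
                b: KnowledgeNode,
                sim: Callable[[KnowledgeNode, KnowledgeNode], float]) -> float:
    """Association degree of two nodes taken from different trees."""
    own = sim(a, b)
    both_leaves = a.is_leaf() and b.is_leaf()
    if both_leaves and a.parent is not None and b.parent is not None:
        # Assumed reading: average of own similarity and parent similarity.
        return (own + sim(a.parent, b.parent)) / 2
    # At least one node has children (or a parent is missing):
    # only the similarity of the nodes themselves is used.
    return own
```

Including the parent similarity for leaves helps separate leaves that share a title, such as "Overview", but sit under unrelated parents.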

c: if the association degree of any two knowledge nodes is greater than or equal to a preset association degree threshold value, the any two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes with association relations form an associated knowledge node list.

Here, if the association degree of any two knowledge nodes is greater than or equal to the preset association degree threshold, the two knowledge nodes are associated knowledge nodes, and a plurality of associated knowledge nodes having association relations form an associated knowledge node list.

The association degree threshold is set to correlation_threshold = 0.85. Each knowledge node of a knowledge tree is traversed and its association degree with the knowledge nodes of the other knowledge trees is calculated. This step yields n associated knowledge node lists, each containing m associated knowledge nodes. For example, in the associated knowledge node list [A1, B1, C1, D1], A1 and B1 are associated knowledge nodes, B1 and C1 are associated knowledge nodes, and A1 and D1 are associated knowledge nodes.
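One way to turn thresholded pairs into associated knowledge node lists is to group nodes that are connected through a chain of associations. The sketch below does this with a small union-find structure. The transitive grouping is an assumption suggested by the example list, where C1 and D1 share a list without being stated as directly associated; the function name and the scores in the usage example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

CORRELATION_THRESHOLD = 0.85  # value given in the description


def build_associated_lists(
    scored_pairs: Iterable[Tuple[str, str, float]],
    threshold: float = CORRELATION_THRESHOLD,
) -> List[List[str]]:
    """Group node ids linked by pairs whose score reaches the threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            parent[root_a] = root_b  # union the two groups

    groups: Dict[str, List[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


pairs = [("A1", "B1", 0.90), ("B1", "C1", 0.88),
         ("A1", "D1", 0.86), ("A1", "E1", 0.50)]
print(build_associated_lists(pairs))  # [['A1', 'B1', 'C1', 'D1']]
```

The pair (A1, E1) falls below the threshold, so E1 does not join any list.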

In the scheme, knowledge nodes of different knowledge trees in the same knowledge tree cluster are associated, and a knowledge node pair with correlation is generated by establishing the association between the knowledge nodes. Through knowledge association, similar or related knowledge nodes can be accurately found, invalid or wrong fusion is avoided, and the accuracy of a subsequent knowledge fusion result is improved.

S103: integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node.

In the step, the attributes of a plurality of associated knowledge nodes in each associated knowledge node list are integrated, the combined knowledge node of each associated knowledge node list is determined, the intermediate knowledge node of the associated knowledge node list is determined from the associated knowledge node list according to a weight maximization method, and the intermediate knowledge node is used as the combined knowledge node.

The attributes of the plurality of knowledge nodes in each associated knowledge node list are integrated into one combined knowledge node, and a unique identifier is allocated to the combined knowledge node. For example, the association list [cloud recording (SaaS) product introduction, 1. Cloud recording-product introduction, cloud recording product introduction] is initialized as a new merged knowledge node Q.

In a specific embodiment, determining the intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on the weight maximization method, and taking the intermediate knowledge node as the combined knowledge node, includes:

Here, for each associated knowledge node list, performing association degree average calculation on each associated knowledge node in the associated knowledge node list according to a weight maximization method, using the associated knowledge node corresponding to the largest association degree average as an intermediate knowledge node of the associated knowledge node list, and using the intermediate knowledge node as a combined knowledge node of the associated knowledge node list.

Specifically, the list of associated knowledge node pairs is cor_pair_list = [(node_a1, node_b1, weight (A1, B1)), ..., (node_n, node_m, weight (N, M))], where weight (A1, B1) is the association degree between associated knowledge node node_a1 and associated knowledge node node_b1. For each associated knowledge node in cor_list, the average value avg_weight of its association weights is calculated, i.e., the sum of the association degrees of the associated knowledge node pairs in cor_pair_list that contain the node, divided by the number of times the node appears: avg_weight = Σ (weight (A1, B1))/count, where Σ (weight (A1, B1)) represents the sum of all association weights of the pairs in cor_pair_list that contain the node, and count represents the number of times the node appears in cor_pair_list. For example, cor_pair_list = [(cloud recording (SaaS) product description, 1. Cloud recording-product description, 0.89), (cloud recording (SaaS) product description, cloud recording product description, 0.92), (1. Cloud recording-product description, cloud recording product description, 0.91)] and cor_list = [cloud recording (SaaS) product description, 1. Cloud recording-product description, cloud recording product description]. The average association degree avg_weight of each associated knowledge node in the associated knowledge node list is calculated: "cloud recording product description" has the highest value, avg_weight = (0.92 + 0.91)/2 = 0.915 (the other two nodes give (0.89 + 0.92)/2 = 0.905 and (0.89 + 0.91)/2 = 0.90), so it is selected as the intermediate knowledge node.
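The average-weight selection can be reproduced with a few lines of code. The sketch below uses the cor_pair_list values from the example; the function name `select_intermediate_node` is an assumption, and the behaviour on ties (the first node reaching the maximum wins) is not specified in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select_intermediate_node(
    cor_pair_list: List[Tuple[str, str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the node with the largest average association degree.

    avg_weight(node) = sum of the weights of the pairs containing the
    node, divided by the number of pairs containing it.
    """
    if not cor_pair_list:
        raise ValueError("cor_pair_list must contain at least one pair")
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for node_a, node_b, weight in cor_pair_list:
        for node in (node_a, node_b):
            totals[node] += weight
            counts[node] += 1
    avg_weight = {node: totals[node] / counts[node] for node in totals}
    best = max(avg_weight, key=avg_weight.get)
    return best, avg_weight


cor_pair_list = [
    ("cloud recording (SaaS) product description",
     "1. Cloud recording-product description", 0.89),
    ("cloud recording (SaaS) product description",
     "cloud recording product description", 0.92),
    ("1. Cloud recording-product description",
     "cloud recording product description", 0.91),
]
best, averages = select_intermediate_node(cor_pair_list)
print(best)  # cloud recording product description
```

Choosing the node with the highest average keeps the representative that is, on average, closest to every other member of the list.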

The intermediate knowledge node may also be determined as follows: the associated knowledge node list and the association degrees among its knowledge nodes are taken as input, and a natural language generation method is used to generate a new knowledge node from the associated knowledge node list, which serves as the representation of the combined knowledge node.

In this scheme, knowledge fusion refers to merging associated knowledge nodes into a more comprehensive and consistent knowledge tree, which is accomplished by merging and integrating the associated knowledge nodes. The merged knowledge node is first initialized, attribute fusion and intermediate knowledge node selection are then performed, and finally the selected intermediate knowledge node is used as the representation of the merged knowledge node, whose attributes and relations are updated.

S104: and updating the attribute and the relationship of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree.

In the step, each merged knowledge node in the knowledge tree cluster is subjected to attribute and relationship updating to generate a merged knowledge tree. The method and the system realize the integration of the associated knowledge nodes among different knowledge trees, generate a global knowledge tree and keep the relation and attribute characteristics of each associated knowledge point.

(1): and fusing the attributes of a plurality of associated knowledge nodes in the associated knowledge node list, determining the attribute information of the combined knowledge nodes, and updating the attribute information of the combined knowledge nodes.

Here, the attributes of the plurality of associated knowledge nodes in the associated knowledge node list are fused, and the attribute information of the merged knowledge node is determined and updated.

Here, the attributes of the plurality of associated knowledge nodes in the associated knowledge node list are fused according to any one of a numerical average fusion method, an association degree fusion method, a character string combination method and a list splicing method, and the attribute information of the combined knowledge node is thereby determined.

Numerical average fusion averages numerical attributes; association degree fusion computes, during attribute fusion, a weighted average of the attributes of the different knowledge nodes according to the weights of the associated knowledge nodes; character string combination merges text attributes into a new text attribute; and list splicing concatenates list attributes into one larger list.
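The four attribute fusion methods can be illustrated with one small function each. This is a sketch under assumptions: the function names are hypothetical, and the separator and the removal of exact duplicate strings in `fuse_strings` are choices made here, not details given in the text.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fuse_numeric_mean(values: Sequence[float]) -> float:
    """Numerical average fusion: plain mean of a numeric attribute."""
    return sum(values) / len(values)


def fuse_by_association(values: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Association degree fusion: mean weighted by each node's weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def fuse_strings(values: Sequence[str], sep: str = "; ") -> str:
    """Character string combination: join texts, skipping exact repeats."""
    seen: List[str] = []
    for text in values:
        if text not in seen:
            seen.append(text)
    return sep.join(seen)


def fuse_lists(values: Sequence[Sequence[T]]) -> List[T]:
    """List splicing: concatenate list attributes into one larger list."""
    merged: List[T] = []
    for items in values:
        merged.extend(items)
    return merged
```

In practice the method would be chosen per attribute type, for example the weighted mean for a numeric score and list splicing for a list of source documents.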

(2): and controlling the merged knowledge node to inherit the father-son node relation of each other knowledge node in the associated knowledge node list, and generating the merged knowledge tree based on a plurality of the merged knowledge nodes and other knowledge nodes in the knowledge tree cluster.

Here, the merging knowledge node is controlled to inherit the parent-child node relation of each other knowledge node in the associated knowledge node list, and a merged knowledge tree is generated according to the merging knowledge nodes and the other knowledge nodes in the knowledge tree cluster.
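Inheriting the parent-child relations can be pictured as rewiring edges: every edge that touched a member of the associated knowledge node list is redirected to the merged knowledge node. The sketch below assumes, purely for illustration, that the knowledge tree cluster is stored as a set of (parent, child) pairs of node identifiers; `inherit_relations` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (parent id, child id)


def inherit_relations(edges: Iterable[Edge],
                      group: Iterable[str],
                      merged_id: str) -> Set[Edge]:
    """Redirect every edge touching a group member to the merged node."""
    members = set(group)
    rewired: Set[Edge] = set()
    for parent, child in edges:
        new_parent = merged_id if parent in members else parent
        new_child = merged_id if child in members else child
        if new_parent != new_child:  # drop edges collapsed inside the group
            rewired.add((new_parent, new_child))
    return rewired


edges = {("RootA", "A1"), ("A1", "A2"), ("RootB", "B1"), ("B1", "B2")}
print(sorted(inherit_relations(edges, ["A1", "B1"], "Q")))
# [('Q', 'A2'), ('Q', 'B2'), ('RootA', 'Q'), ('RootB', 'Q')]
```

In this example Q keeps two parents until RootA and RootB are themselves merged. How such a case is resolved is not described in the text, so the sketch leaves it as is.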

And taking the finally selected intermediate knowledge node as a representation of the combined knowledge node, and updating the attribute and the relation of the combined knowledge node.

Through the above algorithm and steps, the associated nodes among different knowledge trees can be fused to generate a global knowledge tree in which the relation and attribute characteristics of each associated tree are retained. By merging the same or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed and knowledge accuracy is improved. Meanwhile, association fusion can further improve the credibility of knowledge by weighting or averaging the common characteristics of the nodes of different knowledge trees. Different knowledge trees may contain descriptions of different aspects or angles of the same concept or topic. By performing association fusion on the related nodes in the different knowledge trees, the views and information of different knowledge sources can be integrated and presented, thereby providing a more complete and comprehensive knowledge view.

In a specific embodiment, the scheme can be applied to the communication field, that is, the field covering technologies and applications such as telecommunications, networking and wireless communication, which is widely used in daily life and work. A knowledge tree is a hierarchically structured knowledge graph organized from various materials, and can help learners better understand and master the related knowledge. In knowledge training for the communication field, the sources of knowledge trees mainly include Word documents, PDF documents, PPT screen-recording videos and the like. These materials may be organized in different ways and formats, but all contain knowledge about the communication field. The purpose of association fusion of knowledge trees is to integrate and share knowledge so that a trainer can better understand the knowledge in the communication field. By associating nodes across different knowledge trees, links among the knowledge trees are established and knowledge is connected and supplemented. In the knowledge fusion process, the associated nodes can be merged and reorganized to form a more complete and comprehensive knowledge tree, so that learners can fully understand and master the knowledge in the communication field. The significance of knowledge tree association fusion lies in promoting knowledge transfer and improving learning outcomes. By fusing the relations among different knowledge trees, duplicated and omitted knowledge can be avoided, learners spend less effort repeatedly reading and searching different materials, and learning efficiency is improved. Meanwhile, association fusion helps learners understand the knowledge in the communication field as a whole and form a more complete knowledge system, which facilitates the application of and innovation on that knowledge.

Further, referring to fig. 2, fig. 2 is a schematic diagram of a method for association fusion of multiple source knowledge trees according to an embodiment of the present application. As shown in fig. 2, a multi-source knowledge tree set is obtained, knowledge tree clustering is performed on the multi-source knowledge tree set to obtain a knowledge tree cluster, knowledge association is performed on a plurality of knowledge nodes in the knowledge tree cluster to obtain an associated knowledge node list of the knowledge tree cluster, and knowledge fusion is performed on a plurality of associated knowledge nodes in the associated knowledge node list to obtain a fused knowledge tree. Here, a large number of knowledge trees are clustered into a small number of knowledge tree clusters in a knowledge tree clustering mode, so that the calculation complexity can be reduced, the calculation efficiency is improved, meanwhile, knowledge trees in similar fields can be clustered together through clustering, and therefore knowledge confusion among different sources and different fields is reduced, and the correlation accuracy is improved. Second, not only are similarities between nodes considered, but also deeper associations between nodes are analyzed. Knowledge trees can be more fully understood and analyzed by considering the context information and semantic similarity of nodes. Thus, the rich information of the knowledge tree can be fully utilized, and the accuracy and reliability of association are improved. And finally, in the knowledge fusion stage, merging and fusing the associated nodes to obtain a final fused knowledge tree.

The embodiment of the application provides an association fusion method of a multi-source knowledge tree, which comprises the following steps: carrying out knowledge clustering processing on a plurality of knowledge trees to determine different knowledge tree clusters; performing node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters; integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node; and updating the attribute and the relationship of each merged knowledge node in the knowledge tree cluster to generate a merged knowledge tree. By merging identical or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed, and the accuracy and reliability of knowledge are improved. Different knowledge trees may contain descriptions of different aspects of the same concept or topic; by performing association fusion on the related knowledge nodes in the different knowledge trees, the views and information of different knowledge sources can be integrated and presented, thereby providing a more complete and comprehensive knowledge tree.

Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for a multi-source knowledge tree association fusion method according to an embodiment of the present application. As shown in fig. 3, the association fusion apparatus 300 of the multi-source knowledge tree includes:

the knowledge tree clustering module 310 is configured to perform knowledge clustering on the multiple knowledge trees to determine different knowledge tree clusters;

the association module 320 is configured to perform node association calculation on any two knowledge nodes between different knowledge trees in each knowledge tree cluster based on character features, semantic features and structural features of the plurality of knowledge nodes in each knowledge tree cluster, and determine a plurality of associated knowledge node lists in the knowledge tree cluster;

the fusion module 330 is configured to integrate attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determine a merged knowledge node of each associated knowledge node list, determine an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and use the intermediate knowledge node as the merged knowledge node;

and the generating module 340 is configured to update the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster, so as to generate a merged knowledge tree.

Further, when the knowledge tree clustering module 310 is configured to perform knowledge clustering processing on the plurality of knowledge trees to determine different knowledge tree clusters, the knowledge tree clustering module 310 is specifically configured to:

Further, when the association module 320 is configured to calculate the node association degree of any two knowledge nodes between different knowledge trees in each knowledge tree cluster based on the character features, the semantic features and the structural features of the plurality of knowledge nodes in each knowledge tree cluster, and determine a plurality of associated knowledge node lists in the knowledge tree cluster, the association module 320 is specifically configured to:

Further, when the association module 320 is configured to calculate the node association degree based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, and determine a plurality of associated knowledge node lists in the knowledge tree cluster, the association module 320 is specifically configured to:

Further, when the fusing module 330 is configured to determine an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list by using the weight-based maximization method, and take the intermediate knowledge node as a merged knowledge node, the fusing module 330 is specifically configured to:

Further, the fusion module 330 determines the intermediate knowledge node by:

Further, when the generating module 340 is configured to update the attribute and the relationship of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree, the generating module 340 is specifically configured to:

Further, when the generating module 340 is configured to fuse the attributes of the plurality of associated knowledge nodes in the associated knowledge node list and determine attribute information of the merged knowledge node, the generating module 340 is specifically configured to:

The embodiment of the application provides an association fusion device of a multi-source knowledge tree, the association fusion device including: the knowledge tree clustering module, which is used for carrying out knowledge clustering processing on the plurality of knowledge trees and determining different knowledge tree clusters; the association module, which is used for carrying out node association degree calculation on any two knowledge nodes among different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters; the fusion module, which is used for integrating the attributes of a plurality of associated knowledge nodes in each associated knowledge node list, determining a combined knowledge node of each associated knowledge node list, determining an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list based on a weight maximization method, and taking the intermediate knowledge node as the combined knowledge node; and the generating module, which is used for updating the attribute and the relationship of each combined knowledge node in the knowledge tree cluster to generate a fusion knowledge tree. By merging identical or related knowledge nodes in different knowledge trees, duplicate and redundant knowledge is removed, and the accuracy and reliability of knowledge are improved. Different knowledge trees may contain descriptions of different aspects of the same concept or topic; by performing association fusion on the related knowledge nodes in the different knowledge trees, the views and information of different knowledge sources can be integrated and presented, thereby providing a more complete and comprehensive knowledge tree.

Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.

The memory 420 stores machine-readable instructions executable by the processor 410, when the electronic device 400 is running, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the association fusion method of the multi-source knowledge tree in the method embodiment shown in fig. 1 can be executed, and the specific implementation is referred to the method embodiment and will not be described herein.

The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the association fusion method of the multi-source knowledge tree in the method embodiment shown in fig. 1 may be executed, and a specific implementation manner may refer to the method embodiment and will not be described herein.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present application and are not intended to limit its scope of protection. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that any person skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. The association fusion method of the multi-source knowledge tree is characterized by comprising the following steps of:

2. The association fusion method of claim 1, wherein the performing knowledge clustering on the plurality of knowledge trees to determine different knowledge tree clusters comprises:

3. The association fusion method according to claim 1, wherein the calculating the node association degree of any two knowledge nodes between different knowledge trees in the knowledge tree clusters based on character features, semantic features and structural features of a plurality of knowledge nodes in each knowledge tree cluster, and determining a plurality of associated knowledge node lists in the knowledge tree clusters includes:

4. The association fusion method of claim 3, wherein the calculating the node association based on the structural features of the plurality of knowledge nodes and the similarity between any two knowledge nodes, determining a plurality of associated knowledge node lists in the knowledge tree cluster, comprises:

5. The association fusion method according to claim 1, wherein the weight-based maximization method determines an intermediate knowledge node of the associated knowledge node list from the associated knowledge node list, and uses the intermediate knowledge node as a combined knowledge node, and comprises:

6. The association fusion method of claim 1, further comprising determining the intermediate knowledge node by:

7. The method of claim 1, wherein said updating the attributes and relationships of each of the merged knowledge nodes in the knowledge tree cluster to generate a merged knowledge tree comprises:

8. The association fusion method of claim 7, wherein the fusing the attributes of the plurality of associated knowledge nodes in the associated knowledge node list, determining the attribute information of the merged knowledge node, comprises:

9. An associative fusion device for a multi-source knowledge tree, wherein the associative fusion device comprises:

10. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating via said bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the method of associative fusion of multi-source knowledge trees according to any one of claims 1 to 8.

11. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the association fusion method of a multi-source knowledge tree according to any of claims 1 to 8.