CN117573886A

CN117573886A - Knowledge graph visualization construction method and system for structured data

Info

Publication number: CN117573886A
Application number: CN202311550922.5A
Authority: CN
Inventors: 高士连; 赵慧; 王智铎; 桂严; 石博凡
Original assignee: CETC 32 Research Institute
Current assignee: CETC 32 Research Institute
Priority date: 2023-11-20
Filing date: 2023-11-20
Publication date: 2024-02-20

Abstract

The invention relates to a knowledge graph visualization construction method and system for structured data, in which structured data is built into a general-purpose knowledge graph in a top-down manner. Candidate entities and candidate relations are introduced during construction and, through manual decision, are recommended in batches as needed to entities and relations, forming the node elements of the knowledge graph. Classification and clustering algorithms are applied within the constructed graph nodes to visualize complex data in the database and attach the classified data nodes to the graph, which facilitates deep association analysis of the data. The method addresses the low efficiency, low utilization and poor readability of knowledge graphs built over huge, multi-source and complex databases, greatly reduces redundant nodes, and gives the knowledge graph better readability and practicability.

Description

Knowledge graph visualization construction method and system for structured data

Technical Field

The invention relates to the technical field of databases and knowledge graphs, and in particular to a method and a system for the visual construction of a knowledge graph from structured data.

Background

Knowledge graphs are widely applied to the vast amounts of information held in structured data: they help organize, manage and reason over large amounts of complex data, thereby supporting deeper insight and intelligent decisions. A knowledge graph is a data structure for representing and organizing knowledge, presented in the form of a graph or network that includes entities, attributes, and relationships between entities. Its goal is to capture information and associations in the real world to help computer systems better understand and process knowledge. However, the following problems exist:

1) The data in multi-source databases is scattered and diverse, forms isolated islands, is redundant, and has low value utilization and poor readability;

2) When the database table structure is complex and the number of tables is huge, a knowledge graph built from the structured data contains a very large number of nodes and relationship lines. Some of these are redundant, and some are of no interest to the user at the time. Piling all of the nodes and lines into the graph leaves it cluttered and hard to read, so that key information cannot be found;

3) The core data in the database falls into many categories, the correspondences between data entities are complex, the data dictionary and the record tables are tightly bound by constraints, and the tables store huge volumes of data, so that managing the data relationships among tables is difficult and the value of the data is poorly mined.

Disclosure of Invention

To address the low efficiency, low utilization and poor readability of knowledge graphs built over huge, multi-source and complex databases, a knowledge graph visualization construction method and system for structured data are provided. The structured data is built into a knowledge graph in a top-down manner, yielding a general-purpose method and system for the visual construction of knowledge graphs from structured data. Candidate entities and candidate relations are introduced during construction and, through manual decision, are recommended in batches as needed to entities and relations, forming the node elements of the knowledge graph. Classification and clustering algorithms are applied within the constructed graph nodes to visualize complex data in the database and attach the classified data nodes to the graph, which facilitates deep association analysis of the data.

The technical scheme of the invention is as follows:

the knowledge graph visualization construction method for structured data provides visual construction of a knowledge graph from structured data; where the database table structure is complex and the scale is huge, the structured data is extracted into candidate entities and candidate relations of the knowledge graph by analyzing the constraint relationships among the database tables, the candidate entities and candidate relations are displayed visually on the interface in table-tree form, and through manual decision they are recommended in batches as needed to entities and relations, forming the elements for constructing the knowledge graph; the construction of a knowledge graph over multi-source heterogeneous data is realized by creating, for each domain, the functional modules of a top-level knowledge base module, a knowledge extraction module, a knowledge storage module and a graph visualization module;

the top-level knowledge base construction module is the basic, core component of the whole knowledge graph system; the top-level knowledge base defines the data model of the knowledge graph, which comprises entity types and relation types and thus provides a reference for constructing the structure of the knowledge graph; the top-level knowledge base is constructed by identifying the key concepts, entities, relations, attributes and business rules of each domain and normalizing and abstracting them into a meta-model with high universality, applicability and flexibility for that domain;

the knowledge extraction module extracts useful knowledge from a large amount of data and stores it in the knowledge graph; it is a precondition for constructing the knowledge graph and supplies the data nodes and relationship lines from which the graph is built; the concepts of candidate entity and candidate relation are introduced, and through manual decision the candidate entities and candidate relations are recommended in batches as needed to entities and entity relations, forming the node elements for constructing the knowledge graph;

the knowledge storage module stores the knowledge extracted by the knowledge extraction module and provides data support for the graph visualization module;

the graph visualization module presents the data in the graph to the user visually in graph, table and tree form, and the user can operate both on individual graph nodes and on the graph as a whole.

Further, the implementation process of the top knowledge base construction module is as follows:

analyzing the domain to determine key concepts, entities, relationships, attributes and business rules in the domain;

determining, based on the domain analysis, the main entities present in the domain and the relationships between them; an entity may be a concrete thing, a concept, or an abstract concept, and a relation describes a connection between these entities;

defining attributes for each entity, each attribute describing a feature or property of the entity; the attributes may be of various data types and may carry different constraints;

materializing the determined entities, relations and attributes into database tables, which provide a reference for constructing the structure of the knowledge graph, so that the constructed top-level knowledge base has good universality, practicability and extensibility.

Further, the knowledge extraction module comprises two sub-functional modules, namely a database table knowledge extraction module and a database table data knowledge extraction module;

the specific implementation process of the database table knowledge extraction module is as follows:

the database tables are classified by type according to specific rules based on primary-key and foreign-key constraints, the types being entity table, sub-table, dictionary table, association table, extension table, dynamic table and other table; excluding dictionary tables, a table with a single primary key that is not a foreign key is an entity table; a table with a single primary key that is also a foreign key is a sub-table; a table with a single primary key that is not a foreign key and whose table name satisfies the dictionary rule, or whose table comment contains dictionary wording, is a dictionary table; a table in which several fields are both primary-key and foreign-key fields, with the foreign keys referencing different tables, is an association table; a table in which several primary-key fields come from one other table, or which has two primary keys of which one is a foreign key and the other is not, is an extension table; a table whose composite primary key contains a time-type field is a dynamic table; a table that satisfies none of the above rules is classified as an other table;

the database table knowledge extraction module comprises a data source management module, which provides functions for adding, editing and deleting data sources, connecting to a data source, analyzing a data source, and importing a data source model for data source analysis;

the data source analysis function performs primary/foreign key constraint analysis and primary/foreign key field type analysis on the database tables and, for each data table, generates candidate entities and candidate relations of the corresponding class according to the classification rules for the 7 table types above; tables classified as other tables can be manually recommended as candidate entities or candidate relations through the interface;

in the visual display interface for candidate entities and candidate relations, the user can, as needed, select related candidate entities and candidate relations from those analyzed out of multiple data sources and recommend them to the entity and entity relation modules, forming the node and relationship-line elements for constructing the knowledge graph; the constructed knowledge graph brings the data of the multi-source databases together in one place, so that knowledge is shared and its utilization rate is improved.
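The table classification rules above can be sketched in code. The following is a minimal illustration only, not the patented implementation: the `Column` and `Table` types, the `dict_` name prefix used as the "dictionary rule", the set of time types, and the order in which overlapping rules are tested (dynamic before association and extension) are all assumptions made for the example, since the text does not fix them.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of column types treated as "time type" fields.
TIME_TYPES = {"date", "time", "datetime", "timestamp"}


@dataclass
class Column:
    name: str
    data_type: str
    is_pk: bool = False
    fk_table: Optional[str] = None  # referenced table if this column is a foreign key


@dataclass
class Table:
    name: str
    columns: list
    comment: str = ""


def looks_like_dictionary(table: Table) -> bool:
    # Hypothetical dictionary rule: a name prefix or a keyword in the table comment.
    return table.name.lower().startswith("dict_") or "dictionary" in table.comment.lower()


def classify(table: Table) -> str:
    """Return one of the 7 table types described in the text."""
    pks = [c for c in table.columns if c.is_pk]
    if len(pks) == 1:
        if pks[0].fk_table is not None:
            return "sub"  # single primary key that is also a foreign key
        return "dictionary" if looks_like_dictionary(table) else "entity"
    if len(pks) > 1:
        # Assumed precedence: a time-type field in the composite key wins.
        if any(c.data_type.lower() in TIME_TYPES for c in pks):
            return "dynamic"
        pk_fks = [c for c in pks if c.fk_table is not None]
        targets = {c.fk_table for c in pk_fks}
        if len(pk_fks) >= 2 and len(targets) > 1:
            return "association"  # key fields reference different tables
        if len(pk_fks) >= 2 and len(targets) == 1:
            return "extension"  # several key fields come from one other table
        if len(pks) == 2 and len(pk_fks) == 1:
            return "extension"  # one key is a foreign key, the other is not
    return "other"
```

In practice the `Table` objects would be filled from the catalog metadata of the connected data source; how that metadata is read depends on the database product and is outside this sketch.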

Further, the database table data knowledge extraction module builds on the database table knowledge extraction module: table data is extracted onto the entity-table graph nodes of the knowledge graph formed from the entities and entity relations extracted by the database table knowledge extraction module, and the specific implementation process is as follows:

the original data of a selected node is obtained from the knowledge graph, important data can be selected from the original data and extracted onto the graph node, and the relevance of the data can be displayed visually;

data in the original table can be extracted by hierarchical structure, in which the data is classified by default according to predefined hierarchy fields, hierarchy rules and a number of hierarchy levels, and is displayed visually in a tree-structure interface, so that the data view has distinct levels and clear logic;

the original table data can be extracted by custom classification on table fields; in custom-classification extraction the user selects the table field of interest to classify the data and configures the node display field, and the extraction result is attached to the corresponding graph node;

the original table data can be extracted by clustering, in which a clustering algorithm processes text and integer data; the user selects a table field and a clustering algorithm suited to that field to extract the data, and the extraction result is attached to the corresponding graph node for display.
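The hierarchical extraction step can be illustrated with a short sketch. The text does not say how the hierarchy fields, rules and level count are encoded, so the example assumes one common convention: a single code field in which each level contributes a fixed number of characters (for example `01`, `0101`, `0102`). The function names and the `width` and `levels` parameters are hypothetical.

```python
def build_tree(rows, level_field, width, levels):
    """Nest rows under fixed-width prefixes of a hierarchical code field."""
    root = {"children": {}, "rows": []}
    for row in rows:
        code = str(row[level_field])
        # Depth of this row, capped at the configured number of levels.
        depth = min(levels, len(code) // width)
        node = root
        for i in range(1, depth + 1):
            prefix = code[: i * width]
            node = node["children"].setdefault(prefix, {"children": {}, "rows": []})
        node["rows"].append(row)
    return root


def outline(node, depth=0):
    """Flatten the tree into (depth, prefix) pairs, ready for a tree-view widget."""
    out = []
    for prefix in sorted(node["children"]):
        out.append((depth, prefix))
        out.extend(outline(node["children"][prefix], depth + 1))
    return out
```

A tree-structure interface could then render each `(depth, prefix)` pair as an indented item and list the rows stored at that node beneath it.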

Preferably, the method supports visual construction of data graphs for Oracle, DM, MySQL and Kingbase (Jin Cang) databases.

The system builds structured data into a knowledge graph in a top-down manner and displays the graph visually; the structured data is generally stored in the user's relational database; for the structured data of each domain, a top-level knowledge base of that domain is first established, and the graph is then constructed by extracting entity and relation information from the mass data stored in the relational database and is displayed in table and graph form; the knowledge graph visualization system for structured data comprises, for each domain, a top-level knowledge base module, a knowledge extraction module, a knowledge storage module and a graph visualization module;

The invention has the beneficial effects that:

1) The problems of scattered and diverse data, data islands, redundancy, low data value utilization and difficult knowledge sharing in multi-source databases are solved.

2) For databases with a huge number of tables, candidate entities and candidate relations are introduced in the knowledge extraction process; on the interface, manual decision is used to recommend candidate entities and candidate relations in batches, as needed, to entities and entity relations, forming the node elements for constructing the knowledge graph. Redundant nodes are thereby greatly reduced, and the knowledge graph is more readable and practical.

3) Classification and clustering algorithms are applied to the table data in the table nodes of the constructed knowledge graph, the complex data in the database tables is visualized, and data nodes are attached to the graph according to the classification and clustering results, which facilitates deep association analysis of the data and improves data readability.

4) In the graph visualization, nodes such as graph entities, data and classifications are shown in different colors, and the knowledge graph is presented as graphs and charts, which helps users discover potential insights, patterns and trends and can be used to explore, query and navigate the knowledge graph.

Drawings

FIG. 1 is a diagram of a knowledge graph construction process according to the invention;

FIG. 2 is a schematic diagram of the overall flow of knowledge graph visualization construction for structured data according to the present invention;

FIG. 3 is a schematic diagram of the top knowledge base construction process of the present invention;

FIG. 4 is a schematic diagram of the knowledge extraction process according to the present invention.

Detailed Description

The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.

With the explosive growth of information, the volume of data keeps increasing: the core data in a database falls into many categories, the correspondences between data entities are complex, and the data dictionary and the record tables are tightly bound by constraints. When the database scale and the amount of information are huge, applying knowledge graph technology to structured data makes it possible to visualize the complex data in the database, facilitates deep association analysis of the data, and addresses the scattered, diverse and isolated data of multi-source databases and the low utilization of data value.

To solve the problems of scattered and diverse data, data islands, redundancy and low data value utilization in multi-source databases, and to improve the consistency and usability of the data, structured data can be built into a knowledge graph and visualized. There are three modes of constructing a knowledge graph: top-down, bottom-up, and a combination of the two. The top-down mode requires that a top-level knowledge base be created first; ontology and entity information is then extracted from the mass data and added to that knowledge base. The bottom-up mode first extracts knowledge from the data and then, after entity alignment, semantic fusion, information merging, knowledge processing and similar steps, adds the resulting entities, relations and attributes to the knowledge graph. The flow of constructing a knowledge graph in these two modes is shown in FIG. 1. In recent years a few scholars have also constructed knowledge graphs by combining the two modes: a most basic schema layer is first built from a large amount of data, more valuable knowledge is then continuously mined to update the schema layer, and finally a mapping from the schema layer to the data layer is designed to fill in the entities, forming a complete knowledge graph.

The invention provides a knowledge graph visualization construction method and system for structured data. The structured data is generally stored in the user's relational database. For the structured data of each domain, a top-level knowledge base of that domain is first established; the graph is then constructed by extracting entity and relation information from the mass data stored in the relational database and is displayed in table and graph form.

The method and system provide visual construction of a knowledge graph from structured data. Where the database table structure is complex and the scale is huge, the structured data is extracted into candidate entities and candidate relations of the knowledge graph by analyzing the constraint relationships among the database tables; the candidate entities and candidate relations are displayed visually on the interface in table-tree form and, through manual decision, are recommended in batches as needed to entities and relations, forming the elements for constructing the knowledge graph. The method and system comprise a top-level knowledge base construction module, a knowledge extraction module, a knowledge storage module and a graph visualization module.

The method and system support graph visualization of data from databases such as Oracle, DM, MySQL and Kingbase (Jin Cang); the overall flow is shown in FIG. 2. The construction of a knowledge graph over multi-source heterogeneous data is realized by creating, for each domain, functional modules such as the top-level knowledge base module, the knowledge extraction module, the knowledge storage module and the graph visualization module, which solves the problems of poor aggregation capability, low utilization and difficult knowledge sharing of multi-source heterogeneous data.

The top-level knowledge base construction module is the basic, core component of the whole knowledge graph system. The top-level knowledge base defines the data model of the knowledge graph, which comprises entity types, relation types and the like, and thus provides a reference for constructing the structure of the knowledge graph. The top-level knowledge base is constructed by working with experts in each domain to understand in detail the key concepts, entities, relations, attributes and business rules of the domain, and by standardizing and abstracting them into a meta-model with high universality, applicability and flexibility for that domain.

The knowledge extraction module extracts useful knowledge from a large amount of data and stores it in the knowledge graph; it is a precondition for constructing the knowledge graph and supplies the data nodes and relationship lines from which the graph is built. Because a knowledge graph constructed at huge scale has so many nodes and relations that it is hard to read and hard to mine for knowledge, the concepts of candidate entity and candidate relation are introduced in knowledge extraction: through manual decision, candidate entities and candidate relations are recommended in batches as needed to entities and entity relations, forming the node elements for constructing the knowledge graph.

The knowledge storage module stores the knowledge extracted by the knowledge extraction module and provides data support for the graph visualization module.

The graph visualization module presents the data in the graph to the user visually in graph, table and tree form, as shown in FIG. 3. The user can perform operations such as data extraction and node display style configuration on the graph nodes. The graph can be zoomed in, zoomed out and dragged, and can be filtered to display only certain kinds of graph nodes, relations and the like.

This example describes the top-level knowledge base construction module and the knowledge extraction module in detail.

Example:

The implementation process of the top-level knowledge base construction module is shown in FIG. 3:

(1) The domain is analyzed to determine key concepts, entities, relationships, attributes and business rules in the domain.

(2) Based on the domain analysis, the main entities present in the domain and the relationships between them are determined. An entity may be a concrete thing, a concept, or an abstract concept, and a relationship describes a relationship between these entities.

(3) Attributes are defined for each entity; each attribute describes a characteristic or property of the entity. Attributes may be of data types such as text, number or date, and may carry different constraints.

(4) The determined entities, relations and attributes are materialized into database tables, namely the lybt_stlx table and the lybt_gxlx table. These tables provide a reference for constructing the structure of the knowledge graph, so that the constructed top-level knowledge base has good universality, practicability and extensibility.
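As an illustration of step (4), the sketch below materializes a small meta-model in SQLite. Only the table names lybt_stlx and lybt_gxlx come from the text; the assumption that they hold entity types and relation types respectively, every column name, the JSON encoding of attributes, and the helper functions are hypothetical choices made for the example.

```python
import json
import sqlite3


def create_meta_model(conn):
    """Create the two meta-model tables (column layout is an assumption)."""
    conn.executescript(
        """
        CREATE TABLE lybt_stlx (              -- assumed: entity types
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE,
            attributes TEXT NOT NULL          -- JSON map: attribute name -> data type
        );
        CREATE TABLE lybt_gxlx (              -- assumed: relation types
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            source_type INTEGER NOT NULL REFERENCES lybt_stlx(id),
            target_type INTEGER NOT NULL REFERENCES lybt_stlx(id)
        );
        """
    )


def add_entity_type(conn, name, attributes):
    """Register an entity type with its attribute definitions; return its id."""
    cur = conn.execute(
        "INSERT INTO lybt_stlx (name, attributes) VALUES (?, ?)",
        (name, json.dumps(attributes)),
    )
    return cur.lastrowid


def add_relation_type(conn, name, source_type, target_type):
    """Register a relation type between two entity types; return its id."""
    cur = conn.execute(
        "INSERT INTO lybt_gxlx (name, source_type, target_type) VALUES (?, ?, ?)",
        (name, source_type, target_type),
    )
    return cur.lastrowid
```

For instance, a domain expert might register a `person` and an `organization` entity type and a `works_for` relation type between them; entities and relations extracted later would then be checked against these definitions.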

The knowledge extraction module comprises a database table knowledge extraction module and a database table data knowledge extraction module.

The specific implementation process of the database table knowledge extraction module is shown in FIG. 4:

(1) The database tables are classified by type according to specific rules such as primary-key and foreign-key constraints, the types being entity table, sub-table, dictionary table, association table, extension table, dynamic table and other table. A table with a single primary key that is not a foreign key (excluding dictionary tables) is an entity table; a table with a single primary key that is also a foreign key is a sub-table; a table with a single primary key that is not a foreign key and whose table name satisfies the dictionary rule, or whose table comment contains dictionary wording, is a dictionary table; a table in which several fields are both primary-key and foreign-key fields, with the foreign keys referencing different tables, is an association table; a table in which several primary-key fields come from one other table, or which has two primary keys of which one is a foreign key and the other is not, is an extension table; a table whose composite primary key contains a time-type field is a dynamic table; a table that satisfies none of the above rules is classified as an other table.

(2) The data source management module provides functions for adding, editing and deleting data sources, connecting to a data source, analyzing a data source, and importing a data source model for data source analysis.

(3) The data source analysis function performs primary/foreign key constraint analysis and primary/foreign key field type analysis on the database tables and, for each data table, generates candidate entities and candidate relations of the corresponding class according to the classification rules for the 7 table types defined in step 1.

(4) Tables classified as other tables in step 3 can be manually recommended as candidate entities or candidate relations through the interface.

(5) In the visual display interface for candidate entities and candidate relations, the user can, as needed, select related candidate entities and candidate relations from those analyzed out of multiple data sources and recommend them to the entity and entity relation modules, forming the node and relationship-line elements for constructing the knowledge graph. The constructed knowledge graph brings the data of the multi-source databases together in one place, so that knowledge is shared and its utilization rate is improved.
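The batch recommendation in step (5) amounts to moving a user-selected subset of candidates into the set of graph elements while leaving the rest as candidates. The following minimal sketch shows that bookkeeping; the `Candidate` and `GraphElements` types and the `recommend` function are hypothetical names, and selection by candidate name stands in for whatever the interface actually passes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    kind: str    # "entity" or "relation"
    name: str
    source: str  # data source the candidate was analyzed from


@dataclass
class GraphElements:
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)


def recommend(candidates, selected_names, graph):
    """Promote the selected candidates in one batch and return those left over."""
    remaining = []
    for cand in candidates:
        if cand.name in selected_names:
            target = graph.entities if cand.kind == "entity" else graph.relations
            target.add(cand)
        else:
            remaining.append(cand)  # stays a candidate, never reaches the graph
    return remaining
```

Because unselected candidates never enter `GraphElements`, tables the user does not care about add no nodes or lines to the graph, which is how the approach keeps redundant nodes out.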

The database table data knowledge extraction module builds on the database table knowledge extraction module: table data is extracted onto the entity-table graph nodes of the knowledge graph formed from the entities and entity relations extracted by the database table knowledge extraction module. The specific implementation process is shown in FIG. 4:

(1) The original data of a selected node is obtained from the graph; important data can be selected from the original data and extracted onto the graph node, and the relevance of the data can be displayed visually.

(2) Data in the original table can be extracted by hierarchical structure: the data is classified by default according to predefined hierarchy fields, hierarchy rules and a number of hierarchy levels, and is displayed visually in a tree-structure interface, so that the data view has distinct levels and clear logic.

(3) The original table data can be extracted by custom classification on table fields. In custom-classification extraction the user selects the table field of interest to classify the data and configures the node display field, and the extraction result is attached to the corresponding graph node.

(4) The original table data can be extracted by clustering, in which a clustering algorithm processes text and integer data. The user selects a table field and a clustering algorithm suited to that field to extract the data, and the extraction result is attached to the corresponding graph node for display.
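Step (4) does not name a particular clustering algorithm. As one possible illustration for an integer field, the sketch below uses a plain one-dimensional k-means with a deterministic start and attaches the resulting groups to a graph node represented as a dictionary. The function names, the node layout and the choice of k-means are assumptions; text fields would need a different algorithm and are not covered here.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster a non-empty list of numbers into k groups (Lloyd iterations)."""
    ordered = sorted(values)
    # Deterministic start: k values spread evenly over the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [float(ordered[round(i * step)]) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Recompute each center as the mean of its members.
        new_centers = list(centers)
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                new_centers[j] = sum(members) / len(members)
        if new_centers == centers:
            break  # converged: labels match the current centers
        centers = new_centers
    return labels, centers


def attach_clusters(node, rows, field_name, k):
    """Group rows by cluster of the chosen field and attach them to the node."""
    labels, centers = kmeans_1d([row[field_name] for row in rows], k)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    node["clusters"] = [
        {"center": centers[lab], "rows": members}
        for lab, members in sorted(groups.items())
    ]
    return node
```

Each entry in `node["clusters"]` could then be rendered as a child data node of the table node, in a color distinct from entity nodes.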

The above example represents only one embodiment of the present invention. Although it is described in specific detail, it is not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the invention. Accordingly, the scope of protection of the present invention is determined by the appended claims.

Claims

1. A knowledge graph visualization construction method for structured data, characterized in that the method provides visual construction of a knowledge graph from structured data; where the database table structure is complex and the scale is huge, the structured data is extracted into candidate entities and candidate relations of the knowledge graph by analyzing the constraint relationships among the database tables, the candidate entities and candidate relations are displayed visually on the interface in table-tree form, and through manual decision they are recommended in batches as needed to entities and relations, forming the elements for constructing the knowledge graph; the construction of a knowledge graph over multi-source heterogeneous data is realized by creating, for each domain, the functional modules of a top-level knowledge base module, a knowledge extraction module, a knowledge storage module and a graph visualization module;

2. The structured data oriented knowledge graph visualization construction method according to claim 1, wherein the top knowledge base construction module is specifically implemented as follows:

3. The structured data oriented knowledge graph visualization construction method of claim 1, wherein the knowledge extraction module comprises two sub-functional modules, namely a database table knowledge extraction module and a database table data knowledge extraction module;

4. The method for constructing a structured data-oriented knowledge graph visualization according to claim 3, wherein the database table data knowledge extraction module is based on the database table knowledge extraction module, and the table data is extracted on the entity table graph nodes of the knowledge graph formed by the entities and the entity relationships extracted in the database table knowledge extraction module, and the specific implementation process is as follows:

5. The knowledge graph visualization construction method for structured data according to claim 1, wherein the method supports Oracle, DM, MySQL and Kingbase (Jin Cang) databases.

6. A knowledge graph visualization system for structured data, characterized in that the system builds structured data into a knowledge graph in a top-down manner and displays the graph visually; the structured data is generally stored in the user's relational database; for the structured data of each domain, a top-level knowledge base of that domain is first established, and the graph is then constructed by extracting entity and relation information from the mass data stored in the relational database and is displayed in table and graph form; the knowledge graph visualization system for structured data comprises, for each domain, a top-level knowledge base module, a knowledge extraction module, a knowledge storage module and a graph visualization module;