CN115203435A - Entity relation generation method and data query method based on knowledge graph - Google Patents

Entity relation generation method and data query method based on knowledge graph

Publication number
CN115203435A
Authority
CN
China
Prior art keywords
data
metadata
relationship
data tables
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210828302.2A
Other languages
Chinese (zh)
Inventor
王明
王天振
陈建欣
李印
庞艳蓓
付大超
李飞飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202210828302.2A
Publication of CN115203435A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Abstract

The embodiments of the present application provide a knowledge graph-based entity relationship generation method, a data query method, a computing device, and a computer storage medium. The knowledge graph-based entity relationship generation method comprises: acquiring metadata of at least two data tables; analyzing the metadata to determine the data structures of the at least two data tables and the association relationship between the at least two data tables; generating a knowledge graph according to the data structures and the association relationship; and visually rendering the knowledge graph to generate an entity relationship diagram. With the technical solution provided by the embodiments of the present application, the association relationships among the data can be extracted by using the knowledge graph's capability to reason over domain knowledge, and the entity relationship diagram is then rendered from the rich association relationships among the data displayed in the knowledge graph.

Description

Entity relation generation method and data query method based on knowledge graph
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an entity relationship generation method, a data query method, a computing device and a computer storage medium based on a knowledge graph.
Background
As the data managed and stored by a data management system becomes increasingly complex, an entity relationship diagram is usually required to represent the association relationships between the data tables and fields stored in the data management system, so that operation and maintenance personnel can intuitively grasp the association relationships between the services maintained by the data management system.
In the related art, an entity relationship diagram is usually generated by relying on the foreign key relationships established in the data tables. Specifically, in the database design paradigm of the related art, a database table includes a primary key and a foreign key: the primary key uniquely identifies a row of data in the database table, and a field in the database table that references another database table is called a foreign key. Based on the foreign key in a data table, it can be determined that the data table has an association relationship with another data table, so that the entity relationship diagram can be generated based on that association relationship.
However, because foreign keys designed into database tables may degrade the query performance of the data management system, in scenarios with highly concurrent reads and writes the data management system usually adopts a redundant data design instead, that is, no foreign keys are designed into the database tables.
Because the database tables have no foreign keys, the related-art method of generating an entity relationship diagram from foreign key relationships is no longer applicable, and how to provide a new method for generating the entity relationship diagram has become a problem to be solved urgently.
Disclosure of Invention
The embodiment of the application provides an entity relationship generation method based on a knowledge graph, a data query method, a device, a computing device and a computer storage medium.
In a first aspect, an embodiment of the present application provides an entity relationship generation method based on a knowledge graph, including:
acquiring metadata of at least two data tables;
analyzing the metadata, and determining the data structures of the at least two data tables and the association relationship between the at least two data tables;
generating a knowledge graph according to the data structures and the association relationship;
and performing visual rendering on the knowledge graph to generate an entity relationship diagram.
In a second aspect, an embodiment of the present application provides a data query method, including:
receiving a data query instruction, wherein the data query instruction carries a field identifier of a target field and a table identifier of a data table to which the field identifier belongs;
determining an entity relationship diagram corresponding to the data table based on the table identification, wherein the entity relationship diagram is generated by performing visual rendering on a knowledge graph, the knowledge graph is generated according to data structures of at least two data tables determined by analyzing metadata acquired from a data management system and the association relationship of the at least two data tables, the entity relationship diagram comprises at least two entity groups generated in advance and the association relationship of the at least two entity groups, and each entity group corresponds to one data table and fields contained in the data table;
using the field identifier as index information, performing an index operation on the entity relationship diagram to acquire index data associated with the field identifier;
and outputting the index data.
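The query flow of the second aspect can be illustrated with a minimal sketch. This section does not define what the "index data" contains, so the sketch assumes it is the set of fields associated with the target field. The in-memory store `ER_DIAGRAMS`, the table names, and the functions `find_diagram` and `query` are all hypothetical names introduced for illustration.

```python
# Hypothetical in-memory store of pre-generated entity relationship diagrams.
# Each diagram holds entity groups (table -> fields) and field associations.
ER_DIAGRAMS = [
    {
        "entities": {
            "order_table": ["s_name", "s_number", "id", "price"],
            "warehouse_table": ["s_name", "c_name", "c_city", "c_address"],
        },
        "associations": [
            (("order_table", "s_name"), ("warehouse_table", "s_name")),
        ],
    },
]


def find_diagram(table_id, diagrams=ER_DIAGRAMS):
    """Return the diagram whose entity groups include the given table."""
    for diagram in diagrams:
        if table_id in diagram["entities"]:
            return diagram
    raise KeyError(f"no entity relationship diagram covers {table_id!r}")


def query(table_id, field_id):
    """Use the field identifier as the index key and return associated fields.

    Assumption: "index data" is taken to mean the fields associated with the
    target field; the source text does not spell this out.
    """
    diagram = find_diagram(table_id)
    if field_id not in diagram["entities"][table_id]:
        raise KeyError(f"{table_id!r} has no field {field_id!r}")
    target = (table_id, field_id)
    related = []
    for left, right in diagram["associations"]:
        if left == target:
            related.append(right)
        elif right == target:
            related.append(left)
    return related


result = query("order_table", "s_name")
```

Here the instruction carries the table identifier `order_table` and the field identifier `s_name`, and the output is the list of fields in other entity groups that are linked to that field.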
In a third aspect, an embodiment of the present application provides a data processing apparatus, including:
the metadata acquisition module is used for acquiring metadata of at least two data tables;
the metadata analyzing module is used for analyzing the metadata and determining the data structures of the at least two data tables and the association relationship between the at least two data tables;
the knowledge graph building module is used for generating a knowledge graph according to the data structures and the association relationship;
and the rendering module is used for performing visual rendering on the knowledge graph to generate an entity relationship diagram.
In a fourth aspect, an embodiment of the present application provides a data query apparatus, including:
the instruction receiving module is used for receiving a data query instruction, wherein the data query instruction carries a field identifier of a target field and a table identifier of a data table to which the field identifier belongs;
the graph determining module is used for determining an entity relation graph corresponding to the data table based on the table identification, the entity relation graph is generated by performing visual rendering on a knowledge graph, the knowledge graph is generated according to the data structure of at least two data tables determined by analyzing metadata acquired from a data management system and the association relation of the at least two data tables, the entity relation graph comprises at least two entity groups generated in advance and the association relation of the at least two entity groups, and each entity group corresponds to one data table and the fields contained in the data table;
the index module is used for performing index operation on the entity relationship graph by taking the field identifier as index information to acquire index data associated with the field identifier;
and the output module is used for outputting the index data.
In the embodiments of the present application, metadata of at least two data tables is acquired; the metadata is analyzed to determine the data structures of the at least two data tables and the association relationship between the at least two data tables; a knowledge graph is generated according to the data structures and the association relationship; and the knowledge graph is visually rendered to generate an entity relationship diagram. Because the knowledge graph is generated from the data stored in the data management system, the knowledge graph's capability to reason over domain knowledge can be used to extract the association relationships among the data, and the entity relationship diagram is then rendered from the rich association relationships among the data displayed in the knowledge graph, so that the entity relationship diagram is generated without relying on foreign key relationships.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following descriptions are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 schematically shows a flowchart of a knowledge graph-based entity relationship generation method according to an embodiment of the present invention;
FIG. 2 schematically shows the construction of a knowledge graph according to an embodiment of the present invention;
FIG. 3 schematically shows the construction of an entity relationship diagram according to an embodiment of the present invention;
FIG. 4 schematically shows a knowledge graph according to another embodiment of the present invention;
FIG. 5 schematically shows an entity relationship diagram according to another embodiment of the present invention;
FIG. 6 shows a flowchart of a data query method according to an embodiment of the present invention;
FIG. 7 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 8 schematically shows a block diagram of a data query apparatus according to an embodiment of the present invention;
FIG. 9 schematically shows a block diagram of a computing device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Some of the flows described in the specification, the claims, and the above figures of the present application include a number of operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. The operation numbers, such as 101 and 102, are merely used to distinguish the operations from one another and do not themselves represent any order of execution. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent a sequential order, nor do they require that the "first" and "second" items be of different types.
Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
Data assets are data collection resources owned or controlled by individuals or businesses that can be economically profitable, and are typically managed and maintained by a data management system. With the increasing abundance of data assets managed by data management systems, the association relationships between entities in data assets, or the association relationships between businesses, are often represented using entity relationship graphs.
In the related art, an entity relationship diagram is usually generated by relying on the foreign key relationships in the data tables. Specifically, in the database design of the related art, a database table includes a primary key and a foreign key: the primary key uniquely identifies a row of data in the database table, and a field in the database table that references another database table is called a foreign key. Based on the foreign key in a data table, it can be determined that the data table has an association relationship with another data table, so that the entity relationship diagram can be generated based on that association relationship.
However, because foreign keys designed into database tables may degrade the query performance of the data management system, in scenarios with highly concurrent reads and writes the data management system usually adopts a redundant data design, that is, foreign keys are no longer designed into the database tables.
Because the database tables no longer contain foreign keys, an entity relationship diagram generated by the related art, which depends on foreign key relationships, can no longer clearly display the association relationships between the entities in the data assets or between the businesses.
In order to at least partially solve the technical problems in the related art, the embodiment of the invention provides a knowledge graph-based entity relationship generation method, which comprises: acquiring metadata of at least two data tables; analyzing the metadata to determine the data structures of the at least two data tables and the association relationship between the at least two data tables; generating a knowledge graph according to the data structures and the association relationship; and visually rendering the knowledge graph to generate an entity relationship diagram. Because the knowledge graph is generated from the data stored in the data management system, the knowledge graph's capability to reason over domain knowledge can be used to extract the association relationships among the data, and the entity relationship diagram is then rendered from the rich association relationships among the data displayed in the knowledge graph, so that the entity relationship diagram is generated without relying on foreign key relationships.
Fig. 1 schematically shows a flowchart of a method for generating an entity relationship based on a knowledge-graph according to an embodiment of the present invention, where the method for generating an entity relationship based on a knowledge-graph may include the following steps:
101. Acquiring metadata of at least two data tables.
102. Analyzing the metadata, and determining the data structures of the at least two data tables and the association relationship between the at least two data tables.
103. Generating a knowledge graph according to the data structures and the association relationship.
104. Performing visual rendering on the knowledge graph to generate an entity relationship diagram.
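Steps 101 to 104 can be sketched as four small functions. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary layout of the metadata (`"fields"`, `"associations"`), the `KnowledgeGraph` class, and all function names are hypothetical, and the "rendering" step simply returns text lines in place of a real visual renderer.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target, kind) triples


def acquire_metadata(catalog, table_ids):
    """Step 101: fetch metadata for at least two data tables."""
    if len(table_ids) < 2:
        raise ValueError("at least two data tables are required")
    return {t: catalog[t] for t in table_ids}


def analyze_metadata(metadata):
    """Step 102: derive data structures and association relationships."""
    structures = {t: list(m["fields"]) for t, m in metadata.items()}
    associations = set()
    for t, m in metadata.items():
        for own_field, other_table, other_field in m.get("associations", []):
            if other_table in metadata:
                associations.add(((t, own_field), (other_table, other_field)))
    return structures, associations


def generate_knowledge_graph(structures, associations):
    """Step 103: tables and fields become nodes; relations become edges."""
    kg = KnowledgeGraph()
    for table, fields in structures.items():
        kg.nodes.add((table,))
        for f in fields:
            kg.nodes.add((table, f))
            kg.edges.add(((table,), (table, f), "contains"))
    for left, right in associations:
        kg.edges.add((left, right, "associated"))
    return kg


def render_entity_relationship_diagram(kg):
    """Step 104: stand-in for visual rendering; returns plain text lines."""
    lines = []
    for src, dst, kind in sorted(kg.edges):
        if kind == "associated":
            lines.append(f"{src[0]}.{src[1]} -- {dst[0]}.{dst[1]}")
    return lines


# Hypothetical metadata catalog with two tables.
catalog = {
    "orders": {
        "fields": ["s_name", "id"],
        "associations": [("s_name", "warehouses", "s_name")],
    },
    "warehouses": {"fields": ["s_name", "c_city"]},
}
metadata = acquire_metadata(catalog, ["orders", "warehouses"])
structures, associations = analyze_metadata(metadata)
kg = generate_knowledge_graph(structures, associations)
diagram = render_entity_relationship_diagram(kg)
```

No foreign key is consulted anywhere in the sketch; the only link between the two tables comes from the association recorded in the metadata.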
According to the embodiment of the present invention, the metadata of the at least two data tables may be obtained from a data management system that manages the service data of a specific service scenario, so that an entity relationship diagram for that service scenario can be generated. The embodiment is not limited thereto: the metadata may also be obtained from other databases having a data storage function, and the metadata of the at least two data tables may be obtained from the same database or from at least two different databases.
According to embodiments of the present invention, a data management system may be used to store data assets, which may be stored in the form of data tables. Metadata associated with the data table may also be stored in the data management system.
According to an embodiment of the present invention, the metadata may be used to describe a data structure of a data table and an association relationship between the data table and other data tables.
According to the embodiment of the invention, before the metadata of at least two data tables are respectively obtained from the data management system, an entity relationship diagram generation instruction can be received, and the entity relationship diagram generation instruction can carry the table identifiers of the at least two data tables used for generating the entity relationship diagram. After the entity relationship diagram generation instruction is obtained, at least two data tables indicated by the entity relationship diagram generation instruction can be obtained in the data management system.
According to the embodiment of the invention, after the at least two data tables used for generating the entity relationship graph are determined, the metadata associated with the at least two data tables can be determined based on the table identifications of the at least two data tables from the data management system.
According to the embodiment of the invention, after the metadata corresponding to the at least two data tables is obtained, the metadata can be input into a graph database, so that the graph database analyzes the metadata to determine the data structures of the at least two data tables and the association relationship between the at least two data tables, and the knowledge graph is generated according to the data structures and the association relationship.
According to the embodiment of the invention, by utilizing the extraction capability of the knowledge graph on the data knowledge, the entities included in the data knowledge and the relationship between the entities can be extracted, and the entities and the relationship between the entities can be displayed in the form of a graph.
According to the embodiment of the invention, after the knowledge graph is generated, the knowledge graph can be visually rendered, and the knowledge graph can be converted into the entity relation graph.
In the embodiment of the invention, the data structures and rich association relations of at least two data tables can be extracted and refined from the metadata by utilizing the extraction capability of the knowledge graph on the data knowledge, so that the generated knowledge graph can be directly visually rendered without depending on the foreign key relation stored in the data tables to generate the entity relation graph.
According to the embodiment of the present invention, analyzing the metadata, determining the data structures of the at least two data tables, and the association relationship between the at least two data tables may specifically be implemented as:
analyzing the metadata, and determining fields contained in at least two data tables and the association relation between different fields.
According to an embodiment of the present invention, metadata is data describing data, and may be used to describe a data structure of the data and an association relationship between the data and other data.
According to embodiments of the present invention, the data structure may refer to which fields are stored in the data table and in which database the data table is stored.
According to the embodiment of the invention, by analyzing the metadata, which fields are stored in the data table and the association relationship between the fields in the data table and the fields in other data tables can be determined.
According to the embodiment of the present invention, the generation of the knowledge graph according to the data structure and the association relationship may be specifically implemented as follows:
respectively taking the at least two data tables and fields contained in the at least two data tables as nodes;
and determining edges between different nodes according to the association relationship between different fields and the inclusion relationship between the different fields and at least two data tables so as to generate the knowledge graph.
According to the embodiment of the invention, the nodes corresponding to at least two data tables and the nodes corresponding to a plurality of fields respectively can be constructed in the initial knowledge graph.
According to the embodiment of the invention, after the nodes are constructed, the mapping relationship between each node and its data table or field can be saved.
According to the embodiment of the invention, after the nodes are constructed, the node characterizing a data table and the node corresponding to at least one field included in that data table may be connected through an edge, and the edge characterizes the inclusion relationship between the data table and the field.
According to the embodiment of the invention, the association relationship between the fields respectively contained in the at least two data tables can be obtained, and the nodes corresponding to the at least two fields with the association relationship are connected through the edges.
According to the embodiment of the present invention, taking the at least two data tables and the fields contained in the at least two data tables as nodes may specifically be implemented as:
respectively taking the at least two data tables as main nodes, and taking the fields contained in the at least two data tables as child nodes.
According to an embodiment of the present invention, the following tables 1 and 2 respectively represent data tables acquired from a data management system.
TABLE 1
Field name    Field description
s_name        Product name
s_number      Number
id            Order number
price         Price
TABLE 2
Field name    Field description
s_name        Product name
c_name        Warehouse name
c_city        Warehouse city
c_address     Warehouse address
As shown in tables 1 and 2, table 1 may be an order table and table 2 may be a warehouse data table, with tables 1 and 2 each having 4 fields. In table 1, each order in the order table is an order placed for the product whose product name is s_name. In table 2, the warehouse data table describes the warehouse in which the s_name product is stored. The s_name in table 1 and the s_name in table 2 represent the same product, and thus the two fields have an association relationship.
FIG. 2 schematically shows a schematic diagram of construction of a knowledge-graph in an embodiment of the invention.
In fig. 2, tables 1 and 2 shown in the above embodiments are used.
In this embodiment, the main node 210 corresponding to table 1 may first be constructed in the initial knowledge graph. The plurality of fields included in table 1 are then obtained, and the child node 211, the child node 212, the child node 213, and the child node 214 corresponding to those fields are respectively constructed. Finally, the child node 211, the child node 212, the child node 213, and the child node 214 may be connected to the main node 210 by edges based on the inclusion relationship between table 1 and the plurality of fields.
Following the construction for table 1, the main node 220 corresponding to table 2, the child nodes 221, 222, 223, 224 corresponding to the plurality of fields contained in table 2, and the edges between the main node 220 and the child nodes 221, 222, 223, 224 may be constructed in the initial knowledge graph.
After the node is constructed, the association relationship between the fields contained in table 1 and the fields contained in table 2 may be obtained.
Since the s_name field in table 1 and the s_name field in table 2 have an association relationship, the child node 211 corresponding to the s_name in table 1 and the child node 221 corresponding to the s_name in table 2 may be connected by an edge, so as to generate the knowledge graph.
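The construction walked through for fig. 2 can be sketched with the fields of table 1 and table 2. The sketch is illustrative only: the table names `order_table` and `warehouse_table`, the sequential node identifiers, and the function `build_knowledge_graph` are assumptions introduced here, and the node identifiers are unrelated to the reference numerals 210 to 224.

```python
from itertools import count

# Field lists copied from Table 1 and Table 2; the table names are hypothetical.
TABLES = {
    "order_table": ["s_name", "s_number", "id", "price"],
    "warehouse_table": ["s_name", "c_name", "c_city", "c_address"],
}
# Association between fields of different tables, as described for s_name.
FIELD_ASSOCIATIONS = [(("order_table", "s_name"), ("warehouse_table", "s_name"))]


def build_knowledge_graph(tables, field_associations):
    node_ids = count(1)
    nodes = {}    # node id -> {"role": "main" | "child", "table": ..., "field": ...}
    lookup = {}   # (table, field or None) -> node id: the saved node mapping
    edges = []    # (node id, node id, edge kind)

    for table, fields in tables.items():
        # One main node per data table.
        main_id = next(node_ids)
        nodes[main_id] = {"role": "main", "table": table, "field": None}
        lookup[(table, None)] = main_id
        # One child node per field, linked to its table by a "contains" edge.
        for name in fields:
            child_id = next(node_ids)
            nodes[child_id] = {"role": "child", "table": table, "field": name}
            lookup[(table, name)] = child_id
            edges.append((main_id, child_id, "contains"))

    # Edges between child nodes whose fields have an association relationship.
    for left, right in field_associations:
        edges.append((lookup[left], lookup[right], "associated"))
    return nodes, lookup, edges


nodes, lookup, edges = build_knowledge_graph(TABLES, FIELD_ASSOCIATIONS)
```

The result has two main nodes, eight child nodes, eight inclusion edges, and a single association edge joining the two `s_name` child nodes.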
According to the embodiment of the invention, the generation of the entity relationship diagram by performing visual rendering on the knowledge graph can be specifically realized as follows:
constructing an entity based on any main node and the associated child nodes in the knowledge graph;
and constructing the association relationship between different entities based on the edges between different child nodes in the knowledge graph, so as to generate an entity relationship diagram.
According to the embodiment of the invention, the entity in the entity relationship graph can be generated based on any main node and the child nodes connected with the main node through the edges in the knowledge graph.
According to an embodiment of the present invention, constructing an entity based on any main node and its associated child nodes in the knowledge graph may specifically be implemented as:
acquiring at least two node clusters from the knowledge graph, wherein each node cluster comprises a main node and sub-nodes connected with the main node;
and performing visual rendering on the at least two node clusters to generate entities respectively corresponding to the at least two node clusters.
In this embodiment, the knowledge-graph illustrated in FIG. 2 above may be used.
In fig. 2, the main node 210, the child node 211, the child node 212, the child node 213, and the child node 214 may form a node cluster. After the node cluster is obtained, the node cluster can be visually rendered to generate an entity in the entity relationship diagram.
Then, the node cluster composed of the main node 220, the child node 221, the child node 222, the child node 223, and the child node 224 may be obtained, and this node cluster is visually rendered to generate another entity in the entity relationship diagram.
According to the embodiment of the invention, after the entities in the entity relationship graph are constructed and generated, the connection relationship among different entities can be constructed based on the edges among the child nodes represented in the knowledge graph to generate the entity relationship graph.
FIG. 3 is a diagram schematically illustrating the construction of a generated entity relationship diagram according to an embodiment of the present invention.
The entity relationship diagram shown in the embodiment of the present invention may be generated by performing visual rendering on the knowledge-graph shown in fig. 2.
As shown in fig. 3, the entity relationship diagram includes two entities, namely the entity 310 and the entity 320. The entity 310 may correspond to table 1, where 311 may represent the table name and 312, 313, 314, 315 may represent the field names of the fields contained in the entity; the entity 320 may correspond to table 2, where 321 may represent the table name and 322, 323, 324, 325 may represent the field names of the fields contained in the entity.
In an embodiment of the present invention, the entity 310 may be generated by visually rendering the node cluster composed of the main node 210 and the child nodes 211, 212, 213, 214, and the entity 320 may be generated by visually rendering the node cluster composed of the main node 220 and the child nodes 221, 222, 223, 224.
The entity 310 and the entity 320 are connected through an edge, which indicates that a field contained in the entity 310 has an association relationship with a field contained in the entity 320.
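The conversion from node clusters to entities can be sketched as follows. The text does not specify a rendering format, so this sketch assumes Graphviz DOT text as the output, with each entity drawn as a record-shaped node; the edge-list representation, the names `table1` and `table2`, and the functions `node_clusters` and `to_dot` are hypothetical.

```python
from collections import defaultdict

# A knowledge graph in a hypothetical minimal form: (source, target, kind) edges.
# Table nodes are plain names; field nodes are "table.field" strings.
EDGES = [
    ("table1", "table1.s_name", "contains"),
    ("table1", "table1.s_number", "contains"),
    ("table1", "table1.id", "contains"),
    ("table1", "table1.price", "contains"),
    ("table2", "table2.s_name", "contains"),
    ("table2", "table2.c_name", "contains"),
    ("table2", "table2.c_city", "contains"),
    ("table2", "table2.c_address", "contains"),
    ("table1.s_name", "table2.s_name", "associated"),
]


def node_clusters(edges):
    """Group each main node with the child nodes connected to it."""
    clusters = defaultdict(list)
    for source, target, kind in edges:
        if kind == "contains":
            clusters[source].append(target.split(".", 1)[1])
    return dict(clusters)


def to_dot(edges):
    """Render entities and their associations as Graphviz DOT text."""
    lines = ["graph er {", "  node [shape=record];"]
    # Each node cluster becomes one entity: table name on top, fields below.
    for table, fields in node_clusters(edges).items():
        label = "{" + table + "|" + "\\l".join(fields) + "\\l}"
        lines.append(f'  {table} [label="{label}"];')
    # Each association edge between child nodes links the owning entities.
    for source, target, kind in edges:
        if kind == "associated":
            lines.append(f"  {source.split('.')[0]} -- {target.split('.')[0]};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(EDGES)
```

The resulting text can be passed to any DOT-compatible renderer; a different target format would only require changing `to_dot`.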
It should be noted that, for convenience of description, the fields included in the data tables and the association relationships between the fields are simplified in the embodiment of the present invention. In practical applications, the data structures and association relationships of the data tables are analyzed by inputting the metadata of the data tables into the graph database, so that a knowledge graph with rich edge relationships can be generated. Furthermore, the entity relationship diagram generated from the knowledge graph can also display rich association relationships among the fields.
FIG. 4 schematically illustrates a diagram of a knowledge-graph provided by another embodiment of the present invention.
As shown in FIG. 4, the knowledge-graph includes a plurality of nodes, and the nodes have rich edge relationships. For example, the plurality of nodes enclosed by the block 401 includes a main node 401 and a plurality of sub-nodes 402, that is, the plurality of nodes enclosed by the block form a node cluster. Similarly, FIG. 4 also includes a plurality of different node clusters.
Fig. 5 is a schematic diagram schematically illustrating an entity relationship diagram according to another embodiment of the present invention.
The entity relationship diagram shown in fig. 5 may be generated by visually rendering the knowledge-graph shown in fig. 4.
As shown in fig. 5, the entity 501, the entity 502, the entity 503, the entity 504, the entity 505 and the entity 506 all have association relationships. The entity 503 and the entity 504 each have more association relationships with the entity 506, from which it can be concluded that the services corresponding to the entity 503 and the entity 504, respectively, are more closely related to the service corresponding to the entity 506.
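The idea that more association relationships indicate more closely related services can be made concrete by counting field-level associations per pair of entities. This is an illustrative sketch only: the association list and the table names are invented, and the text does not state that closeness is computed this way.

```python
from collections import Counter

# Hypothetical field-level associations: ((table, field), (table, field)).
ASSOCIATIONS = [
    (("orders", "s_name"), ("warehouses", "s_name")),
    (("orders", "customer_id"), ("customers", "id")),
    (("orders", "city"), ("customers", "city")),
    (("customers", "id"), ("customers", "referrer_id")),
]


def association_counts(associations):
    """Count association edges between each pair of distinct entities."""
    counts = Counter()
    for (left_table, _), (right_table, _) in associations:
        if left_table != right_table:  # ignore links inside one entity
            counts[tuple(sorted((left_table, right_table)))] += 1
    return counts


counts = association_counts(ASSOCIATIONS)
closest_pair, strength = counts.most_common(1)[0]
```

In this invented data, `orders` and `customers` share two associations while `orders` and `warehouses` share one, so the first pair would be read as the more closely related.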
As can be seen from fig. 5, the entity relationship diagram generated by the method for generating entity relationships based on a knowledge graph according to the embodiment of the present invention can show rich association relationships between data tables and fields stored in a data management system without depending on foreign key relationships.
According to an embodiment of the invention, the metadata comprises physical metadata as well as relational metadata.
According to the embodiment of the present invention, analyzing the metadata and determining the fields respectively included in the at least two data tables and the association relationship between different fields may specifically be implemented as:
analyzing the physical metadata to determine the data structures of the at least two data tables;
determining the fields respectively included in the at least two data tables according to the data structures;
and analyzing the relationship metadata to determine the association relationships between different fields corresponding to the at least two data tables.
According to embodiments of the present invention, the physical metadata may include, for example, instance metadata, library metadata, table metadata, column metadata, index metadata, and the like.
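As a minimal sketch of the first two steps, column metadata can be grouped by table to recover each table's structure and field list. The row layout (keys `table_name`, `column_name`, `data_type`) is an assumption modeled on the kind of rows a catalog view such as `information_schema.columns` returns; the embodiment does not prescribe a format.

```python
from collections import defaultdict


def build_table_structures(column_rows):
    """Group column metadata rows into {table: [(column, type), ...]}."""
    tables = defaultdict(list)
    for row in column_rows:
        tables[row["table_name"]].append((row["column_name"], row["data_type"]))
    return dict(tables)


# Hypothetical column metadata for two tables.
column_rows = [
    {"table_name": "orders", "column_name": "order_id", "data_type": "bigint"},
    {"table_name": "orders", "column_name": "user_id", "data_type": "bigint"},
    {"table_name": "users", "column_name": "id", "data_type": "bigint"},
]
structures = build_table_structures(column_rows)
# Fields per table, derived from the data structures.
fields = {table: [col for col, _ in cols] for table, cols in structures.items()}
```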
According to an embodiment of the invention, the relational metadata comprises a structured query statement.
According to the embodiment of the present invention, analyzing the relationship metadata and determining the association relationship between the fields included in the at least two data tables may specifically be implemented as:
and analyzing the structured query statement, and determining the association relation between the fields in the at least two data tables.
According to an embodiment of the present invention, the structured query language is the language used to operate the data management system.
According to the embodiment of the invention, by analyzing the structured query statement, the related operations on the fields in the at least two data tables when the user operates the data management system in the historical period can be obtained, so that the association relationship between the fields in the at least two data tables can be determined based on the related operations.
The operations on the fields in the at least two data tables that are involved in historical operations can be used to represent the historical operation relationships between different fields in the same business data table or in different business data tables, including but not limited to associated queries performed by the user on the at least two data tables. Therefore, the association relationships between different fields in each business data table can be determined according to the historical operation data that is stored in the data management system and relates to each business data table.
For example, a user may perform an associated query on specified fields in a first business data table and a second business data table; that is, the data the user requires is not located in the first business data table alone or in the second business data table alone, but is spread across both. In this case, the data query must rely, directly or indirectly, on the relevant fields in both the first business data table and the second business data table at the same time, so it can be determined that the first business data table and the second business data table have an association relationship based on those fields.
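The associated-query case can be illustrated with a small sketch that pulls cross-table equality predicates out of a historical statement. This is a regular-expression approximation written for illustration, not a full SQL parser and not the parsing method of the embodiment; the table and column names are hypothetical.

```python
import re

# Qualified equality predicates such as "o.user_id = u.id".
_EQ = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")
# "FROM orders o" / "JOIN users AS u": recover alias -> table.
_ALIAS = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(\w+))?", re.IGNORECASE)
# Words that may follow a table name but are not aliases.
_KEYWORDS = {"on", "where", "join", "inner", "left", "right", "full",
             "cross", "group", "order", "using", "limit"}


def field_associations(sql):
    """Return the set of cross-table field pairs equated in the statement."""
    aliases = {}
    for table, alias in _ALIAS.findall(sql):
        aliases[table] = table
        if alias and alias.lower() not in _KEYWORDS:
            aliases[alias] = table
    pairs = set()
    for a_tab, a_col, b_tab, b_col in _EQ.findall(sql):
        left = (aliases.get(a_tab, a_tab), a_col)
        right = (aliases.get(b_tab, b_tab), b_col)
        if left[0] != right[0]:  # keep only relationships across tables
            pairs.add(tuple(sorted((left, right))))
    return pairs


sql = ("SELECT o.order_id, u.name FROM orders o "
       "JOIN users u ON o.user_id = u.id WHERE o.status = 'paid'")
pairs = field_associations(sql)
```

Here the statement relies on `orders.user_id` and `users.id` at the same time, so those two fields would be recorded as associated.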
According to an embodiment of the present invention, the relationship metadata is obtained by:
respectively acquiring table identifiers of at least two data tables;
acquiring target structured query statements related to the at least two data tables from a plurality of structured query statements stored in a data management system based on the table identifiers;
relational metadata is determined from the target structured query statement.
According to the embodiment of the invention, the data management system can store the structured query statement executed by the user in the historical period.
According to the embodiment of the invention, after the entity relationship diagram generation instruction is analyzed, the table identifiers of at least two data tables used for generating the entity relationship diagram and indicated by the entity relationship diagram generation instruction can be obtained. Based on the obtained table identifier, traversal query can be performed on a plurality of historical structured query statements stored by the data management system, and at least one target structured query statement matched with the table identifier is determined.
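The traversal described above can be sketched as a filter over the stored statements. The sketch assumes that a statement "matches" when it mentions every requested table identifier as a whole word; the embodiment does not define the matching rule, and the sample statements are hypothetical.

```python
import re


def select_target_statements(history, table_ids):
    """Return the statements that mention every table identifier as a whole word."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in table_ids]
    return [sql for sql in history if all(p.search(sql) for p in patterns)]


# Hypothetical statements executed during the historical period.
history = [
    "SELECT * FROM orders o JOIN users u ON o.user_id = u.id",
    "SELECT * FROM orders",
    "SELECT * FROM users_archive JOIN orders ON 1 = 1",
]
targets = select_target_statements(history, ["orders", "users"])
```

Whole-word matching keeps `users_archive` from being mistaken for `users`; a production system would more likely resolve table references with a real SQL parser.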
According to another embodiment of the invention, after the initial knowledge graph is constructed based on the fields contained in at least two business data tables, the field information of each business data table stored in the data management system can be obtained, knowledge can be extracted from the field information by means of machine learning, and the extraction result can be determined as the relationship metadata. Specifically, the field information can be input into a pre-trained text processing model for semantic recognition, similarity can then be calculated between the field information of the business data tables according to the semantic recognition result, and an association relationship can be determined between any two fields whose similarity is greater than a preset similarity threshold.
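The thresholding step can be sketched as follows. Note the substitution: the embodiment uses a pre-trained text processing model for semantic recognition, whereas this sketch stands in a plain character-level similarity from Python's `difflib` so that it runs without a model. The field descriptions and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_field_pairs(fields, threshold=0.8):
    """fields: list of (table, field, description).

    Returns cross-table field pairs whose description similarity
    exceeds the threshold, together with the score.
    """
    result = []
    for (t1, f1, d1), (t2, f2, d2) in combinations(fields, 2):
        if t1 == t2:
            continue  # only relate fields of different tables
        score = SequenceMatcher(None, d1.lower(), d2.lower()).ratio()
        if score > threshold:
            result.append(((t1, f1), (t2, f2), score))
    return result


# Hypothetical field information.
field_info = [
    ("orders", "buyer_id", "identifier of the user"),
    ("users", "id", "identifier of the user"),
    ("items", "price", "unit price in cents"),
]
similar = similar_field_pairs(field_info)
```

Replacing `SequenceMatcher` with cosine similarity over model embeddings would bring the sketch closer to the described embodiment without changing the thresholding logic.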
According to an embodiment of the present invention, the relationship metadata may be acquired by:
receiving the relationship description information of fields contained in at least two data tables input by a user;
the relationship description information is taken as relationship metadata.
According to the embodiment of the invention, on the basis of mining the association relationship between at least two data tables from the data stored in the data management system, the relationship description information of the fields contained in the at least two data tables input by the user can be obtained, and the relationship description information can be relationship information implicitly existing in the fields contained in the at least two data tables determined by the user based on business or operation experience. After the relationship description information input by the user is obtained, the relationship description information can be used as relationship metadata to enrich the edge relationship in the knowledge graph constructed based on the relationship metadata, so that the accuracy of the knowledge graph in describing knowledge of at least two data tables is improved.
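One way to picture this enrichment is a merge of mined relationships with user-described ones, keeping track of where each edge came from. This is a sketch under assumed data shapes (pairs of `(table, field)` tuples); the field names are hypothetical.

```python
def merge_relationship_metadata(mined, user_supplied):
    """Union mined and user-described field relationships.

    Returns {normalized_pair: set_of_sources}, where a pair is a sorted
    tuple of two (table, field) tuples.
    """
    merged = {}
    for source, relations in (("mined", mined), ("user", user_supplied)):
        for left, right in relations:
            key = tuple(sorted((left, right)))  # direction-independent key
            merged.setdefault(key, set()).add(source)
    return merged


mined = [(("orders", "user_id"), ("users", "id"))]
user_supplied = [
    (("users", "id"), ("orders", "user_id")),         # confirms a mined edge
    (("orders", "coupon"), ("promotions", "code")),   # implicit, known from experience
]
merged = merge_relationship_metadata(mined, user_supplied)
```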
According to an embodiment of the present invention, determining the relationship metadata according to the target structured query statement may be specifically implemented as:
determining a function calculation relation of at least two data tables contained in a target structured query statement;
the functional computation relationship is determined as relationship metadata.
According to another embodiment of the present invention, determining the relationship metadata according to the target structured query statement may be specifically implemented as:
determining the connection relation of at least two data tables contained in the target structured query statement;
the connection relationship is determined as relationship metadata.
According to another embodiment of the present invention, determining the relationship metadata according to the target structured query statement may be specifically implemented as:
determining ETL (extraction of data, cleaning conversion of data and loading of data) processing relations of at least two data tables contained in the target structured query statement;
the ETL processing relationship is determined as relationship metadata.
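The three kinds of relationship named in the preceding embodiments (function calculation, connection, and ETL processing) can be illustrated by a rough classifier over a single statement. The interpretation is an assumption: aggregate functions stand for "function calculation", `JOIN` for "connection", and `INSERT INTO ... SELECT` for "ETL processing". The regular expressions only capture the first source table and are not a substitute for a SQL parser.

```python
import re

_INSERT_SELECT = re.compile(
    r"^\s*INSERT\s+INTO\s+(\w+).*?\bSELECT\b.*?\bFROM\s+(\w+)",
    re.IGNORECASE | re.DOTALL,
)
_JOIN = re.compile(r"\bFROM\s+(\w+)\b.*?\bJOIN\s+(\w+)", re.IGNORECASE | re.DOTALL)
_FUNC = re.compile(r"\b(SUM|COUNT|AVG|MIN|MAX)\s*\(", re.IGNORECASE)


def classify_relations(sql):
    """Return a list of relationship tuples found in one statement."""
    relations = []
    m = _INSERT_SELECT.search(sql)
    if m:
        # ("etl", source_table, target_table)
        relations.append(("etl", m.group(2), m.group(1)))
    m = _JOIN.search(sql)
    if m:
        # ("join", left_table, right_table)
        relations.append(("join", m.group(1), m.group(2)))
    for func in _FUNC.findall(sql):
        # ("function", function_name)
        relations.append(("function", func.upper()))
    return relations


etl_sql = ("INSERT INTO daily_sales SELECT o.day, SUM(i.amount) "
           "FROM orders o JOIN items i ON o.id = i.order_id GROUP BY o.day")
relations = classify_relations(etl_sql)
```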
According to an embodiment of the present invention, the relationship metadata is obtained by:
performing pattern matching operation on at least two data tables to generate a matching result;
and determining the matching result as the relation metadata.
According to the embodiment of the invention, the fields contained in the at least two data tables can be matched against one another; at least two fields in the at least two data tables whose matching degree is greater than a preset threshold are identified through the matching calculation, and those fields are determined to have an association relationship.
Fig. 6 schematically shows a flowchart of a data query method provided by an embodiment of the present invention, where the data query method may include the following steps:
601: receiving a data query instruction, wherein the data query instruction carries a field identifier of a target field and a table identifier of the data table to which the field identifier belongs;
602: determining, based on the table identifier, an entity relationship diagram corresponding to the data table, wherein the entity relationship diagram is generated by visually rendering a knowledge graph; the knowledge graph is generated according to the data structures of at least two data tables and the association relationship between the at least two data tables, both determined by analyzing metadata acquired from a data management system; the entity relationship diagram comprises at least two pre-generated entity groups and the association relationship between the at least two entity groups; and each entity group corresponds to one data table and the fields contained in that data table;
603: performing an index operation on the entity relationship diagram using the field identifier as index information, and acquiring index data associated with the field identifier;
604: outputting the index data.
The specific steps of generating the entity relationship diagram may refer to the method for generating an entity relationship based on a knowledge graph shown in fig. 1, and are not described in detail in this embodiment.
According to the embodiment of the invention, after the entity relationship diagram is constructed and generated, the entity relationship diagram can clearly show the association relationship among the fields contained in the data table stored in the data management system, so that after the data query instruction is received, the entity relationship diagram corresponding to the table identifier carried by the data query instruction is determined, and the index data associated with the data table indicated by the data query instruction can be acquired in a manner of indexing the entity relationship diagram.
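Steps 601 to 604 can be sketched as a lookup over an in-memory representation of the entity relationship diagram. The class and function names are hypothetical, and the sketch assumes that the "index data" is the set of fields associated with the target field; the embodiment leaves the exact content of the index data open.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRelationshipDiagram:
    # Entity groups: table identifier -> fields of that table.
    entities: dict
    # Association relationships between (table, field) pairs.
    relations: list = field(default_factory=list)

    def lookup(self, table_id, field_id):
        """Step 603: use the field identifier as index information."""
        if field_id not in self.entities.get(table_id, []):
            return []
        target = (table_id, field_id)
        linked = []
        for left, right in self.relations:
            if left == target:
                linked.append(right)
            elif right == target:
                linked.append(left)
        return linked


def handle_query(diagrams, table_id, field_id):
    """Steps 601-604 for one query instruction (table_id, field_id)."""
    # Step 602: find the diagram that contains the requested table.
    diagram = next((d for d in diagrams if table_id in d.entities), None)
    if diagram is None:
        return []
    # Steps 603 and 604: index the diagram and return the result.
    return diagram.lookup(table_id, field_id)


diagram = EntityRelationshipDiagram(
    entities={"orders": ["order_id", "user_id"], "users": ["id", "name"]},
    relations=[(("orders", "user_id"), ("users", "id"))],
)
result = handle_query([diagram], "users", "id")
```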
Fig. 7 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present invention, and the data processing apparatus 700 may include a metadata obtaining module 701, a metadata parsing module 702, a knowledge graph building module 703, and a rendering module 704.
A metadata obtaining module 701, configured to obtain metadata of at least two data tables;
a metadata parsing module 702, configured to parse metadata, and determine data structures of at least two data tables and an association relationship between the at least two data tables;
a knowledge graph constructing module 703, configured to generate a knowledge graph according to the data structure and the association relationship;
and the rendering module 704 is used for performing visual rendering on the knowledge graph to generate an entity relationship graph.
According to an embodiment of the present invention, the metadata parsing module 702 is specifically configured to:
analyzing the metadata, and determining fields contained in at least two data tables and association relations between different fields.
According to an embodiment of the present invention, the knowledge-graph constructing module 703 is specifically configured to:
respectively taking the at least two data tables and fields contained in the at least two data tables as nodes;
and determining edges between different nodes according to the association relationships between different fields and the inclusion relationships between the fields and the at least two data tables, so as to generate the knowledge graph.
According to an embodiment of the present invention, the knowledge-graph constructing module 703 is specifically configured to:
and respectively taking the at least two data tables as main nodes, and taking fields contained in the at least two data tables as child nodes.
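The construction performed by the knowledge graph constructing module can be sketched with plain Python containers: tables become main nodes, fields become child nodes, inclusion yields table-to-field edges, and association yields field-to-field edges. The node and edge encodings are assumptions chosen for the example, and the table and field names are hypothetical.

```python
def build_knowledge_graph(table_fields, field_relations):
    """Build (nodes, edges) from table structures and field associations.

    table_fields: {table: [field, ...]}
    field_relations: iterable of ((table, field), (table, field))
    """
    nodes = {}
    edges = []
    for table, fields in table_fields.items():
        nodes[("table", table)] = "main"
        for f in fields:
            nodes[("field", table, f)] = "child"
            # Inclusion relationship between a table and its field.
            edges.append((("table", table), ("field", table, f), "contains"))
    for (t1, f1), (t2, f2) in field_relations:
        # Association relationship between two fields.
        edges.append((("field", t1, f1), ("field", t2, f2), "associated"))
    return nodes, edges


nodes, edges = build_knowledge_graph(
    {"orders": ["order_id", "user_id"], "users": ["id"]},
    [(("orders", "user_id"), ("users", "id"))],
)
```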
According to an embodiment of the present invention, the rendering module 704 is specifically configured to:
constructing an entity based on any main node and the associated child nodes in the knowledge graph;
and constructing association relationships between different entities based on the edges between different child nodes in the knowledge graph, so as to generate the entity relationship diagram.
According to an embodiment of the present invention, the rendering module 704 is specifically configured to:
acquiring at least two node clusters from a knowledge graph, wherein each node cluster comprises a main node and sub-nodes connected with the main node;
and performing visual rendering on the at least two node clusters to generate entities respectively corresponding to the at least two node clusters.
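One possible form of the rendering step is to emit a textual graph description that a layout tool can draw. The sketch below writes Graphviz DOT text, with one record-shaped node per node cluster and one edge per field association. Choosing DOT as the output format is an assumption made for illustration; the embodiment does not name a rendering technology, and the sample data is hypothetical.

```python
def render_dot(clusters, field_relations):
    """Render node clusters as entities in Graphviz DOT text.

    clusters: {main_node (table): [child nodes (fields), ...]}
    field_relations: iterable of ((table, field), (table, field))
    """
    lines = ["graph er {", "  node [shape=record];"]
    for table, fields in clusters.items():
        # One entity per cluster: table name on top, fields listed below.
        label = "{" + table + "|" + "".join(f + r"\l" for f in fields) + "}"
        lines.append(f'  "{table}" [label="{label}"];')
    for (t1, f1), (t2, f2) in field_relations:
        # Edges between child nodes become associations between entities.
        lines.append(f'  "{t1}" -- "{t2}" [label="{f1} = {f2}"];')
    lines.append("}")
    return "\n".join(lines)


dot = render_dot(
    {"orders": ["order_id", "user_id"], "users": ["id"]},
    [(("orders", "user_id"), ("users", "id"))],
)
```

The resulting text can be passed to a Graphviz layout program to obtain an image of the entity relationship diagram.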
According to an embodiment of the present invention, the metadata includes physical metadata and relational metadata;
according to an embodiment of the present invention, the metadata parsing module 702 is specifically configured to:
analyzing the physical metadata and determining the data structures of at least two data tables;
determining fields respectively included by at least two data tables according to the data structure;
and analyzing the relation metadata, and determining the association relation between different fields corresponding to at least two data tables.
According to an embodiment of the invention, the relational metadata comprises a structured query statement;
according to an embodiment of the present invention, the metadata parsing module 702 is specifically configured to:
and analyzing the structured query statement, and determining the association relation between the fields in the at least two data tables.
According to an embodiment of the present invention, the metadata obtaining module 701 of the data processing apparatus 700 is specifically configured to:
respectively acquiring table identifiers of the at least two data tables;
acquiring target structured query statements related to the at least two data tables from a plurality of structured query statements stored in a data management system based on the table identifiers;
determining the relationship metadata according to the target structured query statements.
According to an embodiment of the present invention, the metadata obtaining module 701 of the data processing apparatus 700 is specifically configured to:
performing pattern matching operation on at least two data tables to generate a matching result;
and determining the matching result as the relation metadata.
The data processing apparatus in fig. 7 may execute the method for generating an entity relationship based on a knowledge graph in the embodiment shown in fig. 1, and the implementation principle and the technical effect are not repeated. The specific manner in which each module and unit of the data processing apparatus in the above embodiments perform operations has been described in detail in the embodiments related to the method, and will not be described in detail herein.
Fig. 8 schematically shows a block diagram of a data query apparatus 800 according to an embodiment of the present invention, and the data query apparatus 800 may include an instruction receiving module 801, a graph determining module 802, an indexing module 803, and an output module 804.
An instruction receiving module 801, configured to receive a data query instruction, where the data query instruction carries a field identifier of a target field and a table identifier of a data table to which the field identifier belongs;
a graph determining module 802, configured to determine an entity relationship graph corresponding to a data table based on a table identifier, where the entity relationship graph is generated by performing visual rendering on a knowledge graph, the knowledge graph is generated according to a data structure of at least two data tables determined by analyzing metadata acquired from a data management system and an association relationship between the at least two data tables, the entity relationship graph includes at least two entity groups generated in advance and an association relationship between the at least two entity groups, and each entity group corresponds to one data table and a field included in the data table;
an indexing module 803, configured to perform an indexing operation on the entity relationship diagram by using the field identifier as index information, and acquire index data associated with the field identifier;
and an output module 804, configured to output the index data.
The data query apparatus shown in fig. 8 may execute the data query method shown in the embodiment shown in fig. 6, and the implementation principle and the technical effect are not repeated. The specific manner in which each module and unit of the data query device in the above embodiments perform operations has been described in detail in the embodiments related to the method, and will not be elaborated herein.
In one possible design, the data processing apparatus and the data query apparatus provided in the embodiment of the present invention may be implemented as a computing device, as shown in fig. 9, the computing device may include a storage component 901 and a processing component 902;
the storage component 901 stores one or more computer instructions, wherein the one or more computer instructions are used for the processing component 902 to invoke and execute so as to implement the method for generating entity relationship based on knowledge graph and/or the method for querying data provided by the embodiment of the present invention.
Of course, a computing device will typically also include other components, such as input/output interfaces, communication components, and so forth. The input/output interface provides an interface between the processing component and peripheral interface modules, which may be output devices, input devices, etc. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform. In the latter case, the computing device may be a cloud server, and the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
When the computing device is a physical device, the computing device may be implemented as a distributed cluster consisting of a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device.
In practical application, the computing device may specifically deploy a node in the message queue system, and implement the node as a producer, a consumer, a transit server, a naming server, or the like in the message queue system.
The embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a computer, the knowledge graph-based entity relationship generation method and/or the data query method according to the embodiment of the present invention may be implemented.
The embodiment of the present invention further provides a computer program product, which includes a computer program; when the computer program is executed by a computer, the knowledge graph-based entity relationship generation method and/or the data query method according to the embodiment of the present invention may be implemented.
The processing components in the respective embodiments above may include one or more processors executing computer instructions to perform all or part of the steps of the methods described above. Of course, the processing components may also be implemented as one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components configured to perform the above-described methods.
The storage component is configured to store various types of data to support operations in the device. The storage component may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. A knowledge graph-based entity relationship generation method comprises the following steps:
acquiring metadata of at least two data tables;
analyzing the metadata, and determining the data structures of the at least two data tables and the association relationship between the at least two data tables;
generating a knowledge graph according to the data structures and the association relationship;
and performing visual rendering on the knowledge graph to generate an entity relationship graph.
2. The method of claim 1, wherein the parsing the metadata, determining the data structures of the at least two data tables, and the association of the at least two data tables comprises:
analyzing the metadata, and determining the fields respectively contained in the at least two data tables and the association relationship between different fields.
3. The method of claim 2, the generating a knowledge graph from the data structure and the associations comprising:
taking the at least two data tables and fields contained in the at least two data tables as nodes respectively;
and determining edges between different nodes according to the association relationships between different fields and the inclusion relationships between the different fields and the at least two data tables, so as to generate the knowledge graph.
4. The method according to claim 3, wherein the taking the at least two data tables and the fields contained in the at least two data tables as nodes respectively comprises:
and respectively taking the at least two data tables as main nodes, and taking fields contained in the at least two data tables as child nodes.
5. The method of claim 4, the visually rendering the knowledge-graph, generating an entity relationship graph comprising:
constructing an entity based on any main node and the associated child nodes in the knowledge graph;
and constructing association relations of different entities based on edges of different child nodes in the knowledge graph so as to generate the entity relation graph.
6. The method of claim 5, wherein building an entity based on any of the master nodes and their associated child nodes in the knowledge-graph comprises:
acquiring at least two node clusters from the knowledge graph, wherein each node cluster comprises a main node and sub-nodes connected with the main node;
and performing visual rendering on the at least two node clusters to generate entities respectively corresponding to the at least two node clusters.
7. The method of claim 2, the metadata comprising physical metadata and relational metadata;
the analyzing the metadata and determining the fields respectively contained in the at least two data tables and the association relationship between different fields comprises:
analyzing the physical metadata and determining the data structures of the at least two data tables;
determining fields respectively included by the at least two data tables according to the data structure;
and analyzing the relationship metadata, and determining the association relationship between different fields corresponding to the at least two data tables.
8. The method of claim 7, the relational metadata comprising a structured query statement;
the analyzing the relationship metadata and determining the association relationship between the fields included in the at least two data tables includes:
and analyzing the structured query statement, and determining the association relationship between the fields in the at least two data tables.
9. The method of claim 7, the relationship metadata obtained by:
respectively acquiring the table identifications of the at least two data tables;
acquiring target structured query statements related to the at least two data tables from a plurality of structured query statements stored in the data management system based on the table identifications;
determining the relational metadata according to the target structured query statement.
10. The method of claim 7, the relationship metadata obtained by:
performing pattern matching operation on the at least two data tables to generate a matching result;
determining the matching result as the relationship metadata.
11. The method of claim 7, further comprising:
receiving the relation description information of fields contained in the at least two data tables input by a user;
and taking the relationship description information as the relationship metadata.
12. A method of data query, comprising:
receiving a data query instruction, wherein the data query instruction carries a field identifier of a target field and a table identifier of a data table to which the field identifier belongs;
determining an entity relationship diagram corresponding to the data table based on the table identification, wherein the entity relationship diagram is generated by performing visual rendering on a knowledge graph, the knowledge graph is generated according to data structures of at least two data tables determined by analyzing metadata acquired from a data management system and the association relationship of the at least two data tables, the entity relationship diagram comprises at least two entity groups generated in advance and the association relationship of the at least two entity groups, and each entity group corresponds to one data table and fields contained in the data table;
taking the field identification as index information to perform index operation aiming at the entity relationship graph, and acquiring index data associated with the field identification;
and outputting the index data.
13. A computing device comprising a processing component and a storage component;
the storage component stores one or more computer instructions; the one or more computer instructions are configured to be invoked for execution by the processing component to implement a knowledge-graph based entity relationship generation method of any one of claims 1 to 11, or to implement a data query method of claim 12.
14. A computer storage medium storing a computer program which, when executed by a computer, implements the knowledge-graph-based entity relationship generation method according to any one of claims 1 to 11, or implements the data query method according to claim 12.
CN202210828302.2A 2022-07-13 2022-07-13 Entity relation generation method and data query method based on knowledge graph Pending CN115203435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210828302.2A CN115203435A (en) 2022-07-13 2022-07-13 Entity relation generation method and data query method based on knowledge graph

Publications (1)

Publication Number Publication Date
CN115203435A true CN115203435A (en) 2022-10-18

Family

ID=83581504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210828302.2A Pending CN115203435A (en) 2022-07-13 2022-07-13 Entity relation generation method and data query method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN115203435A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028597A (en) * 2023-03-27 2023-04-28 南京燧坤智能科技有限公司 Object retrieval method, device, nonvolatile storage medium and computer equipment
CN116028597B (en) * 2023-03-27 2023-07-21 南京燧坤智能科技有限公司 Object retrieval method, device, nonvolatile storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination