CN111324609B - Knowledge graph construction method and device, electronic equipment and storage medium - Google Patents

Publication number
CN111324609B
Authority
CN
China
Prior art keywords
entity
name
relation
cell
column
Legal status
Active
Application number
CN202010096405.5A
Other languages
Chinese (zh)
Other versions
CN111324609A
Inventor
熊俊宇
魏琪康
周煜
钟黎
刘黎春
Current Assignee
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Application filed by Tencent Cloud Computing Beijing Co Ltd
Priority to CN202010096405.5A
Publication of CN111324609A
Application granted
Publication of CN111324609B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Abstract

The application discloses a knowledge graph construction method and device, electronic equipment and a storage medium, intended to reduce the difficulty of creating a knowledge graph. The method comprises: acquiring tables corresponding to at least two entity categories, wherein each table comprises a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table; generating, from the entity names extracted from the name column of each table, first statements corresponding to the entity names, each first statement being used to add an entity node corresponding to an entity name to a graph database; generating, from the association relationships extracted from the relation columns of each table, second statements corresponding to the association relationships, each second statement being used to add an edge representing an association relationship between entity nodes; and calling the graph database to parse the generated first statements and second statements to obtain a knowledge graph.

Description

Knowledge graph construction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a knowledge graph construction method, a knowledge graph construction device, an electronic device, and a storage medium.
Background
Constructing a knowledge graph generally involves a relatively complex process. First, a professional converts the data to be processed into the form required for import by the graph database. Then, complex import statements are written by hand and used to import the prepared data into the graph database, which generates the corresponding knowledge graph. However, manual data processing is highly error-prone, the data are difficult to maintain, and the user must learn how to use the graph database, so the barrier to use is very high.
Disclosure of Invention
The embodiments of the present application provide a knowledge graph construction method and device, electronic equipment and a storage medium that reduce the difficulty of creating a knowledge graph, so that ordinary users can create knowledge graphs conveniently and efficiently.
In one aspect, an embodiment of the present application provides a knowledge graph construction method, including:
acquiring tables corresponding to at least two entity categories, wherein each table comprises a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table;
generating, from the entity names extracted from the name column of each table, first statements corresponding to the entity names, wherein each first statement is used to add an entity node corresponding to an entity name to a graph database;
generating, from the association relationships extracted from the relation columns of each table, second statements corresponding to the association relationships, wherein each second statement is used to add an edge representing an association relationship between entity nodes;
and calling the graph database to parse the generated first statements and second statements to obtain a knowledge graph.
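The four steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes, purely for illustration, that each table is a list of string rows keyed by its class name, that row 0 holds the headers, that column 0 is the name column, and that every other header cell is simply the class name of the related category. It also assumes the graph database accepts Neo4j-style Cypher and uses a hypothetical default relationship type `RELATED_TO`; sending the statements to a database is left out.

```python
def quote(text: str) -> str:
    """Render a Cypher string literal, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def ident(text: str) -> str:
    """Render a backtick-quoted Cypher identifier (label or relationship type)."""
    return "`" + text.replace("`", "``") + "`"


def build_statements(tables: dict[str, list[list[str]]]) -> tuple[list[str], list[str]]:
    """Return (first_statements, second_statements) for the given tables.

    Assumed layout (hypothetical): row 0 is the header, column 0 is the name
    column, and each other header cell is the class name of a related category.
    """
    first, second = [], []
    for class_name, rows in tables.items():
        header, body = rows[0], rows[1:]
        # First statements: one node per non-empty entity name.
        for row in body:
            if row and row[0].strip():
                first.append(f"MERGE (:{ident(class_name)} {{name: {quote(row[0].strip())}}})")
        # Second statements: one edge per non-empty relation cell.
        for col in range(1, len(header)):
            target_class = header[col].strip()
            for row in body:
                if not row or col >= len(row):
                    continue
                source, target = row[0].strip(), row[col].strip()
                if source and target:
                    second.append(
                        f"MATCH (a:{ident(class_name)} {{name: {quote(source)}}}), "
                        f"(b:{ident(target_class)} {{name: {quote(target)}}}) "
                        "MERGE (a)-[:RELATED_TO]->(b)"
                    )
    return first, second
```

In this sketch all first statements must run before any second statement, because each second statement uses `MATCH` to find nodes that the first statements created; `MERGE` keeps a repeated import from duplicating nodes or edges.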
In one aspect, an embodiment of the present application provides a knowledge graph construction method, including:
in response to a graph construction instruction entered on a file upload page, sending a file to be uploaded to a background server, wherein the file to be uploaded comprises tables corresponding to at least two entity categories, each table comprises a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table;
and displaying a knowledge graph sent by the background server, wherein the knowledge graph is generated by the background server from the file to be uploaded and describes the association relationships among the entities.
In one aspect, an embodiment of the present application provides a knowledge graph construction apparatus, including:
an acquisition module, configured to acquire tables corresponding to at least two entity categories, wherein each table comprises a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table;
a first generation module, configured to generate, from the entity names extracted from the name column of each table, first statements corresponding to the entity names, each first statement being used to add an entity node corresponding to an entity name to a graph database;
a second generation module, configured to generate, from the association relationships extracted from the relation columns of each table, second statements corresponding to the association relationships, each second statement being used to add an edge representing an association relationship between entity nodes;
and a parsing module, configured to call the graph database to parse the generated first statements and second statements to obtain a knowledge graph.
Optionally, the name of each table is the class name of the corresponding entity category. The first cell of the name column in each table contains a name column identifier, and the remaining cells of the name column contain entity names. The first cell of each relation column in each table contains a relation column identifier and the class name of one entity category that has an association relationship with the entity category of the table to which the relation column belongs. Every other cell of the relation column contains an entity name from the table identified by the class name in the first cell; that entity has an association relationship with the entity in the same row as the cell.
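The text does not fix the exact wording of the identifiers. The sketch below assumes, hypothetically, that the name column header is the literal `name` and that a relation column header is `relation:<ClassName>`, and shows how a tool could locate the name column and read each relation column's target class from the first row.

```python
NAME_ID = "name"               # hypothetical name-column identifier
RELATION_PREFIX = "relation:"  # hypothetical relation-column identifier


def classify_columns(header: list[str]) -> tuple[int, dict[int, str]]:
    """Return (name_column_index, {relation_column_index: target_class}) for a header row."""
    name_col = None
    relation_cols = {}
    for idx, cell in enumerate(header):
        cell = (cell or "").strip()
        if cell == NAME_ID:
            if name_col is not None:
                raise ValueError("table has more than one name column")
            name_col = idx
        elif cell.startswith(RELATION_PREFIX):
            target = cell[len(RELATION_PREFIX):].strip()
            if not target:
                raise ValueError(f"relation column {idx} names no entity class")
            relation_cols[idx] = target
        # Any other header (attributes, aliases, ...) is ignored here.
    if name_col is None:
        raise ValueError("table has no name column")
    return name_col, relation_cols
```

Keying relation columns by index lets later steps pair each relation cell with the name cell in the same row without rescanning the header.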
Optionally, the second generation module is further configured to extract association relationships from any relation column of any table as follows:
for each cell other than the first cell in the relation column, extract a first entity name from the cell at the intersection of that cell's row and the name column, extract a second entity name from the cell itself, and determine that an association relationship exists between the entity corresponding to the first entity name and the entity corresponding to the second entity name.
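The row-wise extraction rule above can be sketched in a few lines. This assumes a table is a list of string rows with the header in row 0 and that each relation cell holds a single entity name; multi-valued cells are not handled.

```python
def extract_relations(rows: list[list[str]], name_col: int, rel_col: int) -> list[tuple[str, str]]:
    """Return (first_entity_name, second_entity_name) pairs from one relation column.

    rows[0] is the header row; name_col and rel_col are 0-based column indices.
    """
    pairs = []
    for row in rows[1:]:  # every cell except the first (header) cell
        if max(name_col, rel_col) >= len(row):
            continue  # short row: the relevant cells are missing
        first_name = (row[name_col] or "").strip()   # intersection of this row and the name column
        second_name = (row[rel_col] or "").strip()   # the relation cell itself
        if first_name and second_name:
            pairs.append((first_name, second_name))
    return pairs
```

Empty relation cells simply produce no edge, which matches a table where not every entity is linked to the other category.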
Optionally, the first cell of each relation column further contains a relationship name describing the association relationship.
Correspondingly, the second generation module is further configured to extract the relationship name from the first cell of the relation column, and the second statement further contains the relationship name, which is displayed on the corresponding edge of the knowledge graph.
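As a sketch of how the relationship name could travel from the header into a second statement, the code below assumes a hypothetical header format `relation:<ClassName>:<relationship name>` and Neo4j-style Cypher, where the name becomes the relationship type (the label that Neo4j Browser, for example, draws on the edge). The fallback type `RELATED_TO` is likewise an assumption.

```python
def parse_relation_header(cell: str) -> tuple[str, str]:
    """Split 'relation:<TargetClass>[:<relationship name>]' into (target_class, relationship_name)."""
    prefix, _, rest = cell.partition(":")
    if prefix.strip() != "relation":
        raise ValueError(f"not a relation column header: {cell!r}")
    target_class, _, relation_name = rest.partition(":")
    target_class = target_class.strip()
    if not target_class:
        raise ValueError(f"relation column header names no class: {cell!r}")
    # Hypothetical fallback when the header carries no relationship name.
    return target_class, relation_name.strip() or "RELATED_TO"


def edge_statement(source_class: str, source_name: str, header_cell: str, target_name: str) -> str:
    """Build one second statement whose relationship type is the name from the header cell."""
    target_class, relation_name = parse_relation_header(header_cell)

    def quote(text: str) -> str:  # Cypher string literal
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    def ident(text: str) -> str:  # backtick-quoted Cypher identifier
        return "`" + text.replace("`", "``") + "`"

    return (f"MATCH (a:{ident(source_class)} {{name: {quote(source_name)}}}), "
            f"(b:{ident(target_class)} {{name: {quote(target_name)}}}) "
            f"MERGE (a)-[:{ident(relation_name)}]->(b)")
```

Backtick quoting lets relationship names contain spaces or non-ASCII characters, which is likely for names typed into a spreadsheet by non-specialists.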
Optionally, the apparatus further comprises a schema creation module, configured to:
create a class node for each entity category according to the class name of the entity category corresponding to each table;
for each relation column of each table, extract a class name from the first cell of the relation column and determine that an association relationship exists between the entity category corresponding to the extracted class name and the entity category corresponding to the table;
and create edges between class nodes that have an association relationship, obtaining a schema diagram representing the association relationships among the entity categories.
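The schema diagram only needs each table's name and first row. The sketch below assumes the hypothetical header format `relation:<ClassName>` (optionally followed by `:<relationship name>`) and records directed class-level edges; it skips headers naming a class that has no table, leaving those to a validation step.

```python
RELATION_PREFIX = "relation:"  # hypothetical relation-column identifier


def build_schema(headers: dict[str, list[str]]) -> tuple[set[str], set[tuple[str, str]]]:
    """Derive class nodes and class-level edges from each table's header row.

    `headers` maps a table name (the entity class name) to that table's first row.
    """
    class_nodes = set(headers)
    class_edges = set()
    for class_name, header in headers.items():
        for cell in header:
            cell = (cell or "").strip()
            if not cell.startswith(RELATION_PREFIX):
                continue
            # Take the class name up to an optional ':<relationship name>' suffix.
            target = cell[len(RELATION_PREFIX):].partition(":")[0].strip()
            if target in class_nodes:
                class_edges.add((class_name, target))
    return class_nodes, class_edges
```

Using sets means several relation columns pointing at the same category collapse into one schema edge.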
Optionally, the apparatus further comprises a verification module, configured to verify the data in each table before the first statements and second statements are generated and, if erroneous data is found in a table, to generate corresponding error information.
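The text does not list which checks are performed. As a hedged illustration, the sketch below flags empty names, duplicate names within a table, relation headers that name an unknown class, and relation cells naming an entity absent from the referenced table. It assumes the name column is the first column and the hypothetical header `relation:<ClassName>`, and it reports 1-based spreadsheet coordinates so a front end could highlight the offending cells.

```python
def validate(tables: dict[str, list[list[str]]], prefix: str = "relation:") -> list[tuple[str, int, int, str]]:
    """Return (table, row, column, message) tuples; rows/columns are 1-based as in a spreadsheet."""
    errors = []
    known = {}
    # Pass 1: collect entity names per table and flag empty or duplicate names.
    for cls, rows in tables.items():
        seen = set()
        for r, row in enumerate(rows[1:], start=2):  # row 1 is the header
            name = ((row[0] if row else None) or "").strip()
            if not name:
                errors.append((cls, r, 1, "empty entity name"))
            elif name in seen:
                errors.append((cls, r, 1, f"duplicate entity name {name!r}"))
            else:
                seen.add(name)
        known[cls] = seen
    # Pass 2: every relation cell must name an entity in the referenced table.
    for cls, rows in tables.items():
        for c, cell in enumerate(rows[0][1:], start=2):
            cell = (cell or "").strip()
            if not cell.startswith(prefix):
                continue
            target = cell[len(prefix):].partition(":")[0].strip()
            if target not in known:
                errors.append((cls, 1, c, f"unknown entity class {target!r}"))
                continue
            for r, row in enumerate(rows[1:], start=2):
                value = ((row[c - 1] if c - 1 < len(row) else None) or "").strip()
                if value and value not in known[target]:
                    errors.append((cls, r, c, f"{value!r} not found in table {target!r}"))
    return errors
```

Collecting all errors instead of stopping at the first one lets the user fix a whole table in one pass.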
Optionally, each table further comprises an attribute column for storing attribute description information corresponding to each entity name in the table; the first statement corresponding to each entity name then further contains the attribute description information of that entity name.
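One way a first statement could carry attribute description information is sketched below. It assumes hypothetical attribute headers of the form `attr:<key>`, the name in the first column, and Neo4j-style Cypher: the node is merged on its name alone and the attributes are then applied with `SET n += {...}`.

```python
ATTR_PREFIX = "attr:"  # hypothetical attribute-column header prefix


def quote(text: str) -> str:
    """Cypher string literal with backslashes and single quotes escaped."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def ident(text: str) -> str:
    """Backtick-quoted Cypher identifier."""
    return "`" + text.replace("`", "``") + "`"


def node_statement(class_name: str, header: list[str], row: list[str]) -> str:
    """First statement for one row: MERGE the node by name, then SET its attributes."""
    stmt = f"MERGE (n:{ident(class_name)} {{name: {quote(row[0].strip())}}})"
    attrs = []
    for idx, cell in enumerate(header):
        cell = (cell or "").strip()
        if cell.startswith(ATTR_PREFIX) and idx < len(row) and (row[idx] or "").strip():
            key = cell[len(ATTR_PREFIX):].strip()
            attrs.append(f"{ident(key)}: {quote(row[idx].strip())}")
    if attrs:
        # '+=' adds or overwrites the listed properties without removing others.
        stmt += " SET n += {" + ", ".join(attrs) + "}"
    return stmt
```

Merging on the name only, rather than on every property, means that editing an attribute in the table updates the existing node instead of creating a second one.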
Optionally, the apparatus further comprises a dictionary generation module, configured to generate a dictionary file for the entities from the entity names in the name column of each table and the entity category corresponding to each entity name.
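The dictionary file format is not specified in the text. A minimal sketch, assuming one tab-separated `entity name<TAB>entity category` line per entity and the name in the first column of each table:

```python
def build_dictionary(tables: dict[str, list[list[str]]]) -> str:
    """One 'entity name<TAB>entity category' line per distinct entity, sorted for stable output."""
    entries = set()
    for category, rows in tables.items():  # the table name is the entity category
        for row in rows[1:]:  # skip the header row
            name = ((row[0] if row else None) or "").strip()
            if name:
                entries.add((name, category))
    return "".join(f"{name}\t{category}\n" for name, category in sorted(entries))
```

Sorting makes repeated runs over unchanged tables produce byte-identical files, which keeps diffs of the dictionary readable.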
Optionally, each table further comprises a synonym column for storing the aliases of each entity name in the table.
Correspondingly, the apparatus further comprises a synonym generation module, configured to generate a synonym file from the entity names in the name column of each table and the aliases of those entity names extracted from the synonym column.
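A sketch of synonym file generation under stated assumptions: the synonym column header is the literal `synonym`, several aliases in one cell are separated by `;`, the name is in the first column, and each output line is `name,alias1,alias2`. All of these formats are hypothetical; the text fixes none of them.

```python
def build_synonyms(tables: dict[str, list[list[str]]], synonym_id: str = "synonym", sep: str = ";") -> list[str]:
    """Return one 'name,alias1,alias2,...' line per entity that has at least one alias."""
    lines = []
    for rows in tables.values():
        header = [(h or "").strip() for h in rows[0]]
        if synonym_id not in header:
            continue  # this table has no synonym column
        col = header.index(synonym_id)
        for row in rows[1:]:
            name = ((row[0] if row else None) or "").strip()
            cell = (row[col] if col < len(row) else None) or ""
            aliases = [alias.strip() for alias in cell.split(sep) if alias.strip()]
            if name and aliases:
                lines.append(",".join([name] + aliases))
    return lines
```

With this comma-joined output, an alias that itself contains a comma would be ambiguous; a real tool would need a different delimiter or proper CSV quoting.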
Optionally, the tables corresponding to the entity categories are worksheets in the same spreadsheet file.
In one aspect, an embodiment of the present application provides a knowledge graph construction apparatus, including:
an upload module, configured to send, in response to a graph construction instruction entered on a file upload page, a file to be uploaded to a background server, wherein the file to be uploaded comprises tables corresponding to at least two entity categories, each table comprises a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table;
a display module, configured to display the knowledge graph sent by the background server, wherein the knowledge graph is generated by the background server from the file to be uploaded and describes the association relationships among the entities.
Optionally, the display module is further configured to:
display an error prompt after receiving error information from the background server, the error information identifying an error table that contains erroneous data;
and, in response to an operation on the error prompt, display the error table, with the erroneous data in the error table displayed in a visually distinguished state.
In one aspect, an embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any of the methods described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of any of the methods described above.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which when executed by a processor implement the steps of any of the methods described above.
With the knowledge graph construction method and device, electronic equipment and storage medium described above, a user only needs to enter data into tables in the preset format and upload them to the knowledge graph construction tool on the background server or terminal device. The tool automatically extracts entity names and association relationships from the name columns and relation columns of the tables edited by the user, automatically generates the statements that create nodes and edges, and calls the graph database to complete the creation of the knowledge graph. This avoids complex data manipulation and hand-written import statements when creating a knowledge graph, so that people who are not knowledge graph professionals can also create knowledge graphs conveniently and efficiently. In addition, the table format is simple and intuitive, the data are easy to maintain and modify, and even a user unfamiliar with knowledge graphs can organize the data quickly, which greatly lowers the barrier to use.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, and it is obvious that the drawings that are described below are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1A shows some of the files that need to be prepared when creating a knowledge graph using Neo4j;
FIG. 1B is a schematic diagram of the internal data of a CSV entity file;
FIG. 1C is a schematic diagram of the internal data of a CSV relationship file;
FIG. 1D is a schematic diagram of an import statement written manually by a user;
FIG. 2 is a schematic diagram of an application scenario of a knowledge graph construction method provided in an embodiment of the present application;
FIG. 3 is a flowchart of a knowledge graph construction method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of tables used for constructing a knowledge graph according to an embodiment of the present application;
FIG. 5 is a partial schematic diagram of a knowledge graph obtained by the knowledge graph construction method according to an embodiment of the present application;
FIG. 6 is a flowchart of a knowledge graph construction method according to an embodiment of the present application;
FIG. 7 is a schematic diagram obtained by the knowledge graph construction method according to an embodiment of the present application;
FIG. 8 is a flowchart of a knowledge graph construction method according to an embodiment of the present application;
FIG. 9A is a schematic diagram of an interface of a file upload tool according to an embodiment of the present application;
FIG. 9B is a schematic diagram of an interface of a file upload tool according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a knowledge graph construction device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a knowledge graph construction device according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
For ease of understanding, the terms referred to in the embodiments of the present application are explained below:
Graph database: a type of NoSQL (non-relational) database that uses graph theory to store relationship information between entities. Common graph databases include Neo4j, FlockDB, AllegroGraph, GraphDB and InfiniteGraph.
Knowledge graph: known in library and information science as knowledge domain visualization or knowledge domain mapping, a knowledge graph is a family of graphs that display the development process and structural relationships of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelationships among them. Each node in a knowledge graph represents an entity, and the association relationships between entities are represented by edges between nodes.
Any number of elements in the figures is for illustration rather than limitation, and all names are used only for distinction and carry no limiting meaning.
In practice, constructing a knowledge graph generally involves a relatively complex process. First, a professional converts the data to be processed into the form required for import by the graph database. Then, complex import statements are written by hand and used to import the prepared data into the graph database, which generates the corresponding knowledge graph. FIG. 1A shows some of the files that must be prepared when creating a knowledge graph of a commercial property with Neo4j. Each file with the suffix ".csv" is either an entity file or a relationship file: an entity file records information about all entities of one entity category, and a relationship file records relationships between entities. For example, the entity file "merchant.csv" records information about all merchants, the entity file "stadium.csv" records information about all exhibition halls, and the relationship file "merchant_to_stadium.csv" records the relationships between merchants and exhibition halls, that is, the exhibition hall in which each merchant is located. Referring to FIG. 1B, each row of data in the entity file "merchant.csv" corresponds to one merchant and includes the merchant's number, name, category, exhibition hall, floor, profile, and so on. Referring to FIG. 1C, each row in the relationship file "merchant_to_stadium.csv" contains the number of one merchant in the "merchant.csv" file and the number of one exhibition hall in the "stadium.csv" file; for example, the first row of data, "1,1", represents the relationship between the merchant HLKD, numbered 1 in the "merchant.csv" file, and the exhibition hall numbered 2 in the "sta.csv" file.
As FIG. 1B and FIG. 1C show, when the knowledge graph to be drawn involves many entity categories, entities and relationships between entities, the user must edit a large number of files, and construction efficiency is very low. The existing data organization method is also overly complex: organizing the data inevitably introduces illegal characters and hard-to-notice missing data, data errors or format errors that cause graph construction to fail, yet the user has difficulty locating such errors in the files, which lowers efficiency further; the complex organization also makes the data hard to maintain and modify. After the data are prepared, the user must manually write the corresponding import statements (see FIG. 1D), which requires knowing how the graph database is used and how its data are organized, as well as some programming ability. Consequently, with the existing construction method, data processing is highly error-prone, data maintenance and modification are difficult, the user must learn how to use a graph database, knowledge graphs are created inefficiently, and the barrier to use is very high.
To address this, the present application proposes a knowledge graph construction method that predefines a simple and intuitive table-based data organization. Specifically, the information about the entities of one entity category can be stored in one table, with, for example, one row per entity. The table may include a name column and at least one relation column: the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in this table and the entities in another table. Following this format, a user can enter the information of each entity into the corresponding cells of the corresponding table, turning cluttered data into the required form. The knowledge graph construction tool then acquires the tables the user has prepared for multiple entity categories, generates first statements corresponding to the entity names extracted from the name column of each table, generates second statements corresponding to the association relationships extracted from the relation columns of each table, and calls the graph database to parse the generated first and second statements to obtain the knowledge graph, where each first statement adds an entity node corresponding to an entity name to the graph database and each second statement adds an edge representing an association relationship between entity nodes.
Thus the user only needs to edit tables in the preset format. The knowledge graph construction tool automatically extracts entity names and association relationships from the name columns and relation columns of the tables, automatically generates the statements that create nodes and edges, and calls the graph database to complete the knowledge graph. People who are not knowledge graph professionals can therefore create knowledge graphs conveniently and efficiently, without complex data manipulation or hand-written import statements. Because the table format is simple and intuitive, the data are easy to maintain and modify, and even users with no knowledge graph background can organize the data quickly, which greatly lowers the barrier to use.
After the design concept of the embodiment of the present application is introduced, some simple descriptions are made below for application scenarios applicable to the technical solution of the embodiment of the present application, and it should be noted that the application scenarios described below are only used to illustrate the embodiment of the present application and are not limiting. In specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Reference is made to FIG. 2, which is a schematic diagram of an application scenario of the knowledge graph construction method provided in the embodiment of the present application. The application scenario includes a terminal device 201 and a background server 202, which may be connected through a wireless or wired network. The terminal device 201 includes, but is not limited to, electronic devices such as desktop computers, mobile phones, mobile computers and tablet computers. The background server 202 may be a single server, a server cluster composed of multiple servers, or a cloud computing center. An application capable of editing spreadsheets, such as Excel, is installed on the terminal device 201. Through this application, a user can create a table for each entity category and fill in the information of each entity; after completing the tables for all entity categories, the user sends them to the background server 202 through a file upload tool provided by the background server 202. A knowledge graph construction tool installed on the background server 202 analyzes the data in the tables, generates import statements that add nodes and edges to the graph database, and calls the graph database to parse the generated import statements to obtain a knowledge graph. The background server 202 then sends the knowledge graph to the terminal device 201 so that the user can view it there.
Of course, the knowledge graph construction tool may also be installed on the terminal device 201, so that the terminal device 201 constructs the knowledge graph itself. After editing the tables on the terminal device 201, the user starts the knowledge graph construction tool and specifies the tables to be analyzed; the tool acquires and analyzes the tables, generates import statements that add nodes and edges to the graph database, calls the graph database to parse them to obtain a knowledge graph, and displays the knowledge graph on the terminal device 201.
Of course, the method provided in the embodiment of the present application is not limited to the application scenario shown in fig. 2, but may be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 2 will be described together in the following method embodiments, which are not described in detail herein.
To further explain the technical solutions provided in the embodiments of the present application, they are described in detail below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or figures, the method may include more or fewer operation steps based on routine or non-inventive effort. For steps that have no logically necessary causal relationship, the execution order is not limited to the order provided in the embodiments of the present application.
The technical solution provided in the embodiment of the present application is described below with reference to an application scenario shown in fig. 2.
Referring to fig. 3, an embodiment of the present application provides a knowledge graph construction method, which may be applied to a terminal device or a background server shown in fig. 2, and specifically includes the following steps:
S301: obtaining tables respectively corresponding to at least two entity categories, wherein each table comprises a name column and at least one relation column, the name column comprises the entity names of entities belonging to the corresponding entity category, and each relation column comprises association relationships between the entities in the table and the entities in one other table.
In specific implementation, each table corresponds to one entity category: information about the entities belonging to that category is stored in the same table, and each row of cells stores the information of one entity. The information of an entity includes at least the entity name and the association relationships between the entity and entities of other entity categories. Depending on the application scenario, it may also include, for example, an alias and attribute description information of the entity; the present application is not limited to the content listed above.
To extract the required information from a table quickly and conveniently, the column that stores each type of information is specified in advance, so that each type of information is stored in a dedicated column. For example, the name column stores the entity names, a relation column stores association relationships of the entities, and the attribute column stores the attribute description information corresponding to each entity name in the table.
It should be noted that one relation column stores only association relationships between entities of two fixed entity categories. For example, for entity category A, it may be specified that in the table corresponding to category A, the first column stores the names of the entities belonging to category A, the second column stores the attribute description information of each entity, the third column stores the association relationships between each entity and entities of entity category B, and the fourth column stores the association relationships between each entity and entities of entity category C.
Taking the distribution of merchants in a commercial property as an example, the table format is shown in fig. 4. The entity categories to be recorded in this scenario are: merchants (Merchant), exhibition halls (Stadium), locations (Location), business categories (GFirst), and business subcategories (Genre). The table shown in fig. 4 corresponds to the entity category "Merchant", and each row records the information of one merchant: the first column is the merchant name, the second column is the merchant's alias, the third column is the association relationship between the merchant and a business subcategory, the fourth column is the association relationship between the merchant and an exhibition hall, the fifth column is the attribute description information of the merchant, and the sixth column is the association relationship between the merchant and a location. Thus the first column is the name column, the third, fourth, and sixth columns are relation columns, and the fifth column is the attribute column. Taking the first row of data as an example, the merchant's name is HLKD, its alias is KD, its business subcategory is paint wall paint, and it is located in the second building, specifically at building 1 of the second building.
In practical applications, the meaning of each column may be predetermined, or the user may enter a preset field in the first cell of each column to define that column freely. In fig. 4, the preset field for the name column may be "name", for the attribute column "property", for the alias column "synonym", and for a relation column "relationTo XX", where "XX" is the class name of the entity category that has an association relationship with the entity category of the table containing the relation column. The user fills in "XX", but it must be the class name of an entity category; for example, the preset field of the fourth column of the Merchant table is "relationTo Stadium", indicating that the fourth column stores the association relationships between merchants and exhibition halls.
S302: generating, from the entity names extracted from the name columns of all tables, a first statement for each entity name, each first statement being used to add the entity node corresponding to that entity name to a graph database.
S303: generating, from the association relationships extracted from the relation columns of the tables, a second statement for each association relationship, each second statement being used to add an edge representing that association relationship between entity nodes.
In specific implementation, one first statement is generated per entity name and one second statement per association relationship. For example, if 100 entity names and 200 association relationships are extracted in total, 100 first statements and 200 second statements are generated.
In specific implementation, the name column, relation columns, attribute column, and so on of a table may be identified from the predefined meaning of each column. For example, if the first column is predefined as the name column and the second column as a relation column, entity names are extracted from the first column and association relationships from the second column. Alternatively, these columns may be identified from the preset fields.
In specific implementation, different graph databases use different import statement formats, so the formats of the first and second statements may be determined by the requirements of the graph database used; the embodiments of the present application are not limited in this respect.
Taking neo4j as an example, the first column of the Merchant table shown in fig. 4 is determined to be the name column, and the entity names of all merchants are extracted from its cells, including "HLKD", "YSWY", "JYLYJ", and so on. An entity node must be created for each entity name, that is, a first statement that creates an entity node is generated for each entity name. For the entity name "HLKD", the first statement may be: CREATE (n:Merchant {name: 'HLKD'}). Next, the third column of the Merchant table is determined to be a relation column: from its preset field "relationTo Genre", it is determined that this column stores the association relationships between merchants and business subcategories, and the association relationships are extracted from its cells. For the first row of data, the second statement may be:
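```cypher
MATCH (a:Merchant), (b:Genre)
WHERE a.name = 'HLKD' AND b.name = 'paint wall paint'
// Neo4j's CREATE requires a relationship type; RELATED_TO is a placeholder
CREATE (a)-[:RELATED_TO]->(b)
```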
After neo4j receives the second statement, it creates an edge between the entity node "HLKD" and the entity node "paint wall paint", so that the two entity nodes are connected.
S304: calling the graph database to parse the generated first statements and second statements to obtain the knowledge graph.
In specific implementation, the background server or the terminal device inputs the generated first and second statements into the graph database; the graph database creates entity nodes by parsing the first statements and creates edges between entity nodes by parsing the second statements, thereby completing the construction of the knowledge graph. Both kinds of statement are generated according to the graph database's requirements for import statements. The process by which the graph database builds the knowledge graph from them is performed by the graph database itself, is prior art, and is not repeated here.
With this knowledge graph construction method, a user only needs to enter data into tables in the preset format and upload them to the knowledge graph construction tool on the background server or terminal device. The tool automatically extracts entity names and association relationships from the name columns and relation columns of each table, automatically generates the statements that create nodes and edges, and calls the graph database to build the knowledge graph. This avoids complex data operations and hand-written import statements, so that people without knowledge-graph expertise can also build knowledge graphs conveniently and efficiently. In addition, the table format is simple and intuitive, the data are easy to maintain and modify, and a user unfamiliar with knowledge graphs can organize the data quickly, which greatly lowers the barrier to use.
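As a non-authoritative illustration of S302 and S303, the Python sketch below turns tables into first and second statements. It assumes each table is a list of rows whose first row holds the preset fields ("name", "relationTo X", ...), uses the table name as the node label, and uses the hypothetical placeholder relationship type RELATED_TO; the helper names quote and generate_statements are likewise hypothetical.

```python
from typing import Dict, List, Tuple

RELATION_PREFIX = "relationTo"
Table = List[List[str]]  # row 0 holds the preset fields, later rows hold entities


def quote(value: str) -> str:
    """Return a single-quoted Cypher string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def generate_statements(tables: Dict[str, Table]) -> Tuple[List[str], List[str]]:
    """Return (first statements, second statements) for all tables."""
    first: List[str] = []
    second: List[str] = []
    for label, rows in tables.items():  # the table name doubles as the node label
        header = rows[0]
        name_idx = header.index("name")
        # (column index, target class) for each "relationTo <Class>[_<name>]" column
        relation_cols = [
            (i, field[len(RELATION_PREFIX):].split("_", 1)[0].strip())
            for i, field in enumerate(header)
            if field.startswith(RELATION_PREFIX)
        ]
        for row in rows[1:]:
            name = row[name_idx]
            first.append(f"CREATE (n:{label} {{name: {quote(name)}}})")  # S302
            for i, target in relation_cols:
                if i < len(row) and row[i]:  # S303: one edge per filled cell
                    second.append(
                        f"MATCH (a:{label}), (b:{target}) "
                        f"WHERE a.name = {quote(name)} AND b.name = {quote(row[i])} "
                        "CREATE (a)-[:RELATED_TO]->(b)"  # placeholder relationship type
                    )
    return first, second
```

For the first row of fig. 4 this yields one node statement for "HLKD" and one edge statement towards "paint wall paint". In a production tool, passing names as query parameters instead of string literals would avoid escaping concerns altogether.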
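S304 leaves execution to the graph database itself. A minimal sketch, assuming only a session object that exposes run(query), as sessions of the official Neo4j Python driver do (obtained via GraphDatabase.driver(uri, auth=(user, password)).session()); the function name build_graph is hypothetical. The one ordering constraint it encodes is that all first statements run before any second statement, because the MATCH clauses in the second statements need the nodes to exist.

```python
from typing import Iterable, Protocol


class Session(Protocol):
    """Anything with a run(query) method, e.g. a Neo4j driver session."""

    def run(self, query: str): ...


def build_graph(session: Session, first: Iterable[str], second: Iterable[str]) -> int:
    """Run every first statement, then every second statement; return how many ran."""
    count = 0
    # Node statements go first: edge statements MATCH on nodes that must already exist.
    for statement in [*first, *second]:
        session.run(statement)
        count += 1
    return count
```

Running each statement separately keeps the sketch simple; a real import would more likely batch statements inside transactions.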
To help the user fill in and maintain the information in the tables more conveniently, the table format is further specified below.
For example, the table name of each table used to draw the same knowledge graph is the class name of the corresponding entity category. Still taking the distribution of merchants in a commercial property as an example and referring to fig. 4, the table corresponding to merchants is named "Merchant", the table corresponding to exhibition halls "Stadium", the table corresponding to locations "Location", the table corresponding to business categories "GFirst", and the table corresponding to business subcategories "Genre". The knowledge graph construction tool can therefore determine the entity category of a table from its name alone, which also makes the tables easy for the user to maintain.
For another example, the first cell of the name column in each table used to draw the same knowledge graph contains a name column identifier, and every other cell of the name column contains an entity name.
For another example, the first cell of each relation column in each table used to draw the same knowledge graph contains a relation column identifier and the class name of one entity category that has an association relationship with the entity category of the table to which the relation column belongs. Every other cell of the relation column contains the name of an entity that belongs to the category named in the first cell and has an association relationship with the entity of the same row.
Referring to fig. 4, the first row of the table holds the identifier of each column, from which the type of data stored in each column can be identified quickly. For example, the name column identifier may be the preset field "name" and the relation column identifier the preset field "relationTo"; further, the attribute column identifier may be the preset field "property" and the alias column identifier the preset field "synonym". In a relation column identifier, the text between "relationTo" and the character "_" is the class name of the entity category associated with the table's entity category, and the user fills it in. For example, the first cell of the third column of the Merchant table is "relationTo Genre_what business subcategory", indicating that the third column stores the association relationships between merchants and business subcategories. The text after "_" is a relation name describing the association relationship; the user may fill it in freely so that the meaning of the column is easy to understand, and here it tells the user at a glance that the third column records what business subcategory each merchant belongs to. Similarly, the first cell of the fourth column of the Merchant table is "relationTo Stadium_in which hall", indicating that the fourth column stores the association relationships between merchants and exhibition halls; the freely chosen text after "_" tells the user that the fourth column records in which hall each merchant is located. After the fields in the first row are filled in as shown in fig. 4, the data of each merchant need only be entered in the cells of the corresponding row according to the definition of each column.
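The header convention above can be parsed mechanically. The sketch below is a hedged illustration assuming exactly the preset fields of fig. 4 ("name", "synonym", "property", and "relationTo <class name>[_<relation name>]"); the Column type and the parse_header function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

SIMPLE_FIELDS = ("name", "synonym", "property")


@dataclass
class Column:
    kind: str                             # "name", "synonym", "property" or "relation"
    target_class: Optional[str] = None    # class name between "relationTo" and "_"
    relation_name: Optional[str] = None   # free text after "_", if present


def parse_header(cell: str) -> Column:
    """Classify a column from the preset field in its first cell."""
    cell = cell.strip()
    if cell.startswith("relationTo"):
        target, sep, relation = cell[len("relationTo"):].partition("_")
        return Column("relation", target.strip(), relation.strip() if sep else None)
    if cell in SIMPLE_FIELDS:
        return Column(cell)
    raise ValueError(f"unrecognized column identifier: {cell!r}")
```

Raising on an unknown identifier, rather than silently skipping the column, surfaces typos in the first row before any statement is generated.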
On this basis, the user can edit the tables with the above fields in an agreed format, and the table contents can be adapted to data from different domains. This makes data organization more flexible and general, improves the efficiency of organizing data in the early stage, and reduces the cost of modifying and correcting data later.
Based on this table format, association relationships can be extracted from any relation column of any table as follows: for any cell other than the first cell in the relation column, a first entity name is extracted from the cell at the intersection of that cell's row and the name column, a second entity name is extracted from the cell itself, and it is determined that an association relationship exists between the entity corresponding to the first entity name and the entity corresponding to the second entity name.
Taking the Merchant table shown in fig. 4 as an example, the first column can be determined to be the name column from the field in the first row of that column, and the entity name of each merchant is extracted from the cells of the first column. The third column can be determined from the field in the first row of the third column to be a relation column storing the association relationships between merchants and business subcategories. For the cell $C_{n,3}$ in the nth row of the third column, the entity name in $C_{n,3}$ is taken as the second entity name and the entity name in the cell $C_{n,1}$ in the first column of the same row is taken as the first entity name; it is determined that an association relationship exists between the entities corresponding to the two names, and a second statement is generated that adds an edge representing this association relationship between the corresponding entity nodes. For the first row of data of the table in fig. 4, "HLKD" is extracted from the first column as the first entity name and "paint wall paint" from the third column as the second entity name, so an association relationship exists between the entities "HLKD" and "paint wall paint", that is, an edge must be created between the entity nodes "HLKD" and "paint wall paint". In this way, the association relationships stored in all relation columns are extracted in turn.
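The row-wise pairing of the name column with each relation column can be sketched as follows. This assumes a table given as a list of rows with the header in row 0 and the name column marked by the preset field "name"; extract_associations is a hypothetical helper returning (first entity name, second entity name, target class) triples.

```python
from typing import List, Optional, Tuple


def extract_associations(rows: List[List[Optional[str]]]) -> List[Tuple[str, str, str]]:
    header = rows[0]
    name_col = header.index("name")
    triples = []
    for j, field in enumerate(header):
        if not field.startswith("relationTo"):
            continue
        # Class name: the text between "relationTo" and the first "_"
        target = field[len("relationTo"):].split("_", 1)[0].strip()
        for row in rows[1:]:
            # Empty or missing cells carry no association
            cell = (row[j] or "").strip() if j < len(row) else ""
            if cell:
                # First entity name from the name column of the same row,
                # second entity name from the relation cell itself
                triples.append((row[name_col], cell, target))
    return triples
```

Each triple carries the target class so that the edge statement can MATCH the second node under the correct label.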
The knowledge graph construction method provided by the present application is therefore general and reusable: users in different industries can have the corresponding knowledge graph built automatically simply by entering their data with the agreed, simple format fields, which improves construction efficiency.
Further, since the first cell of each relation column contains a relation name describing the association relationship, the knowledge graph construction method in the embodiments of the present application further includes: extracting the relation name from the first cell of each relation column, and including in the second statement the relation name to be displayed on the corresponding edge of the knowledge graph.
Taking neo4j as an example, the first cell of the third column of the Merchant table shown in fig. 4 contains "relationTo Genre_what business subcategory", indicating that the third column is a relation column. From the preset field "relationTo Genre", it is determined that the third column stores the association relationships between merchants and business subcategories, and the relation name is "what business subcategory". The first row of data in the Merchant table is the information of the merchant "HLKD", and the cell in the third column of that row contains "paint wall paint", an entity node of the business subcategory category. It is therefore determined that an association relationship exists between the entity nodes "HLKD" and "paint wall paint", and a second statement that adds an edge between them is generated; for example, the second statement may be:
```cypher
MATCH (a:Merchant), (b:Genre)
WHERE a.name = 'HLKD' AND b.name = 'paint wall paint'
// The relation name serves as the relationship type; backticks allow spaces
CREATE (a)-[r:`what business subcategory`]->(b)
```
After neo4j receives the second statement, it creates an edge between the entity nodes "HLKD" and "paint wall paint" and displays the relation name "what business subcategory" on that edge, showing the user the association relationship between the two entity nodes; see fig. 5. Fig. 5 is a schematic diagram of a knowledge graph: each circle represents an entity node, the text in a circle is the entity name of the node, each line between entity nodes is an edge representing an association relationship, and the text on an edge is the corresponding relation name.
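Building on the second statement above, the following hedged sketch places the relation name from the first cell into the relationship type. It assumes, as the example does, that the relation name serves as the relationship type; backtick quoting, with any embedded backtick doubled, lets the name contain spaces. The helper names quote, backtick, and edge_statement are hypothetical.

```python
def quote(value: str) -> str:
    """Return a single-quoted Cypher string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def backtick(identifier: str) -> str:
    """Backtick-quote a Cypher identifier; embedded backticks are doubled."""
    return "`" + identifier.replace("`", "``") + "`"


def edge_statement(src_label: str, src_name: str, dst_label: str,
                   dst_name: str, relation_name: str) -> str:
    """Second statement whose relationship type is the relation name."""
    return (
        f"MATCH (a:{src_label}), (b:{dst_label}) "
        f"WHERE a.name = {quote(src_name)} AND b.name = {quote(dst_name)} "
        f"CREATE (a)-[r:{backtick(relation_name)}]->(b)"
    )
```

An alternative design would keep one generic relationship type and store the relation name as a relationship property; using it as the type, as here, makes it appear directly on the edge in the neo4j browser.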
Based on this table format and referring to fig. 6, the knowledge graph construction method in the embodiments of the present application further includes the following steps:
S601: creating a category node for each entity category according to the class name of the entity category corresponding to each table.
In specific implementation, the table name of each table used to draw the same knowledge graph is the class name of the corresponding entity category, so the knowledge graph construction tool can obtain the class name of each table's entity category from the table name. The tool then creates a category node for each entity category, named with the corresponding class name. Fig. 7 shows a schematic diagram of the created category nodes: each circle represents a category node, and the text in a circle is its class name.
S602: for each relation column of each table, extracting the class name from the first cell of the relation column, and determining that an association relationship exists between the entity category corresponding to the extracted class name and the entity category corresponding to the table.
In specific implementation, taking the Merchant table shown in fig. 4 as an example, the class name of the table's entity category is Merchant. The class name is extracted from the field in the first cell of each relation column, namely the text between "relationTo" and the character "_", and an association relationship is determined between Merchant and the entity category with that class name. For the third column in fig. 4, the text "Genre" between "relationTo" and "_" is obtained from the first cell, so an association relationship exists between the entity categories Merchant and Genre. For the fourth column, the text "Stadium" is obtained in the same way, so an association relationship exists between the entity categories Merchant and Stadium. The association relationships for the other relation columns of the Merchant table, and for the relation columns of the tables of other entity categories, are determined in the same way and are not described again.
S603: creating edges between category nodes that have association relationships, to obtain a schema graph (Schema) representing the association relationships between entity categories.
In specific implementation, according to the association relationships determined in step S602, edges are created between category nodes that have association relationships, connecting them to obtain the corresponding schema graph, through which the user can clearly and intuitively understand the relationships between the entity categories.
In specific implementation, the edges connecting category nodes may be directed edges. In that case, the association relationships extracted from the tables are directed, that is, step S602 specifically includes: for each relation column of each table, extracting a class name from the first cell of the relation column, and determining that the association relationship between the entity category B corresponding to the table and the entity category A corresponding to the extracted class name is: B points to A. Correspondingly, when edges are created between category nodes, a directed edge pointing from the category node of entity category B to the category node of entity category A is created; that is, directed edges are created between category nodes according to the directed association relationships extracted from the tables.
In specific implementation, the relation name can also be extracted from the relation column and displayed on the corresponding edge of the schema graph; the relation name is the text after the character "_" in the first cell of the relation column.
Fig. 7 is a schematic diagram created from tables including the one shown in fig. 4. The following directed association relationships can be determined:
(1) Merchant points to Genre, with relation name "what business subcategory";
(2) Merchant points to Stadium, with relation name "in which hall";
(3) Merchant points to Location, with relation name "in which hall";
(4) Location points to Stadium, with relation name "in which hall";
(5) Genre points to Location, with relation name "in";
(6) Genre points to GFirst, with relation name "belongs to which business category".
The association relationships (4), (5), and (6) are not contained in the table shown in fig. 4. Directed edges are created between the corresponding category nodes according to these directed association relationships, and the relation names are displayed on the created edges; the resulting schema graph is shown in fig. 7.
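Steps S601 to S603 can be sketched from the header rows alone. The following is a hedged illustration, assuming the table names are the class names and each relation column header follows "relationTo <class name>_<relation name>"; build_schema is a hypothetical function returning the category nodes and directed (from class, to class, relation name) edges, with the table's class pointing to the class named in the header as in the list above.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (from class, to class, relation name)


def build_schema(headers: Dict[str, List[str]]) -> Tuple[Set[str], Set[Edge]]:
    """Return category nodes (S601) and directed schema edges (S602/S603)."""
    nodes = set(headers)  # table names are the class names
    edges: Set[Edge] = set()
    for table_class, header in headers.items():
        for field in header:
            if not field.startswith("relationTo"):
                continue
            target, _, relation_name = field[len("relationTo"):].partition("_")
            # The table's class points to the class named in the first cell
            edges.add((table_class, target.strip(), relation_name.strip()))
    return nodes, edges
```

Duplicate relation columns collapse into one schema edge because edges are kept in a set; whether a named target class actually has a table is left to the data verification described later.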
For convenience of description, an edge created from an association relationship in a table is called a forward edge. On the basis of fig. 7, a corresponding reverse edge can be created for each forward edge in the schema graph. Each reverse edge points in the direction opposite to its forward edge and displays, in addition to the relation name of its forward edge, a reverse identifier that distinguishes it from the forward edge. For example, the reverse identifier may be a prefix character "r" displayed before the relation name: as shown in fig. 7, edges displaying the prefix "r" are reverse edges and edges without it are forward edges.
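Creating the reverse edges is a simple transformation of the forward edges. In this hedged sketch, edges are (from class, to class, relation name) tuples, the reverse identifier defaults to "r", and the space separating it from the relation name is an assumption; add_reverse_edges is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (from class, to class, relation name)


def add_reverse_edges(forward: Iterable[Edge], marker: str = "r") -> List[Edge]:
    """Return the forward edges followed by one reverse edge per forward edge."""
    forward = list(forward)  # materialize so generators can be iterated twice
    # A reverse edge swaps the endpoints and prefixes the reverse identifier
    reverse = [(dst, src, f"{marker} {name}") for src, dst, name in forward]
    return forward + reverse
```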
With this knowledge graph construction method, the schema graph representing the association relationships between entity categories is generated automatically from the tables. When editing graph data, the user does not need to design the schema of the knowledge graph in advance but only fills in the tables according to the relationships among the data, which reduces the user's workload.
On the basis of any of the above embodiments, the knowledge graph construction method provided in the present application further includes: before the first and second statements are generated, verifying the data in each table, and generating corresponding error information if erroneous data exist in a table.
In specific implementation, steps S302 to S304 are executed only after all tables uploaded by the user pass verification, which prevents construction failures.
In specific implementation, the knowledge graph construction tool may verify the uploaded tables, for example their file format and file size, and parses the tables to generate the first and second statements only after verification passes. If the uploaded file is not in the designated format, for example not an Excel file, verification fails; error information indicating a file format error is generated and displayed to the user so that the user can correct the uploaded file. If the size of the uploaded file is below a minimum threshold or above a maximum threshold, verification fails; error information indicating that the file is too small or too large is generated and displayed to the user so that the user can correct the uploaded file.
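The file-level checks can be sketched as follows. This is a hedged illustration: the accepted extensions and both size thresholds are assumptions, since the text does not state them, and check_upload is a hypothetical name.

```python
import os
from typing import List

ALLOWED_EXTENSIONS = {".xlsx", ".xls"}  # assumption: "Excel" means these extensions
MIN_BYTES = 1                           # hypothetical lower threshold
MAX_BYTES = 10 * 1024 * 1024            # hypothetical upper threshold (10 MiB)


def check_upload(filename: str, size: int) -> List[str]:
    """Return error messages for an uploaded file; an empty list means it passed."""
    errors: List[str] = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"file format error: '{extension or 'none'}' is not an Excel file")
    if size < MIN_BYTES:
        errors.append("file is too small")
    elif size > MAX_BYTES:
        errors.append("file is too large")
    return errors
```

Collecting all messages instead of stopping at the first lets the user fix every file-level problem in one round.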
Further, data verification may also be performed as follows: for each relation column in each table, the class name in the first cell of the relation column is extracted; if it does not match the class name of any table, erroneous data exist in the table, so the class name in the first cell of the relation column is marked as erroneous data and the table is marked as an erroneous table. The error information generated in this case includes the erroneous table and the erroneous data. After receiving the error information, the terminal device can display the erroneous table with the erroneous data shown in a special state, such as highlighted, in bold, or enclosed in a prompt box.
Further, data verification may also be performed as follows: for each relation column in each table, the target table corresponding to the class name in the first cell of the relation column is determined; if an entity name in the relation column does not appear in the target table, erroneous data exist in the table, so each such entity name is marked as erroneous data and the table is marked as an erroneous table. The error information likewise includes the erroneous table and the erroneous data, which the terminal device can display in the same special state.
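Both data checks can be combined into one pass over the tables. A minimal sketch, assuming tables are lists of rows keyed by class name with the header in row 0; validate is a hypothetical function that returns the erroneous tables mapped to their erroneous data, so an empty result means verification passed.

```python
from typing import Dict, List, Set

Table = List[List[str]]


def validate(tables: Dict[str, Table]) -> Dict[str, List[str]]:
    """Return {erroneous table: [erroneous data]}; empty means all checks passed."""
    # Entity names per table, taken from each table's "name" column
    names: Dict[str, Set[str]] = {}
    for table, rows in tables.items():
        col = rows[0].index("name")
        names[table] = {row[col] for row in rows[1:]}

    errors: Dict[str, List[str]] = {}
    for table, rows in tables.items():
        for j, field in enumerate(rows[0]):
            if not field.startswith("relationTo"):
                continue
            target = field[len("relationTo"):].split("_", 1)[0].strip()
            if target not in tables:
                # Check 1: the class name in the first cell matches no table
                errors.setdefault(table, []).append(target)
                continue
            for row in rows[1:]:
                value = row[j] if j < len(row) else ""
                if value and value not in names[target]:
                    # Check 2: the entity name is missing from the target table
                    errors.setdefault(table, []).append(value)
    return errors
```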
In practical applications, the manner of performing the data verification is not limited to the various manners listed above.
Further, some data in the tables may be converted in an agreed manner, for example converting dates and numbers into a unified standard format. In addition, illegal characters in the tables can be replaced, for example replacing English-format punctuation marks with Chinese-format ones (such as the enumeration comma and the comma), which helps prevent construction failures.
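A hedged sketch of such conversions follows. The agreed rules are not specified in the text, so the accepted date layouts, the YYYY-MM-DD target format, and the character mapping (supplied by the caller) are all assumptions; normalize_date and replace_characters are hypothetical names.

```python
from datetime import datetime
from typing import Dict

# Assumed input date layouts; the agreed conversion rules are not given in the text
DATE_FORMATS = ("%Y/%m/%d", "%Y.%m.%d", "%Y-%m-%d", "%d-%m-%Y")


def normalize_date(text: str) -> str:
    """Convert a recognized date to YYYY-MM-DD; return other values unchanged."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text


def replace_characters(text: str, mapping: Dict[str, str]) -> str:
    """Replace single characters according to a caller-supplied mapping."""
    return text.translate(str.maketrans(mapping))
```

Leaving unrecognized values unchanged, rather than rejecting them, keeps the conversion step from blocking construction; the verification step is the place to flag them if needed.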
On the basis of any of the above embodiments, the attribute description information corresponding to each entity name may also be extracted from the attribute column of the table. A first statement is then generated for each entity name from the entity names extracted from the name columns and their attribute description information; each first statement adds the entity node corresponding to the entity name to the graph database and also carries the attribute description information of that entity name.
Taking neo4j as an example, the first column of the Merchant table shown in fig. 4 is determined to be the name column; the entity names of all merchants, such as "HLKD", "YSWY", and "JYLYJ", are extracted from its cells, and the attribute description information of each entity name is extracted from the attribute column. A first statement that creates an entity node is generated for each entity name; for the entity name "HLKD", the first statement may be: CREATE (n:Merchant {name: 'HLKD', description: '####'}), where '####' stands for the attribute description text. When viewing the knowledge graph, the user can then click an entity node to view its attribute description information.
On the basis of any of the above embodiments, the knowledge graph construction method provided in the present application further includes: generating a dictionary file of entities from the entity names in the name column of each table and the entity categories corresponding to those names.
In specific implementation, the dictionary file contains the entity names and entity categories of all entities in the tables and may store the entity names grouped by entity category, so that subsequent applications can quickly display and query entities by category.
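A first statement that also carries attribute description information can be sketched as below. This assumes property keys are plain identifiers such as "description" and that all values are written as strings; quote and node_statement are hypothetical helper names.

```python
from typing import Dict, Optional


def quote(value: str) -> str:
    """Return a single-quoted Cypher string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def node_statement(label: str, name: str, properties: Optional[Dict[str, str]] = None) -> str:
    """First statement with the entity name plus attribute description properties."""
    props = {"name": name, **(properties or {})}
    # Property keys are assumed to be plain identifiers such as "description"
    body = ", ".join(f"{key}: {quote(str(value))}" for key, value in props.items())
    return f"CREATE (n:{label} {{{body}}})"
```

For the "HLKD" row this reproduces the statement shown above when the attribute column value is passed as {"description": "####"}.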
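A dictionary file of this kind can be sketched in a few lines. The JSON layout below is an assumption, as the text does not fix a file format; build_dictionary and dictionary_json are hypothetical names, and ensure_ascii=False keeps Chinese entity names readable in the file.

```python
import json
from typing import Dict, List

Table = List[List[str]]


def build_dictionary(tables: Dict[str, Table]) -> Dict[str, List[str]]:
    """Group the entity names of each table's name column under its class name."""
    dictionary: Dict[str, List[str]] = {}
    for category, rows in tables.items():
        col = rows[0].index("name")
        dictionary[category] = sorted({row[col] for row in rows[1:] if row[col]})
    return dictionary


def dictionary_json(tables: Dict[str, Table]) -> str:
    """Serialize the dictionary as JSON text without escaping non-ASCII names."""
    return json.dumps(build_dictionary(tables), ensure_ascii=False, indent=2)
```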
On the basis of any one of the above embodiments, each table further includes a synonym column, which stores the aliases of each entity name in the table. Taking fig. 4 as an example, the second column holds the alias of each merchant, i.e., it is the synonym column: for the first merchant in the table, the entity name is "HLKD" and the alias is "KD".
Based on the above, the knowledge graph construction method provided by the application further includes the following step: generating a synonym file according to the entity names in the name column of each table and the aliases of those entity names extracted from the synonym column.
In a specific implementation, the entity name and aliases of each entity extracted from the tables are stored in the synonym file as one synonym group. For example, the entity name "HLKD" of the first merchant in the table shown in fig. 4 and its alias "KD" are stored in the synonym file as one synonym group, the entity name "YSWY" of the second merchant and its alias "YS" as another, and so on. In subsequent queries, if the user enters a merchant's alias, the corresponding merchant can be determined through the synonym file and its information returned to the user.
In particular, the synonym column may store several aliases for the same merchant, separated by a designated symbol such as a comma or semicolon. In that case, the merchant's entity name and all of its aliases are stored together in the synonym file as one synonym group, so that at query time the user can enter any one of the merchant's aliases, the merchant is determined through the synonym file, and its information is returned to the user.
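The steps of splitting the synonym column into aliases and resolving an alias entered by a user could look like the sketch below. It assumes the name column is at index 0 and the synonym column at index 1, and that aliases are separated by ASCII or full-width commas or semicolons. These separators, the one-group-per-line output, and the function names are assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional

# Assumed alias separators: ASCII and full-width comma / semicolon.
ALIAS_SEPARATOR = re.compile(r"[,;\uff0c\uff1b]")


def split_aliases(cell: str) -> List[str]:
    """Split one synonym-column cell into trimmed, non-empty aliases."""
    return [a.strip() for a in ALIAS_SEPARATOR.split(cell or "") if a.strip()]


def build_synonyms(rows: List[List[str]], name_col: int = 0, alias_col: int = 1) -> Dict[str, List[str]]:
    """Map each entity name to its synonym group (the name followed by its aliases)."""
    groups: Dict[str, List[str]] = {}
    for row in rows[1:]:  # skip the header row
        name = row[name_col].strip() if name_col < len(row) else ""
        if not name:
            continue
        aliases = split_aliases(row[alias_col]) if alias_col < len(row) else []
        groups[name] = [name] + [a for a in aliases if a != name]
    return groups


def resolve(term: str, groups: Dict[str, List[str]]) -> Optional[str]:
    """Return the entity name whose synonym group contains the term, if any."""
    for name, group in groups.items():
        if term in group:
            return name
    return None


if __name__ == "__main__":
    merchants = [["name", "alias"], ["HLKD", "KD"], ["YSWY", "YS"]]
    groups = build_synonyms(merchants)
    for group in groups.values():
        print(",".join(group))    # one synonym group per line of the synonym file
    print(resolve("KD", groups))  # look up a merchant by its alias
```

A production version would likely build a reverse index from alias to entity name instead of scanning every group per query.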
On the basis of any of the above embodiments, when the knowledge graph is created, the table for each entity category may be a worksheet (sheet) in the same table file, and each worksheet is named after the category name of its entity category. Taking fig. 4 as an example, five worksheets are created in the table file: the first, named Genre, stores information about entities in the business sub-category; the second, named GFirst, stores entities in the business category; the third, named Location, stores entities in the location category; the fourth, named state, stores entities in the exhibition hall category; and the fifth, named Merchant, stores entities in the merchant category. When table names are extracted, the names of all worksheets in the table file are extracted, which yields the category names of the entity categories corresponding to the worksheets.
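When the table file is an .xlsx workbook, the worksheet names, and hence the entity category names, can be read with the Python standard library alone, because an .xlsx file is a ZIP archive whose xl/workbook.xml lists every sheet in tab order. The sketch below is illustrative only. It handles just the .xlsx (Office Open XML) format, not legacy .xls files, and the function name is hypothetical. In practice a spreadsheet library such as openpyxl, whose `Workbook.sheetnames` property returns the same list, would usually be used.

```python
import io
import sys
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, List, Union

# SpreadsheetML main namespace used by xl/workbook.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def worksheet_names(xlsx: Union[str, bytes, BinaryIO]) -> List[str]:
    """Return the worksheet names of an .xlsx file in tab order.

    Accepts a path, raw bytes (e.g. an uploaded request body) or a binary
    file object. Each name is taken as the class name of the entity
    category whose table is stored in that worksheet.
    """
    source = io.BytesIO(xlsx) if isinstance(xlsx, bytes) else xlsx
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sheets.py knowledge_graph.xlsx
    print(worksheet_names(sys.argv[1]))
```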
The user therefore only needs to create one table file holding all of the information for the knowledge graph and to upload that single file. This prevents data from being lost across many separate tables and makes the data easier and more efficient to maintain.
Referring to fig. 8, the embodiment of the present application further provides a knowledge graph construction method, which may be applied to the terminal device shown in fig. 2, and specifically includes the following steps:
S801, in response to a graph construction instruction entered through a file upload page, sending a file to be uploaded to the background server, where the file to be uploaded includes tables corresponding respectively to at least two entity categories; each table includes a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table.
In a specific implementation, the user may create a table for each entity category with a spreadsheet application installed on the terminal device, such as EXCEL, and fill in the information for each entity in the corresponding table. Once the tables for all entity categories are complete, the user can connect to the background server through the file upload tool it provides and send the tables to the background server.
In a specific implementation, the interface of the file upload tool is shown in fig. 9A. The user clicks the browse button to select a file to be uploaded from the files stored locally on the terminal device and then clicks the submit button. The file upload tool performs a preliminary check on the file, for example of its format and size, and displays a message once the check passes. The user can then click the build graph button, and upon receiving the graph construction instruction the file upload tool sends the file to the background server. If the file is not in the designated format, e.g., it is not an EXCEL file, the check fails, and an error message indicating the wrong file format is generated and shown to the user, who can then correct the file format accordingly. If the file size is below a minimum threshold or above a maximum threshold, the check fails, and an error message indicating that the file is too small or too large is shown to the user, who can then modify the file or select a different one.
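The preliminary check performed by the file upload tool can be sketched as below. The accepted extensions and the size thresholds are placeholders chosen for illustration, since the document only says the file must be in a designated format such as EXCEL and within a minimum and maximum size. The function name is hypothetical. A real tool would typically also inspect the file contents rather than trust the extension.

```python
import os
from typing import Optional

# Assumed values: the designated format is Excel; the thresholds are placeholders.
ALLOWED_EXTENSIONS = {".xlsx", ".xls"}
MIN_SIZE_BYTES = 1
MAX_SIZE_BYTES = 10 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message to show the user, or None if the file passes."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"File format error: expected an Excel file (.xlsx or .xls), got '{ext or 'no extension'}'."
    if size_bytes < MIN_SIZE_BYTES:
        return "The file to be uploaded is too small."
    if size_bytes > MAX_SIZE_BYTES:
        return "The file to be uploaded is too large."
    return None  # check passed; the build graph button can be enabled


if __name__ == "__main__":
    for name, size in [("kg.xlsx", 20_480), ("kg.csv", 20_480), ("kg.xlsx", 0)]:
        print(name, size, check_upload(name, size) or "check passed")
```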
After receiving the file, the background server generates a first statement for each entity name extracted from the name columns of the tables, each first statement adding the entity node for that entity name to the graph database. It then generates a second statement for each association relationship extracted from the relation columns of the tables, each second statement adding an edge representing the association relationship between entity nodes. Finally, it calls the graph database to parse the generated first and second statements to obtain the knowledge graph. For the details, refer to steps S301 to S304, which are not repeated here. The background server then sends the resulting knowledge graph to the terminal device.
S802, displaying the knowledge graph sent by the background server, where the knowledge graph is generated by the background server from the file to be uploaded and describes the association relationships among the entities.
In a specific implementation, the terminal device can display the knowledge graph directly after receiving it from the background server. Alternatively, as shown in fig. 9B, the terminal device can display a message that construction succeeded and show the knowledge graph after the user clicks the view button in that message. The file upload tool can also provide a graph query button through which the user accesses the background server directly to retrieve a successfully constructed knowledge graph. The user can also choose to save the knowledge graph locally on the terminal device for later viewing.
Through the file uploading tool, a user can conveniently interact with the background server and view the constructed knowledge graph.
Further, the background server may check the data in the uploaded file using the verification method described above and send the resulting error information, which includes the error table containing the erroneous data, to the terminal device. On receiving the error information, the terminal device displays a corresponding error prompt. When the user clicks the prompt, the terminal device displays the error table with the erroneous data shown in a special state. This indicates which data in the table are abnormal and helps the user quickly locate and correct them.
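The exact checks belong to the verification method described earlier and are not reproduced in this section. The sketch below therefore uses a few plausible checks chosen only for illustration: empty or duplicate entity names, relation columns that name an unknown category, and relation cells naming an entity missing from the target table. It also assumes a hypothetical header convention in which the name column is headed `name` and relation columns are headed `relation:<class name>[:<relation name>]`. Each error records the worksheet, row, and column so that the terminal device can display that cell in the special state.

```python
from typing import Dict, List, NamedTuple, Set


class CellError(NamedTuple):
    sheet: str
    row: int      # 0-based; row 0 is the header row
    col: int      # 0-based column index
    message: str


def cell(row: List[str], col: int) -> str:
    """Return a trimmed cell value, treating missing cells as empty."""
    return row[col].strip() if col < len(row) else ""


def validate(tables: Dict[str, List[List[str]]]) -> List[CellError]:
    """Check every table and return the cells that should be flagged (illustrative checks)."""
    errors: List[CellError] = []
    names: Dict[str, Set[str]] = {}
    # Pass 1: each table needs a name column with non-empty, unique names.
    for sheet, rows in tables.items():
        header = rows[0] if rows else []
        names[sheet] = set()
        if "name" not in header:
            errors.append(CellError(sheet, 0, 0, "missing name column"))
            continue
        col = header.index("name")
        for r, row in enumerate(rows[1:], start=1):
            value = cell(row, col)
            if not value:
                errors.append(CellError(sheet, r, col, "empty entity name"))
            elif value in names[sheet]:
                errors.append(CellError(sheet, r, col, f"duplicate entity name '{value}'"))
            else:
                names[sheet].add(value)
    # Pass 2: relation cells must name an entity of an existing category.
    for sheet, rows in tables.items():
        header = rows[0] if rows else []
        for c, head in enumerate(header):
            if not head.startswith("relation:"):
                continue
            target = head.split(":")[1].strip()
            if target not in tables:
                errors.append(CellError(sheet, 0, c, f"unknown category '{target}'"))
                continue
            for r, row in enumerate(rows[1:], start=1):
                value = cell(row, c)
                if value and value not in names[target]:
                    errors.append(CellError(sheet, r, c, f"'{value}' not found in {target}"))
    return errors


if __name__ == "__main__":
    demo = {"Merchant": [["name", "relation:Location"], ["HLKD", "unknown place"]],
            "Location": [["name"]]}
    for err in validate(demo):
        print(err)
```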
Further, after generating the schema diagram, the background server may send a message to the terminal device indicating that the schema was built successfully. The file upload tool in the terminal device then displays a corresponding prompt, which the user can click to view the schema diagram. Alternatively, the file upload tool may provide a schema query button through which the user accesses the background server directly and retrieves a successfully built schema diagram. The user can also choose to save the schema diagram locally on the terminal device for later viewing.
In addition, the file upload tool can provide buttons for querying the dictionary file and the synonym file, so that the user can easily obtain them.
It should be noted that the knowledge graph construction tool is an application program implementing the knowledge graph construction method of the embodiments of the present application. It may be installed on any terminal device or background server, which, when running the application program, executes the steps of any knowledge graph construction method provided by the embodiments of the present application.
The embodiments of the present application provide a tool that builds a knowledge graph in one click from simply formatted tables, greatly lowering the data preparation barrier for people in various fields who are not knowledge graph specialists: a user only needs to understand their own data scenario and follow a few simple field conventions to build the knowledge graph they want with the knowledge graph construction tool provided by the application.
As shown in fig. 10, based on the same inventive concept as the knowledge graph construction method described above, the embodiment of the present application further provides a knowledge graph construction apparatus 100, which specifically includes: an acquisition module 1001, a first generation module 1002, a second generation module 1003, and an analysis module 1004.
The acquisition module 1001 is configured to acquire tables corresponding respectively to at least two entity categories, where each table includes a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table.
The first generation module 1002 is configured to generate, for each entity name extracted from the name columns of the tables, a corresponding first statement, where each first statement is used to add the entity node corresponding to that entity name to the graph database.
The second generation module 1003 is configured to generate, for each association relationship extracted from the relation columns of the tables, a corresponding second statement, where each second statement is used to add an edge representing the association relationship between entity nodes.
The analysis module 1004 is configured to call the graph database to parse the generated first and second statements to obtain the knowledge graph.
Optionally, the table name of each table is the class name of the corresponding entity category; the first cell of each name column in each table contains a name column identifier, and the other cells of the name column contain entity names; the first cell of each relation column in each table contains a relation column identifier and the class name of an entity category associated with the entity category of the table to which the relation column belongs, and each other cell of the relation column contains the name of an entity, in the table corresponding to the class name in the first cell, that is associated with the entity corresponding to that cell.
Optionally, the second generation module 1003 is further configured to extract association relationships from any relation column of any table as follows:
for each cell other than the first cell in the relation column, extracting a first entity name from the cell where that cell's row intersects the name column, extracting a second entity name from the cell itself, and determining that the entity corresponding to the first entity name is associated with the entity corresponding to the second entity name.
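The row-intersection rule above can be sketched as follows. The header syntax is a hypothetical convention introduced only for this example, since the document does not spell it out here: the name column's first cell is `name`, and a relation column's first cell is `relation:<class name>`, optionally followed by `:<relation name>`. Empty relation cells are skipped, on the assumption that an entity need not be associated with every category. The function and type names are illustrative.

```python
from typing import Iterator, List, NamedTuple, Optional, Tuple

# Hypothetical header convention, assumed for this sketch only.
NAME_ID = "name"
RELATION_ID = "relation"


class RelationColumn(NamedTuple):
    index: int                    # column position in the table
    target_class: str             # class name read from the first cell
    relation_name: Optional[str]  # relation name read from the first cell, if any


def parse_header(header: List[str]) -> Tuple[int, List[RelationColumn]]:
    """Locate the name column and every relation column in a header row."""
    name_index = [h.strip() for h in header].index(NAME_ID)  # ValueError if missing
    relations: List[RelationColumn] = []
    for i, head in enumerate(header):
        parts = [p.strip() for p in head.split(":")]
        if parts[0] == RELATION_ID and len(parts) >= 2 and parts[1]:
            rel_name = parts[2] if len(parts) >= 3 and parts[2] else None
            relations.append(RelationColumn(i, parts[1], rel_name))
    return name_index, relations


Association = Tuple[str, str, str, str, Optional[str]]


def extract_associations(table_class: str, rows: List[List[str]]) -> Iterator[Association]:
    """Yield (class, first entity name, target class, second entity name, relation name)."""
    name_index, relations = parse_header(rows[0])
    for row in rows[1:]:
        # First entity name: the cell where this row meets the name column.
        first = row[name_index].strip() if name_index < len(row) else ""
        if not first:
            continue
        for rel in relations:
            # Second entity name: the relation cell itself.
            second = row[rel.index].strip() if rel.index < len(row) else ""
            if second:
                yield (table_class, first, rel.target_class, second, rel.relation_name)


if __name__ == "__main__":
    merchant = [["name", "alias", "relation:Location:located in"], ["HLKD", "KD", "L1"]]
    print(list(extract_associations("Merchant", merchant)))  # "L1" is a hypothetical location
```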
Optionally, the first cell of each relation column further contains a relation name describing the association relationship.
Correspondingly, the second generation module 1003 is further configured to extract the relation name from the first cell of each relation column, and the second statement further includes the relation name, which is displayed on the corresponding edge of the knowledge graph.
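One way to turn an extracted association into a second statement is sketched below. It again assumes Neo4j's Cypher and that entity nodes were created with the table name as label and a `name` property. Here the relation name becomes the relationship type, which Neo4j Browser shows as the edge caption by default. The fallback type `RELATED_TO` and the helper names are hypothetical. MATCH followed by CREATE adds one edge per statement; MERGE could be used instead if re-running the import should not duplicate edges.

```python
from typing import Optional


def cypher_string(value: str) -> str:
    """Quote a value as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def cypher_name(name: str) -> str:
    """Backtick-quote a label or relationship type."""
    return "`" + name.replace("`", "``") + "`"


def second_statement(src_class: str, src_name: str, dst_class: str, dst_name: str,
                     relation_name: Optional[str] = None) -> str:
    """Build a statement adding an edge from the row's entity to the entity in the relation cell."""
    rel_type = cypher_name(relation_name or "RELATED_TO")  # hypothetical fallback type
    return (
        f"MATCH (a:{cypher_name(src_class)} {{name: {cypher_string(src_name)}}}), "
        f"(b:{cypher_name(dst_class)} {{name: {cypher_string(dst_name)}}}) "
        f"CREATE (a)-[:{rel_type}]->(b)"
    )


if __name__ == "__main__":
    # "L1" is a hypothetical location name used only for illustration.
    print(second_statement("Merchant", "HLKD", "Location", "L1", "located in"))
```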
Optionally, the knowledge graph construction apparatus 100 of the embodiment of the present application further includes a schema creation module for:
creating class nodes corresponding to the entity classes according to the class names of the entity classes corresponding to the tables;
for each relation column of each table, extracting the class name from the first cell of that relation column, and determining that the entity category with the extracted class name is associated with the entity category corresponding to the table;
and creating edges between associated class nodes to obtain a schema diagram representing the association relationships among the entity categories.
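The schema diagram can be derived from the table headers alone, as in the sketch below. It reuses the hypothetical header convention `relation:<class name>[:<relation name>]` for relation columns and returns the class nodes and the directed class-level edges as plain Python sets. It ignores relation columns that point to a category with no table, on the assumption that the verification step reports those instead. The function name is illustrative.

```python
from typing import Dict, List, Set, Tuple


def build_schema(headers: Dict[str, List[str]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (class nodes, edges) for the schema diagram.

    headers maps each table name (entity category) to its header row.
    """
    class_nodes = set(headers)  # one class node per table
    edges: Set[Tuple[str, str]] = set()
    for table_class, header in headers.items():
        for head in header:
            parts = [p.strip() for p in head.split(":")]
            # A relation column links this table's category to the named category.
            if parts[0] == "relation" and len(parts) >= 2 and parts[1] in class_nodes:
                edges.add((table_class, parts[1]))
    return class_nodes, edges


if __name__ == "__main__":
    nodes, edges = build_schema({
        "Merchant": ["name", "alias", "relation:Location:located in"],
        "Location": ["name"],
    })
    print(sorted(nodes), sorted(edges))
```

Because the edges are stored in a set, several relation columns pointing at the same category collapse into a single class-level edge.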
Optionally, the knowledge graph construction apparatus 100 of the embodiment of the present application further includes a verification module, configured to verify data in each table before generating the first sentence and the second sentence, and if error data exists in the table, generate corresponding error information.
Optionally, each table further includes an attribute column, where the attribute column is used to store attribute description information corresponding to each entity name in the table; the first sentence corresponding to each entity name further comprises attribute description information corresponding to each entity name.
Optionally, the knowledge graph construction apparatus 100 of the embodiment of the present application further includes a dictionary generation module, configured to generate a dictionary file for the entities according to the entity names in the name column of each table and the entity categories to which those entity names belong.
Optionally, each table further includes a synonym column, where the synonym column is used to store aliases corresponding to each entity name in the table.
Accordingly, the knowledge graph construction device 100 of the embodiment of the present application further includes a synonym generating module, configured to generate a synonym file according to the entity names in the name columns of the tables and the aliases corresponding to the entity names extracted from the synonym columns.
Optionally, the table corresponding to each entity category is a worksheet in the same table file.
As shown in fig. 11, based on the same inventive concept as the knowledge graph construction method described above, the embodiment of the present application further provides a knowledge graph construction device 110, which specifically includes an uploading module 1101 and a display module 1102.
The uploading module 1101 is configured to send, in response to a graph construction instruction entered through a file upload page, a file to be uploaded to the background server, where the file to be uploaded includes tables corresponding respectively to at least two entity categories; each table includes a name column and at least one relation column, the name column contains the entity names of the entities belonging to the corresponding entity category, and each relation column contains association relationships between the entities in that table and the entities in another table.
And the display module 1102 is configured to display a knowledge graph sent by the background server, where the knowledge graph is a graph generated by the background server according to the file to be uploaded and used for describing association relations between the entities.
Optionally, the display module 1102 is further configured to: display error prompt information after receiving error information sent by the background server, the error information including an error table containing erroneous data; and, in response to an operation on the error prompt information, display the error table, in which the erroneous data are displayed in a special state.
The knowledge graph construction device provided by the embodiments of the present application is based on the same inventive concept as the knowledge graph construction method and achieves the same beneficial effects, which are not repeated here.
Based on the same inventive concept as the knowledge graph construction method, the embodiment of the application also provides electronic equipment, which can be a desktop computer, a portable computer, a smart phone, a tablet personal computer, a personal digital assistant (Personal Digital Assistant, PDA), a server and the like. As shown in fig. 12, the electronic device 120 may include a processor 1201 and a memory 1202.
The processor 1201 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 1202, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1202 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
The embodiments of the present application provide a computer-readable storage medium for storing the computer program instructions used by the above electronic device, including a program for executing the knowledge graph construction method.
The above computer storage medium may be any available medium or data storage device accessible by a computer, including but not limited to magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MO)), optical storage (e.g., CD, DVD, BD, HVD), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drives (SSD)).
The above embodiments only describe the technical solutions of the present application in detail. Their descriptions are only intended to help understand the methods of the embodiments of the present application and should not be construed as limiting those embodiments. Variations or substitutions readily conceived by those skilled in the art shall fall within the protection scope of the embodiments of the present application.

Claims (13)

1. A knowledge graph construction method, comprising the following steps:
acquiring tables corresponding respectively to at least two entity categories, wherein each table comprises a name column and at least one relation column; the first cell of each name column in each table comprises a name column identifier, and each cell of the name column other than the first cell comprises an entity name; the first cell of each relation column in each table comprises a relation column identifier, the class name of an entity category associated with the entity category corresponding to the relation column, and a relation name describing the association relationship; and each cell of the relation column other than the first cell comprises an entity name from the table corresponding to the class name in the first cell, namely the name of an entity associated with the entity corresponding to said cell;
generating, according to the entity names extracted from the name columns of the tables, a first statement corresponding to each entity name, wherein each first statement is used to add the entity node corresponding to that entity name to the graph database;
extracting association relationships from the relation columns of each table and generating a second statement corresponding to each association relationship, wherein each second statement is used to add an edge representing the association relationship between entity nodes, together with a relation name displayed on that edge of the knowledge graph, the extraction comprising: for any cell other than the first cell in any relation column, extracting a first entity name from the cell at the intersection of the row of said cell and the name column, extracting a second entity name from said cell, determining that the entity corresponding to the first entity name is associated with the entity corresponding to the second entity name, and extracting the relation name from the first cell of said relation column;
and calling the graph database to parse the generated first statements and second statements to obtain the knowledge graph.
2. The method of claim 1, wherein the table name of each table is a class name of the corresponding entity class.
3. The method according to claim 2, wherein the method further comprises:
creating class nodes corresponding to the entity classes according to the class names of the entity classes corresponding to the tables;
for any relation column of any table, extracting a class name from the first cell of said relation column, and determining that the entity category corresponding to the extracted class name is associated with the entity category corresponding to said table;
and creating edges among the class nodes with the association relationship to obtain a schema diagram representing the association relationship among the entity classes.
4. A method according to any one of claims 1 to 3, wherein the method further comprises:
before the first statements and the second statements are generated, checking the data in each table, and generating corresponding error information if erroneous data exist in the tables.
5. A method according to any one of claims 1 to 3, wherein each table further comprises an attribute column for storing attribute description information corresponding to each entity name in the table;
the first sentence corresponding to each entity name further comprises attribute description information corresponding to each entity name.
6. A method according to any one of claims 1 to 3, wherein the method further comprises:
generating a dictionary file for the entities according to the entity names in the name column of each table and the entity categories to which those entity names belong.
7. A method according to any one of claims 1 to 3, wherein each table further comprises a synonym column for storing aliases corresponding to each entity name in the table;
the method further comprises the steps of:
generating a synonym file according to the entity names in the name columns of the tables and the aliases of those entity names extracted from the synonym columns.
8. A method according to any one of claims 1 to 3, wherein the table corresponding to each entity class is a worksheet in the same table file.
9. A knowledge graph construction method, comprising the following steps:
in response to a graph construction instruction entered through a file upload page, sending a file to be uploaded to a background server, wherein the file to be uploaded comprises tables corresponding respectively to at least two entity categories, each table comprising a name column and at least one relation column; the first cell of each name column in each table comprises a name column identifier, and each cell of the name column other than the first cell comprises an entity name; the first cell of each relation column in each table comprises a relation column identifier, the class name of an entity category associated with the entity category corresponding to the relation column, and a relation name describing the association relationship; and each cell of the relation column other than the first cell comprises an entity name from the table corresponding to the class name in the first cell, namely the name of an entity associated with the entity corresponding to said cell;
displaying the knowledge graph sent by the background server, wherein the knowledge graph is generated by the background server according to the file to be uploaded and describes the association relationships among the entities;
wherein the knowledge graph is constructed by performing the following operations:
generating, according to the entity names extracted from the name columns of the tables, a first statement corresponding to each entity name, wherein each first statement is used to add the entity node corresponding to that entity name to the graph database;
extracting association relationships from the relation columns of each table and generating a second statement corresponding to each association relationship, wherein each second statement is used to add an edge representing the association relationship between entity nodes, together with a relation name displayed on that edge of the knowledge graph, the extraction comprising: for any cell other than the first cell in any relation column, extracting a first entity name from the cell at the intersection of the row of said cell and the name column, extracting a second entity name from said cell, determining that the entity corresponding to the first entity name is associated with the entity corresponding to the second entity name, and extracting the relation name from the first cell of said relation column;
and calling the graph database to parse the generated first statements and second statements to obtain the knowledge graph.
10. The method according to claim 9, wherein the method further comprises:
after error information sent by the background server is received, error prompt information is displayed, and the error information comprises an error table with error data;
and responding to the operation aiming at the error prompt information, and displaying the error table, wherein error data in the error table are displayed in a special state.
11. A knowledge graph construction device, comprising:
an acquisition module, configured to acquire tables corresponding respectively to at least two entity categories, wherein each table comprises a name column and at least one relation column; the first cell of each name column in each table comprises a name column identifier, and each cell of the name column other than the first cell comprises an entity name; the first cell of each relation column in each table comprises a relation column identifier, the class name of an entity category associated with the entity category corresponding to the relation column, and a relation name describing the association relationship; and each cell of the relation column other than the first cell comprises an entity name from the table corresponding to the class name in the first cell, namely the name of an entity associated with the entity corresponding to said cell;
a first generation module, configured to generate, according to the entity names extracted from the name columns of the tables, a first statement corresponding to each entity name, wherein each first statement is used to add the entity node corresponding to that entity name to the graph database;
a second generation module, configured to extract association relationships from the relation columns of each table and generate a second statement corresponding to each association relationship, wherein each second statement is used to add an edge representing the association relationship between entity nodes, together with a relation name displayed on that edge of the knowledge graph, the extraction comprising: for any cell other than the first cell in any relation column, extracting a first entity name from the cell at the intersection of the row of said cell and the name column, extracting a second entity name from said cell, determining that the entity corresponding to the first entity name is associated with the entity corresponding to the second entity name, and extracting the relation name from the first cell of said relation column;
and an analysis module, configured to call the graph database to parse the generated first statements and second statements to obtain the knowledge graph.
12. The knowledge graph construction device is characterized by comprising:
the uploading module is used for responding to a construction map instruction input through a file uploading page and sending a file to be uploaded to the background server, wherein the file to be uploaded comprises: each table comprises a name column and at least one relation column, a first cell of each name column in each table comprises a name column identifier, a cell except the first cell in each name column comprises an entity name, a first cell of each relation series in each table comprises a relation series identifier, a class name of one entity class with an association relation with the entity class corresponding to the relation column and a relation name describing the association relation, and any cell except the first cell in each relation series comprises an entity with an association relation with the entity corresponding to the any cell and belongs to the entity name in the table corresponding to the class name in the first cell;
the display module is used for displaying the knowledge graph sent by the background server, the knowledge graph being a graph generated by the background server from the file to be uploaded to describe the association relations among the entities;
wherein the knowledge graph is constructed by performing the following operations:
generating, from the entity names extracted from the name column of each table, a first statement corresponding to each entity name, wherein each first statement is used for adding the entity node corresponding to that entity name to the graph database;
extracting the association relations from the relation columns of each table and generating a second statement for each association relation, wherein each second statement is used for adding an edge that represents the association relation between entity nodes, together with the relation name displayed on that edge of the knowledge graph, as follows: for any cell other than the first cell in any relation column, a first entity name is extracted from the cell where the row of that cell intersects the name column, a second entity name is extracted from that cell itself, it is determined that an association relation exists between the entity corresponding to the first entity name and the entity corresponding to the second entity name, and the relation name is extracted from the first cell of that relation column;
and calling the graph database to parse the generated first statements and second statements to obtain the knowledge graph.
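Claim 12 constrains the layout of the uploaded file, so a client could check that layout before sending the file to the background server. The sketch below is such a check; the check itself is an illustration and is not recited in the claims. It assumes each table is held in memory as a list of rows in a dict keyed by entity class name, that the name column identifier is the literal `NAME`, and that a relation column header has the hypothetical form `REL:<class name>:<relation name>`.

```python
from typing import Dict, List, Set

NAME_MARKER = "NAME"   # hypothetical name column identifier
REL_PREFIX = "REL:"    # hypothetical relation column identifier prefix


def validate_tables(tables: Dict[str, List[List[str]]]) -> List[str]:
    """Return layout problems; an empty list means the tables match the claimed layout."""
    problems: List[str] = []
    names: Dict[str, Set[str]] = {}
    # Pass 1: every table needs exactly one name column; collect its entity names.
    for class_name, rows in tables.items():
        header = rows[0] if rows else []
        if header.count(NAME_MARKER) != 1:
            problems.append(f"{class_name}: expected exactly one name column")
            continue
        col = header.index(NAME_MARKER)
        names[class_name] = {row[col] for row in rows[1:]}
    # Pass 2: check relation column headers and the entity names in their cells.
    for class_name, rows in tables.items():
        if class_name not in names:
            continue  # already reported in pass 1
        header = rows[0]
        rel_cols = [i for i, cell in enumerate(header) if cell.startswith(REL_PREFIX)]
        if not rel_cols:
            problems.append(f"{class_name}: expected at least one relation column")
        for i in rel_cols:
            parts = header[i].split(":", 2)
            if len(parts) != 3 or not parts[1] or not parts[2]:
                problems.append(f"{class_name}: malformed relation header {header[i]!r}")
                continue
            target = parts[1]
            if target not in names:
                problems.append(f"{class_name}: unknown entity class {target!r}")
                continue
            for row in rows[1:]:
                # Empty cells are tolerated; non-empty ones must name an entity of the target table.
                if row[i] and row[i] not in names[target]:
                    problems.append(f"{class_name}: {row[i]!r} is not listed in {target}")
    return problems
```

A `Person` row whose relation cell names `Globex` while the `Company` table only lists `Acme` is reported as `Person: 'Globex' is not listed in Company`, which corresponds to the requirement that each relation cell hold an entity name belonging to the table named in the first cell.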
13. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 10.
CN202010096405.5A 2020-02-17 2020-02-17 Knowledge graph construction method and device, electronic equipment and storage medium Active CN111324609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010096405.5A CN111324609B (en) 2020-02-17 2020-02-17 Knowledge graph construction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010096405.5A CN111324609B (en) 2020-02-17 2020-02-17 Knowledge graph construction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111324609A 2020-06-23
CN111324609B 2023-07-14

Family

ID=71172685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010096405.5A Active CN111324609B (en) 2020-02-17 2020-02-17 Knowledge graph construction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111324609B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753027B (en) * 2020-06-29 2023-03-14 中国科学院文献情报中心 Method and device for implicit knowledge graph representation facing multiple entities
CN111914101B (en) * 2020-08-17 2023-10-20 南方电网数字电网研究院有限公司 File association relationship abnormality identification method and device and computer equipment
WO2022051996A1 (en) * 2020-09-10 2022-03-17 西门子(中国)有限公司 Method and apparatus for constructing knowledge graph
CN112182236A (en) * 2020-09-18 2021-01-05 成都数联铭品科技有限公司 Knowledge graph construction method and system and electronic equipment
CN112632106B (en) * 2020-12-29 2023-05-23 重庆农村商业银行股份有限公司 Knowledge graph query method, device, equipment and storage medium
CN113792152B (en) * 2021-08-23 2023-07-04 南京信息工程大学 Fusion method of triangular diagram and knowledge graph
CN115309789B (en) * 2022-10-11 2023-01-03 浩鲸云计算科技股份有限公司 Method for generating associated data graph in real time based on intelligent dynamic business object

Citations (7)

Publication number Priority date Publication date Assignee Title
JP2017207853A (en) * 2016-05-17 2017-11-24 日本電信電話株式会社 Table cell retrieval device, method and program
CN108549731A (en) * 2018-07-11 2018-09-18 中国电子科技集团公司第二十八研究所 Knowledge graph construction method based on ontology model
CN109471947A (en) * 2018-11-06 2019-03-15 北京锐安科技有限公司 Knowledge graph construction method, device, equipment and storage medium for data
CN109710725A (en) * 2018-12-13 2019-05-03 中国科学院信息工程研究所 Chinese table column label restoration method and system based on text classification
CN110275966A (en) * 2019-07-01 2019-09-24 科大讯飞(苏州)科技有限公司 Knowledge extraction method and device
US10496678B1 (en) * 2016-05-12 2019-12-03 Federal Home Loan Mortgage Corporation (Freddie Mac) Systems and methods for generating and implementing knowledge graphs for knowledge representation and analysis
EP3608795A1 (en) * 2018-08-08 2020-02-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for generating knowledge graph, device and computer readable storage medium

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US10303999B2 (en) * 2011-02-22 2019-05-28 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and search engines
US10740396B2 (en) * 2013-05-24 2020-08-11 Sap Se Representing enterprise data in a knowledge graph
US20150310073A1 (en) * 2014-04-29 2015-10-29 Microsoft Corporation Finding patterns in a knowledge base to compose table answers
CN106021281A (en) * 2016-04-29 2016-10-12 京东方科技集团股份有限公司 Method for establishing medical knowledge graph, device for same and query method for same
AU2018241092B2 (en) * 2017-10-04 2019-11-21 Accenture Global Solutions Limited Knowledge enabled data management system
CN109446341A (en) * 2018-10-23 2019-03-08 国家电网公司 Knowledge graph construction method and device
CN110489560A (en) * 2019-06-19 2019-11-22 民生科技有限责任公司 Small and micro enterprise profile generation method and device based on knowledge graph technology
CN110362692A (en) * 2019-07-23 2019-10-22 中南大学 Academic circle construction method based on knowledge graph
CN110633364A (en) * 2019-09-23 2019-12-31 中国农业大学 Graph database-based food safety knowledge graph construction method and display mode

Also Published As

Publication number Publication date
CN111324609A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN111324609B (en) Knowledge graph construction method and device, electronic equipment and storage medium
US9519701B2 (en) Generating information models in an in-memory database system
US20170169061A1 (en) NoSQL RELATIONAL DATABASE (RDB) DATA MOVEMENT
US10691584B2 (en) Behavior driven development integration with test tool
US10671671B2 (en) Supporting tuples in log-based representations of graph databases
CN107798123B (en) Knowledge base and establishing, modifying and intelligent question and answer methods, devices and equipment thereof
US20160026616A1 (en) Presenting dataset of spreadsheet in form based view
US9646004B2 (en) Hierarchical database report generation with automated query generation for placeholders
CN108388623B (en) ER relationship generation method and device, computer equipment and storage medium
US20080263142A1 (en) Meta Data Driven User Interface System and Method
US20170212942A1 (en) Database grid search methods and systems
US8352507B2 (en) Retrieval and storage of localized instances of data objects
CN106484699B (en) Method and device for generating database query field
CN111666740A (en) Flow chart generation method and device, computer equipment and storage medium
CN115576974B (en) Data processing method, device, equipment and medium
US20180357278A1 (en) Processing aggregate queries in a graph database
US10289620B1 (en) Reporting and data governance management
US8732655B2 (en) Systems and methods for metamodel transformation
CN110720097A (en) Functional equivalence of tuples and edges in graph databases
CN111367514B (en) Page card development method and device, computing device and storage medium
US20080263018A1 (en) Method and System for Mapping Business Objects to Relational Database Tables
EP4357934A1 (en) Classroom activity courseware producing method and apparatus, and storage medium and electronic device
CN116975254A (en) Visual data presentation method, visual data presentation device and storage medium
US10318524B2 (en) Reporting and data governance management
CN115687704A (en) Information display method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40024419

Country of ref document: HK

GR01 Patent grant