CN114936294B - Automatic knowledge graph construction method and device - Google Patents


Info

Publication number
CN114936294B
Authority
CN
China
Prior art keywords
layer
data
task
entity
structured data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210748639.2A
Other languages
Chinese (zh)
Other versions
CN114936294A (en)
Inventor
殷建杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Longzhi Digital Technology Service Co Ltd
Original Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Longzhi Digital Technology Service Co Ltd
Priority to CN202210748639.2A
Publication of CN114936294A
Application granted
Publication of CN114936294B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The disclosure relates to the technical field of data processing and provides an automated knowledge graph construction method and device. The method comprises: acquiring an automated construction task of a product knowledge graph; executing a first task script to retrieve service structured data from a relational database; executing a second task script to construct an ontology layer and an instance layer of the product knowledge graph from the service structured data; executing a third task script to construct a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy and to serialize the service structured data into a triplet sequence; and executing a fourth task script to synchronize the triplet sequence into a GDB graph database, generate the vertices and edges of the GDB graph database, and thereby complete the construction of the product knowledge graph. The method and device automatically associate entities such as products, product lines, product projects and employees to form the product knowledge graph, which provides the underlying data for query analysis and reasoning and allows user questions to be answered quickly.

Description

Automatic knowledge graph construction method and device
Technical Field
The disclosure relates to the technical field of data processing, and in particular relates to an automatic knowledge graph construction method and device.
Background
In the prior art, a user who wants to know the person responsible for a certain product, and related information about that person, generally obtains it by querying a relational database. However, this approach cannot respond quickly to the user's query request, and query efficiency is low. In addition, existing product knowledge graphs are basically constructed manually; when the data source is large, this consumes considerable manpower and material resources, and working efficiency is low.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide an automated knowledge graph construction method and apparatus to solve the problems in the prior art that user query requests cannot be answered quickly, query efficiency is low, and construction efficiency is low.
In a first aspect of an embodiment of the present disclosure, an automated knowledge graph construction method is provided, including:
acquiring an automated construction task of a product knowledge graph, wherein the automated construction task at least comprises a first task node, a second task node, a third task node and a fourth task node connected in series in sequence;
executing a first task script corresponding to the first task node, and retrieving service structured data from a relational database, wherein the service structured data at least comprises product structured data, product line structured data, product item structured data and employee structured data;
after the first task script is executed, executing a second task script corresponding to the second task node, and constructing an ontology layer and an instance layer of the product knowledge graph according to the service structured data;
after the second task script is executed, executing a third task script corresponding to the third task node, and constructing a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy, so as to serialize the service structured data into a triplet sequence;
after the third task script is executed, executing a fourth task script corresponding to the fourth task node, synchronizing the triplet sequence into a GDB graph database, generating vertices and edges of the GDB graph database, and completing the construction of the product knowledge graph.
In a second aspect of the embodiments of the present disclosure, an automated knowledge-graph construction apparatus is provided, including:
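an acquisition module configured to acquire an automated construction task of a product knowledge graph, wherein the automated construction task at least comprises a first task node, a second task node, a third task node and a fourth task node connected in series in sequence;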
a first execution module configured to execute a first task script corresponding to the first task node to retrieve service structured data from a relational database, wherein the service structured data at least comprises product structured data, product line structured data, product project structured data and employee structured data;
a second execution module configured to execute, after the first task script is executed, a second task script corresponding to the second task node, and to construct an ontology layer and an instance layer of the product knowledge graph according to the service structured data;
a third execution module configured to execute, after the second task script is executed, a third task script corresponding to the third task node, and to construct a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy, so as to serialize the service structured data into a triplet sequence;
and a fourth execution module configured to execute, after the third task script is executed, a fourth task script corresponding to the fourth task node, to synchronize the triplet sequence into a GDB graph database, generate vertices and edges of the GDB graph database, and complete the construction of the product knowledge graph.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the disclosure have the following beneficial effects. An automated construction task of a product knowledge graph is acquired, the task at least comprising a first, a second, a third and a fourth task node connected in series in sequence. A first task script corresponding to the first task node is executed to retrieve service structured data from a relational database, the service structured data at least comprising product structured data, product line structured data, product item structured data and employee structured data. After the first task script is executed, a second task script corresponding to the second task node is executed to construct an ontology layer and an instance layer of the product knowledge graph from the service structured data. After the second task script is executed, a third task script corresponding to the third task node is executed to construct a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy, so that the service structured data is serialized into a triplet sequence. After the third task script is executed, a fourth task script corresponding to the fourth task node is executed to synchronize the triplet sequence into a GDB graph database and generate its vertices and edges, completing the construction of the product knowledge graph. In this way, entities such as products, product lines, product items and employees are associated to form the product knowledge graph, which provides the underlying data for query analysis and reasoning and allows user questions to be answered quickly. Because the construction is automated, the efficiency of building the product knowledge graph can be greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required for the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a schematic flow chart of an automated knowledge graph construction method according to an embodiment of the disclosure;
Fig. 2 is a schematic structural diagram of an automated knowledge graph construction apparatus according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
An automated construction method and apparatus for a product knowledge graph according to an embodiment of the present disclosure will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an automated knowledge graph construction method according to an embodiment of the disclosure. As shown in fig. 1, the automated knowledge graph construction method includes:
Step S101, acquiring an automated construction task of a product knowledge graph, wherein the automated construction task at least comprises a first task node, a second task node, a third task node and a fourth task node connected in series in sequence.
A product in the product knowledge graph can be understood as anything that can be supplied to the market, used and consumed by people, and meets some need; products generally include tangible and intangible products. Tangible products include, but are not limited to, electronic devices such as mobile phones and computers, daily necessities such as cups and spoons, buildings, and mechanical parts. Intangible products include, but are not limited to, software products such as computer programs, dictionaries and information records, and services that meet certain needs, such as maintenance services for tangible products supplied by consumers (e.g., automobiles).
In a specific implementation, a corresponding automated construction task of a product knowledge graph can be created for each subdivided product in different industry fields. For example, software products in the instant messaging field include QQ, WeChat, email and SMS. If the target product is WeChat, an automated construction task for a WeChat knowledge graph can be created accordingly.
As an example, the Kettle tool may be used to create the automated construction task of the product knowledge graph, covering the design of the series of tasks and the notification sent when the build completes. Kettle is an open-source ETL (Extract-Transform-Load) tool written in pure Java; it runs on Windows, Linux and Unix, and its data extraction is efficient and stable.
As an example, when the user clicks a preset product knowledge graph construction control (e.g., a "Product knowledge graph construction" button), the automated construction task of the product knowledge graph may be retrieved from a task library; that is, the automated construction task of the product knowledge graph is acquired.
Step S102, executing a first task script corresponding to the first task node, and retrieving service structured data from a relational database, wherein the service structured data at least comprises product structured data, product line structured data, product item structured data and employee structured data.
The first, second, third and fourth task nodes each refer to one stage of the automated construction task of the product knowledge graph, i.e., one of the procedures or steps for building the product knowledge graph.
The first, second, third and fourth task scripts are programs stored as plain text; specifically, they are the functional programs corresponding to the first, second, third and fourth task nodes, respectively.
The embodiments of the disclosure provide a top-down technical solution for automatically constructing a product knowledge graph. In practice, various kinds of service data (i.e., multi-source service data) can be obtained through Kafka consumption, Hive or MySQL database synchronization, offline table import and other means, and stored in a relational database. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action-stream data generated by users of a website. Hive is a data warehouse tool built on Hadoop (a distributed system infrastructure) for data extraction, transformation and loading; it provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. MySQL is a relational database management system.
Once the automated construction task of the product knowledge graph is acquired, the task scripts corresponding to the task nodes are executed one after another in the order of the task nodes. In this embodiment, the order is first task node → second task node → third task node → fourth task node.
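The serial execution order can be illustrated with a minimal Python sketch. The four functions below are hypothetical placeholders for the task scripts (in the embodiment these would be Kettle jobs or other scripts, not Python functions); each runs only after the previous one has returned, and results are handed on through a shared context dictionary.

```python
from typing import Any, Callable, Dict, List, Tuple

# A task script reads the shared context and returns new keys to merge into it.
TaskScript = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_serial_task(nodes: List[Tuple[str, TaskScript]]) -> Dict[str, Any]:
    """Execute task scripts strictly in node order; an exception stops the chain."""
    context: Dict[str, Any] = {}
    completed: List[str] = []
    for name, script in nodes:
        context.update(script(context))  # the next node starts only after this returns
        completed.append(name)
    context["completed"] = completed
    return context


# Hypothetical placeholders for the four task scripts.
def first_task(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"rows": [{"id": 1, "name": "phone"}]}  # S102: retrieve data


def second_task(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"ontology": {"classes": ["Product"]}}  # S103: build layers


def third_task(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # S104: serialize each row into a (subject, predicate, object) triple
    return {"triples": [(f"product/{r['id']}", "name", r["name"]) for r in ctx["rows"]]}


def fourth_task(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"synced": len(ctx["triples"])}  # S105: stand-in for the graph sync


result = run_serial_task([
    ("first", first_task),
    ("second", second_task),
    ("third", third_task),
    ("fourth", fourth_task),
])
print(result["completed"])  # ['first', 'second', 'third', 'fourth']
```

Passing state through one dictionary keeps the sketch short; a real scheduler would more likely persist intermediate results (for example in the relational database) between nodes.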
First, the first task script corresponding to the first task node is executed to retrieve, from the relational database, the service structured data associated with the product of the product knowledge graph to be constructed. For example, if that product is a mobile phone, the service structured data associated with the mobile phone is retrieved from the relational database. The service structured data at least comprises product structured data, product line structured data, product project structured data and employee structured data. Continuing the mobile phone example: the product structured data is usually a data table that collects and records information about the mobile phone, such as an Excel table containing the phone ID, phone code and phone name. The product line structured data is usually a data table that collects and records information about the mobile phone product line, such as an Excel table containing the product line ID, code, name and person in charge. The product project structured data is usually a data table that collects and records information about the mobile phone project, such as an Excel table containing the project ID, code, name and person in charge. The employee structured data is usually a data table that collects, records and/or stores information about employees, such as an Excel table containing the employee ID, employee code, position, name and department.
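As a sketch of what the first task script might do, the snippet below reads the four kinds of service structured data from a relational database. SQLite is assumed only so the example is self-contained; the table and column names (`product`, `product_line`, `product_project`, `employee` and their fields) are hypothetical and simply mirror the fields listed above.

```python
import sqlite3
from typing import Dict, List

# Hypothetical table names; the patent does not specify a schema.
SERVICE_TABLES = ["product", "product_line", "product_project", "employee"]


def fetch_service_data(conn: sqlite3.Connection) -> Dict[str, List[dict]]:
    """Read every service table into a list of row dictionaries."""
    conn.row_factory = sqlite3.Row  # rows can then be converted with dict(row)
    data: Dict[str, List[dict]] = {}
    for table in SERVICE_TABLES:
        # Table names come from the fixed list above, so formatting them in is safe here.
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        data[table] = [dict(r) for r in rows]
    return data


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_code TEXT,
                      product_name TEXT, product_line_id INTEGER);
CREATE TABLE product_line (product_line_id INTEGER PRIMARY KEY, line_code TEXT,
                           line_name TEXT, owner_id INTEGER);
CREATE TABLE product_project (project_id INTEGER PRIMARY KEY, project_code TEXT,
                              project_name TEXT, owner_id INTEGER);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_code TEXT,
                       post TEXT, employee_name TEXT, department TEXT);
INSERT INTO product VALUES (1, 'P001', 'Phone X', 10);
INSERT INTO product_line VALUES (10, 'L01', 'Phone line', 100);
INSERT INTO product_project VALUES (20, 'J01', 'Phone X project', 100);
INSERT INTO employee VALUES (100, 'E100', 'Manager', 'Zhang San', 'R&D');
""")
data = fetch_service_data(conn)
print(data["product"])
```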
Step S103, after the first task script is executed, executing the second task script corresponding to the second task node, and constructing an ontology layer and an instance layer of the product knowledge graph according to the service structured data.
As an example, after the first task script corresponding to the first task node finishes, the process automatically proceeds to the second task node and executes the second task script corresponding to it. Specifically, the ontology layer and the instance layer of the product knowledge graph can be built with Protégé (an ontology editing tool for knowledge graphs).
Step S104, after the second task script is executed, executing the third task script corresponding to the third task node, and constructing a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy, so as to serialize the service structured data into a triplet sequence.
The preset mapping strategy refers to a correspondence rule that matches each element of the ontology layer one-to-one with an element of the instance layer, thereby establishing an association (i.e., mapping) relation between them.
As an example, after the second task script corresponding to the second task node finishes, the process automatically proceeds to the third task node and executes the third task script, which establishes the mapping relation between the ontology layer and the instance layer according to the preset correspondence rule between their elements, thereby converting the service structured data into a triplet sequence.
Step S105, after the third task script is executed, executing the fourth task script corresponding to the fourth task node, synchronizing the triplet sequence into the GDB graph database, generating vertices and edges of the GDB graph database, and completing the construction of the product knowledge graph.
GDB (Graph Database) is a real-time, reliable online database service that supports the property graph (PG) model and is designed for storing and querying highly connected data.
As an example, after the third task script corresponding to the third task node finishes, the process automatically proceeds to the fourth task node and executes the fourth task script, which synchronizes the triplet sequence into the GDB graph database.
According to the technical solution provided by the embodiments of the disclosure, an automated construction task of a product knowledge graph is acquired, the task at least comprising a first, a second, a third and a fourth task node connected in series in sequence; a first task script corresponding to the first task node is executed to retrieve service structured data from a relational database, the service structured data at least comprising product structured data, product line structured data, product item structured data and employee structured data; after the first task script is executed, a second task script corresponding to the second task node is executed to construct an ontology layer and an instance layer of the product knowledge graph from the service structured data; after the second task script is executed, a third task script corresponding to the third task node is executed to construct a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy, so that the service structured data is serialized into a triplet sequence; and after the third task script is executed, a fourth task script corresponding to the fourth task node is executed to synchronize the triplet sequence into a GDB graph database and generate its vertices and edges, completing the construction of the product knowledge graph. Entities such as products, product lines, product items and employees are thus associated to form the product knowledge graph, which provides the underlying data for query analysis and reasoning and allows user questions to be answered quickly; and the automated construction mode greatly improves the efficiency of building the product knowledge graph.
In some embodiments, the service structured data is retrieved from a relational database, specifically including:
retrieving multi-source business data from a relational database;
and carrying out data cleaning, normalization and standardization processing on the multi-source service data to obtain service structured data.
The multi-source service data includes service data of different service types, storage modes and data sources. Service types include, but are not limited to, different business processes of the same product (e.g., hardware design and development, software design and development, and sales of a mobile phone), the same business process of different products, and different business processes of different products. Storage modes include, but are not limited to, distributed local storage, distributed cloud storage and centralized storage. A data source generally refers to the provider of the data; for example, the structured data of one mobile phone product line may be collected, compiled or otherwise provided by several different mobile phone manufacturers.
Data cleaning is mainly the process of finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values.
Normalization is a processing method that removes the influence of differing dimensions (units) among the data indicators of the multi-source service data, so that the indicators become comparable.
Standardization transforms the raw data so that all data indicators are on the same order of magnitude, which makes them suitable for comprehensive comparison and evaluation.
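The patent does not give formulas for normalization or standardization. Two common choices, assumed here for illustration, are min-max normalization, x' = (x - min) / (max - min), which maps an indicator onto the dimensionless range [0, 1], and z-score standardization, z = (x - mean) / std, which centers an indicator at 0 with unit standard deviation. A minimal Python sketch:

```python
from statistics import mean, pstdev
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """x' = (x - min) / (max - min): removes units and maps values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant indicator: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values: List[float]) -> List[float]:
    """z = (x - mean) / std: centers values at 0 with unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


screen_sizes = [5.0, 6.0, 7.0]  # illustrative indicator values in inches
print(min_max_normalize(screen_sizes))  # [0.0, 0.5, 1.0]
print(z_score_standardize(screen_sizes))
```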
In some embodiments, the first task node includes a first child node, a second child node, and a third child node.
Performing data cleaning, normalization and standardization on the multi-source service data to obtain the service structured data specifically includes:
executing a first script corresponding to the first child node, and performing data cleaning on the multi-source service data to obtain first processing data;
after the first script is executed, executing a second script corresponding to the second child node, and carrying out normalization processing on the first processing data to obtain second processing data;
after the second script is executed, executing a third script corresponding to a third child node, and carrying out standardized processing on the second processing data to obtain service structured data.
Specifically, the first task node comprises a first child node, a second child node and a third child node connected in series in sequence. The first script corresponding to the first child node is executed first to clean the multi-source service data: it checks the consistency of the data indicators of each piece of service data, removes invalid and erroneous values, and fills in default values where values are missing, yielding the first processing data. After the first script finishes, the process automatically proceeds to the second child node, whose second script normalizes the first processing data to obtain the second processing data. After the second script finishes, the process automatically proceeds to the third child node, whose third script standardizes the second processing data, yielding the service structured data (e.g., an Excel table).
Cleaning, normalizing and standardizing the multi-source service data standardizes the database table design and ensures data quality, which in turn ensures the quality of the product knowledge graph constructed afterwards.
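The cleaning performed by the first child node can be sketched as follows. The field names, default values and validity rules (a unique `product_id`, a non-empty name, a positive numeric screen size) are illustrative assumptions, not rules taken from the patent.

```python
from typing import Any, Dict, List, Optional

# Hypothetical defaults used when a value is missing or invalid.
DEFAULTS: Dict[str, Any] = {"product_name": "UNKNOWN", "screen_size_in": 0.0}


def _to_float(value: Any) -> Optional[float]:
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """First child node sketch: consistency check, drop invalid values, fill defaults."""
    cleaned: List[Dict[str, Any]] = []
    seen_ids = set()
    for rec in records:
        pid = rec.get("product_id")
        # Consistency check: every record needs a present, unique primary key.
        if pid is None or pid in seen_ids:
            continue
        seen_ids.add(pid)
        name = str(rec.get("product_name") or "").strip()
        size = _to_float(rec.get("screen_size_in"))
        cleaned.append({
            "product_id": pid,
            "product_name": name or DEFAULTS["product_name"],
            # Non-numeric or non-positive sizes are treated as invalid and defaulted.
            "screen_size_in": size if size is not None and size > 0 else DEFAULTS["screen_size_in"],
        })
    return cleaned


raw = [
    {"product_id": 1, "product_name": " Phone X ", "screen_size_in": "6.1"},
    {"product_id": 1, "product_name": "duplicate", "screen_size_in": 5},
    {"product_id": 2, "product_name": None, "screen_size_in": "n/a"},
    {"product_name": "record without id"},
]
print(clean_records(raw))
```

The output of this function would be the first processing data handed to the second child node.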
In other embodiments, performing data cleaning, normalization and standardization on the multi-source service data to obtain the service structured data includes:
executing the first script, the second script and the third script in parallel, so that data cleaning, normalization and standardization are performed on the multi-source service data synchronously to obtain the service structured data.
In this embodiment, the first, second and third child nodes of the first task node may be parallel nodes; that is, the first, second and third scripts corresponding to them may be executed simultaneously to clean, normalize and standardize the multi-source service data synchronously, and the results are consolidated into the service structured data.
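A minimal sketch of the parallel variant, assuming each child-node script can work independently on the same raw indicator values and that the consolidation step keeps only the records the cleaning script marks as valid. The function names and the validity rule are hypothetical, and Python threads are used only to show concurrent dispatch and merging of results.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev
from typing import Dict, List, Optional

Values = List[Optional[float]]


def _valid(values: Values) -> Dict[int, float]:
    # Assumed validity rule shared by all three scripts: present and positive.
    return {i: v for i, v in enumerate(values) if v is not None and v > 0}


def cleaning_script(values: Values) -> Dict[int, bool]:
    valid = _valid(values)
    return {i: i in valid for i in range(len(values))}


def normalization_script(values: Values) -> Dict[int, float]:
    valid = _valid(values)
    if not valid:
        return {}
    lo, hi = min(valid.values()), max(valid.values())
    span = (hi - lo) or 1.0  # avoid division by zero for a constant indicator
    return {i: (v - lo) / span for i, v in valid.items()}


def standardization_script(values: Values) -> Dict[int, float]:
    valid = _valid(values)
    if not valid:
        return {}
    data = list(valid.values())
    mu, sigma = mean(data), pstdev(data) or 1.0
    return {i: (v - mu) / sigma for i, v in valid.items()}


def consolidate(values: Values) -> List[dict]:
    """Run the three child-node scripts concurrently, then merge their outputs."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_clean = pool.submit(cleaning_script, values)
        f_norm = pool.submit(normalization_script, values)
        f_std = pool.submit(standardization_script, values)
        valid, norm, std = f_clean.result(), f_norm.result(), f_std.result()
    return [
        {"raw": values[i], "normalized": norm[i], "standardized": std[i]}
        for i, ok in valid.items()
        if ok
    ]


print(consolidate([5.0, None, 7.0, -1.0, 6.0]))
```

Because the scripts run at the same time, normalization and standardization cannot consume the cleaning output, so each applies the validity filter itself; that duplication is the price of running the child nodes in parallel.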
In some embodiments, the ontology layer includes an entity layer, an entity attribute layer, and an entity relationship layer.
Constructing the ontology layer of the product knowledge graph according to the service structured data specifically includes:
analyzing the data table structure of the service structured data and the meanings and associations of its data fields, to extract an information set describing the service structured data, wherein the information set comprises entities, entity attributes and entity relations;
according to the entities in the information set, constructing an entity layer of the product knowledge graph;
constructing an entity attribute layer of the product knowledge graph according to the entity attributes in the information set;
and constructing an entity relationship layer of the product knowledge graph according to the entity relationship in the information set.
As an example, after the service structured data is obtained, the ontology layer of the product knowledge graph can be constructed with Protégé (an ontology editing tool for knowledge graphs), producing a Schema file. The ontology layer mainly comprises three layers: an entity layer (owl:Class), an entity attribute layer (owl:DatatypeProperty) and an entity relationship layer (owl:ObjectProperty). The entities here mainly include product projects, product lines, products and employees.
In an exemplary embodiment, the data table structure of the service structured data and the meanings and associations of its data fields are first analyzed to extract the entities, entity attributes and entity relations of the service structured data, i.e., the information set describing its data attributes. The entities in the information set are then stored in the entity layer of the product knowledge graph, the entity attributes in the entity attribute layer, and the entity relations in the entity relationship layer, thereby constructing the ontology layer of the product knowledge graph corresponding to the service structured data.
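The extraction described above can be sketched in Python, assuming the table structure is available as metadata with primary and foreign keys. Under this assumed rule, each table becomes an entity class (owl:Class), each plain column an entity attribute (owl:DatatypeProperty), and each foreign-key column an entity relation (owl:ObjectProperty) whose range is the referenced table's class. `SCHEMA` and the helper names are hypothetical.

```python
from typing import Dict, List

# Hypothetical table metadata; in practice it would be read from the database catalog.
SCHEMA: Dict[str, dict] = {
    "product": {"columns": ["product_id", "product_name", "product_line_id"],
                "primary_key": "product_id",
                "foreign_keys": {"product_line_id": "product_line"}},
    "product_line": {"columns": ["product_line_id", "line_name", "owner_id"],
                     "primary_key": "product_line_id",
                     "foreign_keys": {"owner_id": "employee"}},
    "employee": {"columns": ["employee_id", "employee_name", "department"],
                 "primary_key": "employee_id",
                 "foreign_keys": {}},
}


def to_class_name(table: str) -> str:
    """product_line -> ProductLine"""
    return "".join(part.capitalize() for part in table.split("_"))


def build_ontology(schema: Dict[str, dict]) -> Dict[str, List]:
    """Entity layer <- tables, attribute layer <- plain columns, relation layer <- foreign keys."""
    classes: List[str] = []
    data_props: List[tuple] = []
    object_props: List[tuple] = []
    for table, meta in schema.items():
        cls = to_class_name(table)
        classes.append(cls)  # owl:Class
        for col in meta["columns"]:
            if col == meta["primary_key"]:
                continue  # the key identifies an instance; not modelled as an attribute here
            if col in meta["foreign_keys"]:
                target = to_class_name(meta["foreign_keys"][col])
                object_props.append((col, cls, target))  # owl:ObjectProperty (name, domain, range)
            else:
                data_props.append((col, cls))  # owl:DatatypeProperty (name, domain)
    return {"entity_layer": classes, "attribute_layer": data_props, "relation_layer": object_props}


onto = build_ontology(SCHEMA)
print(onto["relation_layer"])
```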
In some embodiments, constructing the mapping relation between the ontology layer and the instance layer based on the preset mapping strategy, so as to serialize the service structured data into a triplet sequence, may specifically include:
modifying an initial mapping file according to the entity layer, the entity attribute layer and the entity relationship layer of the ontology layer to obtain a modified mapping file;
and establishing the mapping relation between the ontology layer and the instance layer according to the modified mapping file, so as to serialize the service structured data into a triplet sequence.
The initial mapping file may be a mapping file initialized by D2RQ. D2RQ provides access to data in a relational database (e.g., the service structured data) as virtual RDF, i.e., without explicitly converting the data into RDF. RDF (Resource Description Framework) is a framework for describing metadata, i.e., "data describing data" or "information describing information", and can be serialized as XML.
The initial mapping file is modified according to the entity layer, the entity attribute layer and the entity relationship layer of the ontology layer to obtain the modified mapping file. Specifically, the mapping file initialized by D2RQ may be modified using the Schema file obtained in the previous step; that is, some of the mapping relations in the initialized mapping file are adjusted according to the entity layer, entity attribute layer and entity relationship layer of the ontology layer, yielding the modified mapping file. For example, entities are mapped to data tables, entity attributes to the rows of a data table, and entity relations to the columns of a data table.
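The modification step can be pictured as renaming the generic class and property names of an auto-generated mapping to the terms chosen in the Schema file. The sketch below represents the mapping as a Python dictionary rather than an actual D2RQ mapping file, and both `INITIAL_MAPPING` and `ONTOLOGY_RENAMES` are hypothetical.

```python
import copy
from typing import Dict

# Hypothetical auto-generated mapping: table -> class, column -> property, with generic names.
INITIAL_MAPPING: Dict[str, dict] = {
    "product": {"class": "vocab:product",
                "columns": {"product_name": "vocab:product_product_name",
                            "product_line_id": "vocab:product_product_line_id",
                            "launch_date": "vocab:product_launch_date"}},
}

# Hypothetical ontology (Schema file) decisions: preferred class and property names.
ONTOLOGY_RENAMES: Dict[str, str] = {
    "vocab:product": "kg:Product",
    "vocab:product_product_name": "kg:productName",
    "vocab:product_product_line_id": "kg:belongsToProductLine",
}


def modify_mapping(initial: Dict[str, dict], renames: Dict[str, str]) -> Dict[str, dict]:
    """Return a copy of the mapping with class/property names replaced by ontology terms."""
    modified = copy.deepcopy(initial)  # leave the generated mapping untouched
    for table_map in modified.values():
        table_map["class"] = renames.get(table_map["class"], table_map["class"])
        # Properties without an ontology counterpart keep their generated names.
        table_map["columns"] = {col: renames.get(prop, prop)
                                for col, prop in table_map["columns"].items()}
    return modified


print(modify_mapping(INITIAL_MAPPING, ONTOLOGY_RENAMES))
```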
Next, the mapping relationship between the ontology layer and the instance layer is established according to the modified mapping file, so as to serialize the service structured data into a triplet sequence. The instance layer comprises the data tables of the service structured data, each row of data of a data table, and each column of data of a data table. The mapping proceeds in four steps:
- a first mapping relationship is established between the entities of the entity layer of the ontology layer and the data table;
- a second mapping relationship is established between the entity attributes of the entity attribute layer of the ontology layer and each row of data of the data table;
- a third mapping relationship is established between the entity relations of the entity relationship layer of the ontology layer and each column of data of the data table;
- finally, the service structured data is serialized into a triplet sequence according to the first, second and third mapping relationships.
Concretely, each data table is mapped to a class, each row to a resource, and each column to a property of that resource. Following this scheme, the service structured data can be serialized into SPO (subject-predicate-object) triplet instances, yielding the triplet sequence.
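The table-to-class, row-to-resource, column-to-property scheme just described can be sketched in plain Python. This is a minimal illustration, not the D2RQ engine itself. The table, column and predicate names are hypothetical. Foreign-key columns are assumed to produce links to other resources rather than literal values.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def serialize_table(
    table: str,
    rows: List[Dict[str, Any]],
    key: str,
    entity_class: str,
    attributes: Dict[str, str],             # column -> attribute predicate
    relations: Dict[str, Tuple[str, str]],  # FK column -> (relation predicate, target table)
) -> List[Triple]:
    """Table -> class, row -> resource, column -> property."""
    triples: List[Triple] = []
    for row in rows:
        subject = f"{table}/{row[key]}"                     # one resource per row
        triples.append((subject, "rdf:type", entity_class))
        for column, prop in attributes.items():             # literal-valued properties
            if row.get(column) is not None:
                triples.append((subject, prop, row[column]))
        for column, (prop, target) in relations.items():    # resource-valued properties
            if row.get(column) is not None:
                triples.append((subject, prop, f"{target}/{row[column]}"))
    return triples


rows = [{"id": "h1", "model": "Phone X", "battery_id": "b1"}]
triples = serialize_table("handset", rows, "id", "Handset",
                          {"model": "model"}, {"battery_id": ("composedOf", "battery")})
for t in triples:
    print(t)
# ('handset/h1', 'rdf:type', 'Handset')
# ('handset/h1', 'model', 'Phone X')
# ('handset/h1', 'composedOf', 'battery/b1')
```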
In some embodiments, synchronizing the triplet sequence into the GDB graph database and generating the points and edges of the GDB graph database includes:
analyzing and classifying the triplet sequence to obtain a triplet sequence A and a triplet sequence B, wherein each triplet in sequence A consists of an entity, an entity attribute and an attribute value, and each triplet in sequence B consists of an entity, a relationship and an entity;
synchronizing triplet sequence A and triplet sequence B into the GDB graph database, generating the points of the GDB graph database from triplet sequence A and generating the edges of the GDB graph database from triplet sequence B.
Taking the data file of the triplet sequence as input, a Python script parses the triplets and separates them into two categories:
- <entity-attribute-attribute value> triplets, i.e. triplet sequence A, e.g. <screen-screen size-5.1 inches>;
- <entity-relation-entity> triplets, i.e. triplet sequence B, e.g. <handset-composition-battery>.

Parsing and classifying the triplet sequence in this way persists the triplet data. It also makes it convenient to synchronize the data on a schedule with the open-source tool DataX (an offline data synchronization tool/platform), which is safe, convenient and efficient.
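A minimal sketch of such a parsing script follows. It assumes the triplet data file is in a simplified N-Triples form with one statement per line and no escaped quotes. That file format is an assumption, since the disclosure does not name one. Triplets with a quoted literal object go to sequence A. Triplets whose object is another resource go to sequence B. `rdf:type` statements are kept with sequence A as vertex data.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_ntriple(line: str) -> Optional[Triple]:
    """Very small N-Triples reader: one statement per line, no escape handling."""
    body = line.strip()
    if not body or body.startswith("#") or not body.endswith("."):
        return None
    parts = body[:-1].strip().split(None, 2)  # object may contain spaces
    if len(parts) < 3:
        return None
    subject, predicate, obj = parts
    return subject.strip("<>"), predicate.strip("<>"), obj


def classify(lines: Iterable[str]) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into A (entity-attribute-value) and B (entity-relation-entity)."""
    seq_a: List[Triple] = []
    seq_b: List[Triple] = []
    for line in lines:
        parsed = parse_ntriple(line)
        if parsed is None:
            continue
        s, p, o = parsed
        if o.startswith('"'):          # literal object -> attribute value
            seq_a.append((s, p, o[1:o.rindex('"')]))
        elif p == RDF_TYPE:            # class membership kept with the vertex data
            seq_a.append((s, p, o.strip("<>")))
        else:                          # resource object -> relation to another entity
            seq_b.append((s, p, o.strip("<>")))
    return seq_a, seq_b


lines = [
    '<http://example.org/kg/screen/1> <http://example.org/kg/screenSize> "5.1 inches" .',
    '<http://example.org/kg/handset/1> <http://example.org/kg/composedOf> <http://example.org/kg/battery/1> .',
]
seq_a, seq_b = classify(lines)
```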
In practice, a DataX configuration file is written and the corresponding job executed to synchronize triplet sequence A and triplet sequence B into the GDB graph database. The points of the GDB graph database are generated from triplet sequence A and the edges from triplet sequence B, which completes the construction of the product knowledge graph.
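The DataX job configuration itself depends on the reader and writer plugins in use and is not reproduced here. What the job has to load, however, can be sketched in Python:

- sequence A is grouped per entity into vertex (point) records;
- sequence B becomes edge records whose endpoints are vertex IDs.

The record layout (`id`, `label`, `from`, `to`) is an assumption for illustration. It is not a format required by GDB or DataX.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def build_points_and_edges(
    seq_a: List[Triple], seq_b: List[Triple]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Group sequence A into vertex records and turn sequence B into edge records."""
    props: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for s, p, o in seq_a:
        props[s][p] = o          # all attributes of one entity -> one vertex
    for s, _, o in seq_b:
        props.setdefault(s, {})  # make sure both edge endpoints exist as vertices
        props.setdefault(o, {})
    vertices = [{**attrs, "id": vid} for vid, attrs in props.items()]
    edges = [
        {"id": f"{s}|{p}|{o}", "label": p, "from": s, "to": o}
        for s, p, o in seq_b
    ]
    return vertices, edges


vertices, edges = build_points_and_edges(
    [("screen/1", "screenSize", "5.1 inches")],
    [("handset/1", "composedOf", "battery/1")],
)
```

Loading all vertices before the edges avoids edges that refer to points not yet written.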
By chaining the above steps, product knowledge graphs can be constructed automatically for various kinds of service structured data. The resulting graph can answer user queries and questions about products, product projects, product lines and staff quickly. For example, one-hop and multi-hop queries and questions, such as finding the staff in a given role for a product or finding the projects of a product, can be answered quickly, enabling convenient associated queries.
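A one-hop query follows a single relation from an entity, and a multi-hop query chains several relations. In the graph database this would be written in the database's own query language. The Python sketch below only illustrates the traversal logic over in-memory edges. The relation names `inProject` and `hasMember` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]


def index_edges(edges: Iterable[Edge]) -> DefaultDict[Tuple[str, str], List[str]]:
    """Index outgoing edges by (source vertex, relation)."""
    index: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)
    for s, p, o in edges:
        index[(s, p)].append(o)
    return index


def follow(index, start: str, path: List[str]) -> Set[str]:
    """Follow a relation path hop by hop; a one-element path is a one-hop query."""
    frontier = {start}
    for relation in path:
        frontier = {o for node in frontier for o in index.get((node, relation), [])}
    return frontier


edges = [
    ("product/p1", "inProject", "project/j1"),
    ("project/j1", "hasMember", "employee/e1"),
    ("project/j1", "hasMember", "employee/e2"),
]
idx = index_edges(edges)
print(follow(idx, "product/p1", ["inProject"]))                       # {'project/j1'}
print(sorted(follow(idx, "product/p1", ["inProject", "hasMember"])))  # ['employee/e1', 'employee/e2']
```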
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 2 is a schematic diagram of an automated knowledge graph construction apparatus according to an embodiment of the disclosure. As shown in fig. 2, the automated knowledge graph construction apparatus includes:
the acquiring module 201 is configured to acquire an automatic construction task of the product knowledge graph, wherein the automatic construction task at least comprises a first task node, a second task node, a third task node and a fourth task node which are sequentially connected in series;
a first execution module 202 configured to execute a first task script corresponding to a first task node, and retrieve service structured data from a relational database, the service structured data including at least product structured data, product line structured data, product item structured data, and employee structured data;
the second execution module 203 is configured to execute a second task script corresponding to the second task node after the first task script is executed, and construct an ontology layer and an instance layer of the product knowledge graph according to the service structured data;
the third execution module 204 is configured to execute a third task script corresponding to the third task node after the second task script is executed, and construct a mapping relationship between the ontology layer and the instance layer based on a preset mapping policy, so as to serialize the service structured data into a triplet sequence;
and the fourth execution module 205 is configured to execute a fourth task script corresponding to the fourth task node after the third task script is executed, synchronize the triplet sequence into the GDB graph database, and generate points and edges of the GDB graph database to complete the construction of the product knowledge graph.
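The four task nodes connected in series can be pictured with a small Python runner. In this sketch each script receives the previous node's output, and a node starts only after the previous script has returned. The node names and lambda bodies are hypothetical stand-ins for the real task scripts. A production deployment would normally rely on a task scheduler rather than plain function calls.

```python
from typing import Any, Callable, List, Optional, Tuple

TaskScript = Callable[[Any], Any]


def run_serial(
    nodes: List[Tuple[str, TaskScript]],
    payload: Any = None,
    log: Optional[List[str]] = None,
) -> Any:
    """Run task nodes connected in series; each receives the previous node's output."""
    for name, script in nodes:
        if log is not None:
            log.append(name)
        payload = script(payload)  # blocks, so the next node starts only after this one finishes
    return payload


# Hypothetical stand-ins for the four task scripts.
nodes = [
    ("retrieve", lambda _: [{"id": 1, "name": "Phone X"}]),
    ("build_layers", lambda rows: {"entities": ["Product"], "rows": rows}),
    ("serialize", lambda model: [(f"product/{r['id']}", "name", r["name"]) for r in model["rows"]]),
    ("sync", lambda triples: len(triples)),
]
log: List[str] = []
result = run_serial(nodes, log=log)
```

If any script raises an exception, the chain stops there, so later nodes never run on incomplete data.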
In some embodiments, the ontology layer includes an entity layer, an entity attribute layer, and an entity relationship layer. The third execution module 204 includes:
the modification unit is configured to modify the initial mapping file according to the entity layer, the entity attribute layer and the entity relationship layer of the ontology layer to obtain a modified mapping file;
the establishing unit is configured to establish the mapping relationship between the ontology layer and the instance layer according to the modified mapping file, so as to serialize the service structured data into a triplet sequence.
In some embodiments, the instance layer includes a data table of business structured data, each row of the data table, and each column of the data table.
The above-mentioned establishing unit may be specifically configured to:
establishing a first mapping relationship between the entity layer of the ontology layer and the data table;
establishing a second mapping relationship between the entity attribute layer of the ontology layer and each row of data of the data table;
establishing a third mapping relationship between the entity relationship layer of the ontology layer and each column of data of the data table;
and serializing the service structured data into a triplet sequence according to the first mapping relation, the second mapping relation and the third mapping relation.
In some embodiments, the fourth execution module 205 includes:
the analysis unit is configured to analyze and classify the triplet sequence to obtain a triplet sequence A and a triplet sequence B, wherein the triplet sequence A comprises an entity, an entity attribute and an attribute value, and the triplet sequence B comprises an entity, a relationship and an entity;
and a synchronizing unit configured to synchronize triplet sequence A and triplet sequence B into the GDB graph database, generate points of the GDB graph database based on triplet sequence A, and generate edges of the GDB graph database based on triplet sequence B.
In some embodiments, the ontology layer includes an entity layer, an entity attribute layer, and an entity relationship layer. Constructing the ontology layer of the product knowledge graph according to the service structured data includes:
analyzing the data table structure, the meaning and the association relation of the data fields of the service structured data to extract and obtain an information set for describing the service structured data, wherein the information set comprises entities, entity attributes and entity relations;
according to the entities in the information set, constructing an entity layer of the product knowledge graph;
constructing an entity attribute layer of the product knowledge graph according to the entity attributes in the information set;
and constructing an entity relationship layer of the product knowledge graph according to the entity relationship in the information set.
In some embodiments, the first execution module 202 includes:
a retrieval unit configured to retrieve multi-source service data from a relational database;
the processing unit is configured to perform data cleaning, normalization and standardization processing on the multi-source service data to obtain service structured data.
In some embodiments, the first task node includes a first child node, a second child node, and a third child node. The processing unit may specifically include:
the first processing unit is configured to execute a first script corresponding to the first child node, and perform data cleaning on the multi-source service data to obtain first processing data;
the second processing unit is configured to execute a second script corresponding to the second child node after the first script is executed, and perform normalization processing on the first processing data to obtain second processing data;
the third processing unit is configured to execute a third script corresponding to a third child node after the second script is executed, and perform standardized processing on the second processing data to obtain service structured data;
or,
the parallel execution unit is configured to execute the first script, the second script and the third script in parallel, and perform synchronous data cleaning, normalization and standardization processing on the multi-source service data to obtain service structured data.
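The disclosure does not define the individual cleaning, normalization and standardization rules. The sketch below chains three hypothetical versions in the serial order of the child nodes:

- cleaning drops incomplete and duplicate records;
- normalization min-max scales a numeric field to [0, 1];
- standardization maps free-text values onto standard codes.

The field names and the code vocabulary are invented for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def clean(records: List[Record]) -> List[Record]:
    """Data cleaning (hypothetical rules): drop records without an id and exact duplicates."""
    seen, out = set(), []
    for r in records:
        if r.get("id") in (None, ""):
            continue
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def normalize(records: List[Record], field: str) -> List[Record]:
    """Normalization (hypothetical): min-max scale a numeric field to [0, 1]."""
    if not records:
        return []
    values = [float(r[field]) for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (v - lo) / span} for r, v in zip(records, values)]


def standardize(records: List[Record], field: str, vocabulary: Dict[str, str]) -> List[Record]:
    """Standardization (hypothetical): map free-text values onto standard codes."""
    return [
        {**r, field: vocabulary.get(str(r[field]).strip().lower(), r[field])}
        for r in records
    ]


raw = [
    {"id": 1, "price": 100, "line": " Phone "},
    {"id": 1, "price": 100, "line": " Phone "},  # duplicate
    {"id": None, "price": 5, "line": "x"},       # missing id
    {"id": 2, "price": 300, "line": "tablet"},
]
# Serial order of the first, second and third child nodes.
structured = standardize(normalize(clean(raw), "price"), "line",
                         {"phone": "PL-01", "tablet": "PL-02"})
```

The parallel alternative would only apply if the three scripts operated independently of one another's output, which this chained sketch does not assume.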
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the disclosure.
Fig. 3 is a schematic diagram of an electronic device 3 provided by an embodiment of the present disclosure. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: a processor 301, a memory 302 and a computer program 303 stored in the memory 302 and executable on the processor 301. The steps of the various method embodiments described above are implemented when the processor 301 executes the computer program 303. Alternatively, the processor 301, when executing the computer program 303, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 3 may be an electronic device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The electronic device 3 may include, but is not limited to, a processor 301 and a memory 302. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and is not limiting of the electronic device 3 and may include more or fewer components than shown, or different components.
The processor 301 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 302 may be an internal storage unit of the electronic device 3, for example, a hard disk or a memory of the electronic device 3. The memory 302 may also be an external storage device of the electronic device 3, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 3. The memory 302 may also include both internal storage units and external storage devices of the electronic device 3. The memory 302 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above embodiments by means of a computer program instructing the relevant hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include any entity or device capable of carrying the computer program code, such as:
- a recording medium;
- a USB flash drive;
- a removable hard disk;
- a magnetic disk or an optical disk;
- a computer memory, a Read-Only Memory (ROM) or a Random Access Memory (RAM);
- an electrical carrier signal or a telecommunications signal;
- a software distribution medium, and so forth.

It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (8)

1. An automated knowledge graph construction method is characterized by comprising the following steps:
obtaining an automatic construction task of a product knowledge graph, wherein the automatic construction task at least comprises a first task node, a second task node, a third task node and a fourth task node which are sequentially connected in series;
executing a first task script corresponding to the first task node, and retrieving service structured data from a relational database, wherein the service structured data at least comprises product structured data, product line structured data, product item structured data and employee structured data;
after the first task script is executed, executing a second task script corresponding to the second task node, and constructing an ontology layer and an instance layer of the product knowledge graph according to the service structured data;
after the second task script is executed, executing a third task script corresponding to the third task node, and constructing a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy so as to serialize the service structured data into a triplet sequence;
after the third task script is executed, executing a fourth task script corresponding to the fourth task node, synchronizing the triplet sequence into a GDB graph database, generating points and edges of the GDB graph database, and completing the construction of the product knowledge graph;
the ontology layer comprises an entity layer, an entity attribute layer and an entity relation layer;
wherein constructing the mapping relation between the ontology layer and the instance layer based on the preset mapping strategy, so as to serialize the service structured data into a triplet sequence, comprises:
modifying an initial mapping file according to the entity layer, the entity attribute layer and the entity relation layer of the ontology layer to obtain a modified mapping file;
establishing, according to the modified mapping file, the mapping relation between the ontology layer and the instance layer so as to serialize the service structured data into a triplet sequence;
the instance layer comprises a data table of the service structured data, each row of data of the data table and each column of data of the data table;
wherein establishing the mapping relation between the ontology layer and the instance layer according to the modified mapping file, so as to serialize the service structured data into a triplet sequence, comprises:
establishing a first mapping relation between the entity layer of the ontology layer and the data table;
establishing a second mapping relation between the entity attribute layer of the ontology layer and each row of data of the data table;
establishing a third mapping relation between the entity relation layer of the ontology layer and each column of data of the data table;
and serializing the service structured data into a triplet sequence according to the first mapping relation, the second mapping relation and the third mapping relation.
2. The method of claim 1, wherein synchronizing the triplet sequence into the GDB graph database and generating points and edges of the GDB graph database comprises:
analyzing and classifying the triplet sequence to obtain a triplet sequence A and a triplet sequence B, wherein the triplet sequence A comprises an entity, an entity attribute and an attribute value, and the triplet sequence B comprises an entity, a relationship and an entity;
synchronizing triplet sequence A and triplet sequence B into the GDB graph database, generating points of the GDB graph database based on triplet sequence A, and generating edges of the GDB graph database based on triplet sequence B.
3. The method of claim 1, wherein the ontology layer comprises an entity layer, an entity attribute layer, and an entity relationship layer;
wherein constructing the ontology layer of the product knowledge graph according to the service structured data comprises:
analyzing the data table structure, the meaning and the association relation of the data fields of the service structured data to extract and obtain an information set for describing the service structured data, wherein the information set comprises entities, entity attributes and entity relations;
constructing an entity layer of the product knowledge graph according to the entities in the information set;
constructing an entity attribute layer of the product knowledge graph according to the entity attributes in the information set;
and constructing an entity relationship layer of the product knowledge graph according to the entity relationship in the information set.
4. The method of claim 1, wherein retrieving business structured data from a relational database comprises:
retrieving multi-source business data from the relational database;
and carrying out data cleaning, normalization and standardization processing on the multi-source service data to obtain service structured data.
5. The method of claim 4, wherein the first task node comprises a first child node, a second child node, and a third child node;
wherein performing data cleaning, normalization and standardization processing on the multi-source service data to obtain the service structured data comprises:
executing a first script corresponding to the first child node, and performing data cleaning on the multi-source service data to obtain first processing data;
after the first script is executed, executing a second script corresponding to the second child node, and carrying out normalization processing on the first processing data to obtain second processing data;
after the second script is executed, executing a third script corresponding to the third child node, and carrying out standardized processing on the second processing data to obtain service structured data;
or,
and executing the first script, the second script and the third script in parallel, and performing synchronous data cleaning, normalization and standardization processing on the multi-source service data to obtain service structured data.
6. An automated knowledge graph construction device is characterized by comprising:
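an acquisition module configured to acquire an automatic construction task of a product knowledge graph, wherein the automatic construction task at least comprises a first task node, a second task node, a third task node and a fourth task node which are sequentially connected in series;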
the first execution module is configured to execute a first task script corresponding to the first task node, and retrieve service structured data from a relational database, wherein the service structured data at least comprises product structured data, product line structured data, product item structured data and employee structured data;
the second execution module is configured to execute a second task script corresponding to the second task node after the first task script is executed, and construct an ontology layer and an instance layer of the product knowledge graph according to the service structured data;
the third execution module is configured to execute a third task script corresponding to the third task node after the second task script is executed, and construct a mapping relation between the ontology layer and the instance layer based on a preset mapping strategy so as to serialize the service structured data into a triplet sequence;
The fourth execution module is configured to execute a fourth task script corresponding to the fourth task node after the third task script is executed, synchronize the triplet sequence into a GDB graph database, and generate points and edges of the GDB graph database to complete the construction of the product knowledge graph;
the ontology layer comprises an entity layer, an entity attribute layer and an entity relation layer;
wherein constructing the mapping relation between the ontology layer and the instance layer based on the preset mapping strategy, so as to serialize the service structured data into a triplet sequence, comprises:
modifying an initial mapping file according to the entity layer, the entity attribute layer and the entity relation layer of the ontology layer to obtain a modified mapping file;
establishing, according to the modified mapping file, the mapping relation between the ontology layer and the instance layer so as to serialize the service structured data into a triplet sequence;
the instance layer comprises a data table of the service structured data, each row of data of the data table and each column of data of the data table;
wherein establishing the mapping relation between the ontology layer and the instance layer according to the modified mapping file, so as to serialize the service structured data into a triplet sequence, comprises:
establishing a first mapping relation between the entity layer of the ontology layer and the data table;
establishing a second mapping relation between the entity attribute layer of the ontology layer and each row of data of the data table;
establishing a third mapping relation between the entity relation layer of the ontology layer and each column of data of the data table;
and serializing the service structured data into a triplet sequence according to the first mapping relation, the second mapping relation and the third mapping relation.
7. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
CN202210748639.2A 2022-06-28 2022-06-28 Automatic knowledge graph construction method and device Active CN114936294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210748639.2A CN114936294B (en) 2022-06-28 2022-06-28 Automatic knowledge graph construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210748639.2A CN114936294B (en) 2022-06-28 2022-06-28 Automatic knowledge graph construction method and device

Publications (2)

Publication Number Publication Date
CN114936294A CN114936294A (en) 2022-08-23
CN114936294B true CN114936294B (en) 2024-04-16

Family

ID=82869129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210748639.2A Active CN114936294B (en) 2022-06-28 2022-06-28 Automatic knowledge graph construction method and device

Country Status (1)

Country Link
CN (1) CN114936294B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019844A (en) * 2019-02-20 2019-07-16 ZhongAn Information Technology Service Co., Ltd. Method and device for constructing an insurance industry knowledge graph question-answering system
CN111488465A (en) * 2020-04-14 2020-08-04 Servyou Software Group Co., Ltd. Knowledge graph construction method and related device
CN112699248A (en) * 2020-12-24 2021-04-23 Xiamen Meiya Pico Information Co., Ltd. Knowledge ontology construction method, terminal equipment and storage medium
CN114638160A (en) * 2022-05-11 2022-06-17 Southwest Jiaotong University Knowledge service method for complex equipment digital twin model

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
AU2018241092B2 (en) * 2017-10-04 2019-11-21 Accenture Global Solutions Limited Knowledge enabled data management system

Non-Patent Citations (2)

Title
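Research on ontology modeling of learning resources; Li Jing; Zhou Zhurong; Gan Chengzhi; Computer Engineering and Design; 2008-01-16 (No. 01); full text *
Construction and application of a knowledge graph for dispatching automation systems; Li Xinpeng; Xu Jianhang; Guo Ziming; Li Junliang; Ning Wenyuan; Wang Zhenxue; Electric Power; 2018-12-08 (No. 02); full text *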

Also Published As

Publication number Publication date
CN114936294A (en) 2022-08-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant