CN112685405A

CN112685405A - Data management method, system, equipment and medium based on knowledge graph

Info

Publication number: CN112685405A
Application number: CN202011518155.6A
Authority: CN
Inventors: 陈翔
Original assignee: Fujian Newland Software Engineering Co Ltd
Current assignee: Fujian Newland Software Engineering Co Ltd
Priority date: 2020-12-21
Filing date: 2020-12-21
Publication date: 2021-04-20

Abstract

The invention provides a data management method, system, equipment and medium based on a knowledge graph, in the technical field of big data. The method comprises the following steps: step S10, a server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table; step S20, the server acquires the big data to be managed and preprocesses it; step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data; and step S40, the big data is managed based on the knowledge graph. The advantage of the invention is that the quality of data management is greatly improved.

Description

Data management method, system, equipment and medium based on knowledge graph

Technical Field

The invention relates to the technical field of big data, in particular to a data management method, a system, equipment and a medium based on a knowledge graph.

Background

With the continuous improvement of big data analysis service capabilities, enterprise operation analysis, application modes and architectures are being transformed, and business analysis scenarios centered on dynamic, visual and correlation analysis are gradually becoming the mainstream. This creates a need to manage the data indexes of big data.

Traditionally, data indexes are managed only at the level of the index specification definition; index relationships and index calibers (the scope and rules by which an index is calculated) are not managed. In other words, only the definitions of the data indexes are managed, which leads to the following disadvantages: index calibers easily become inconsistent, with the same name carrying different meanings and different names carrying the same meaning; and after the caliber of a data index changes, either the workload of adjusting the applications is huge, or the caliber actually implemented is left unchanged, so the documented caliber and the implemented caliber frequently diverge.

Therefore, how to provide a data management method, system, device and medium based on the knowledge graph to improve the quality of data management becomes a problem to be solved urgently.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a data management method, a system, equipment and a medium based on a knowledge graph, so that the quality of data management is improved.

In a first aspect, the present invention provides a data management method based on a knowledge graph, including the following steps:

step S10, the server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table;

step S20, the server acquires the big data to be managed and preprocesses it;

step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data;

and step S40, the big data is managed based on the knowledge graph.

Further, the step S10 is specifically:

the server creates a warehouse table, synchronizes task information through ETL, parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table; the association relations are the generation relations, dependency relations and data categories among the data.
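
As a minimal sketch of step S10, the following assumes the synchronized task information is already available as a list of dictionaries and uses an in-memory SQLite table as a stand-in for the warehouse table. The task names, table names, column names and the `association_relation` table are all hypothetical illustrations; the source does not specify a schema or a particular metadata management tool.

```python
import sqlite3

# Hypothetical task metadata, as an ETL job might synchronize it.
# Each task reads some source tables and writes one target table.
TASKS = [
    {"task": "t_load_orders", "sources": ["ods_orders"],
     "target": "dwd_orders", "category": "product"},
    {"task": "t_daily_revenue", "sources": ["dwd_orders"],
     "target": "dws_daily_revenue", "category": "product"},
]


def parse_relations(tasks):
    """Derive (source, relation, target, category) rows from task metadata."""
    rows = []
    for task in tasks:
        for source in task["sources"]:
            # The source generates the target; the target depends on the source.
            rows.append((source, "generates", task["target"], task["category"]))
            rows.append((task["target"], "depends_on", source, task["category"]))
    return rows


def store_relations(conn, rows):
    """Create the (assumed) warehouse table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS association_relation ("
        "source TEXT, relation TEXT, target TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO association_relation VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_relations(conn, parse_relations(TASKS))
stored = conn.execute(
    "SELECT source, relation, target FROM association_relation ORDER BY rowid"
).fetchall()
```

Storing the generation and dependency relations as separate rows keeps both directions queryable with a simple filter on the `relation` column; a real deployment might instead store one row per pair and derive the inverse.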

Further, the step S20 is specifically:

the server acquires the big data to be managed, performs word segmentation on the big data using a machine learning technique to generate a plurality of word segments, and extracts index names and index definitions from the word segments.

Further, the step S30 is specifically:

the server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
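
The graph construction in step S30 can be sketched as follows, assuming the index names and the relation rows are already in memory. The sample index names, the relation labels, and the choice to drop edges whose endpoints are not known index names are illustrative assumptions, not details given in the source.

```python
from collections import defaultdict

# Hypothetical inputs: index names from step S20 and relation rows
# read back from the warehouse table.
index_names = ["revenue", "active_users", "arpu"]
relations = [
    ("revenue", "generates", "arpu"),
    ("active_users", "generates", "arpu"),
    ("arpu", "depends_on", "revenue"),
    ("unknown_metric", "generates", "arpu"),  # endpoint is not a known index
]


def build_graph(names, rows):
    """Return (nodes, edges); edges maps a node to a list of (relation, neighbor)."""
    nodes = set(names)
    edges = defaultdict(list)
    for source, relation, target in rows:
        # Keep only edges whose endpoints are both known index names.
        if source in nodes and target in nodes:
            edges[source].append((relation, target))
    return nodes, dict(edges)


nodes, edges = build_graph(index_names, relations)
```

An adjacency mapping is used here only to keep the sketch dependency-free; a graph database or graph library would serve the same role.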

In a second aspect, the present invention provides a data management system based on knowledge-graph, comprising the following modules:

the association relation analysis module is used by the server to create a warehouse table, parse task information to obtain the association relations among data, and store the association relations in the warehouse table;

the big data preprocessing module is used by the server to acquire the big data to be managed and preprocess it;

the knowledge graph generation module is used by the server to read the association relations from the warehouse table and generate a corresponding knowledge graph from the association relations and the preprocessed big data;

and the big data management module is used to manage the big data based on the knowledge graph.

Further, the association relation analysis module, the big data preprocessing module and the knowledge graph generation module specifically perform the operations of steps S10, S20 and S30 described in the first aspect, respectively.

In a third aspect, the present invention provides a knowledge-graph based data management apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect when executing the program.

In a fourth aspect, the present invention provides a knowledge-graph based data management medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect.

One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:

The association relations among data are obtained by parsing task information, and the big data to be managed is preprocessed to extract index names; the index names are taken as nodes of the knowledge graph and the association relations as edges connecting the nodes to generate the corresponding knowledge graph, and the big data is finally managed based on the knowledge graph. That is, the caliber (association relation) of each index name is structured in advance, and the knowledge graph is generated and updated based on unified calibers. This avoids inconsistent calibers, the same name carrying different meanings, and different names carrying the same meaning; greatly reduces the workload of updating the knowledge graph; avoids divergence between the documented caliber and the implemented caliber; and thereby greatly improves the quality of data management.
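
One way such a graph could reduce adjustment work is impact analysis: when the caliber of one index changes, the indexes derived from it can be found by traversal instead of by manual review. The sketch below is an assumption about how that might look, not a procedure stated in the source; the `DOWNSTREAM` mapping and the index names are hypothetical.

```python
from collections import deque

# Hypothetical "generates" edges: each key is an upstream index and the
# values are the indexes derived from it.
DOWNSTREAM = {
    "revenue": ["arpu", "revenue_growth"],
    "active_users": ["arpu"],
    "arpu": ["arpu_rank"],
}


def affected_by(changed, downstream):
    """Return all indexes reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = affected_by("revenue", DOWNSTREAM)
```

The `seen` set ensures each index is reported once even when it is reachable through several upstream paths.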

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

The invention is further described below with reference to the embodiments and the accompanying drawings.

FIG. 1 is a flow chart of a method of data management based on a knowledge-graph of the present invention.

FIG. 2 is a schematic diagram of a data management system based on knowledge-graph according to the present invention.

FIG. 3 is a schematic diagram of a data management device based on knowledge-graph according to the present invention.

FIG. 4 is a schematic diagram of a data management medium based on a knowledge-graph according to the present invention.

FIG. 5 is a schematic diagram of the structure of a knowledge-graph of the present invention.

Detailed Description

The embodiment of the application provides a data management method, a system, equipment and a medium based on a knowledge graph, so that the quality of data management is improved.

The general idea of the technical solution in the embodiments of the application is as follows: the association relations among data are obtained by parsing task information; index names are extracted by preprocessing the big data to be managed; the index names are taken as nodes of the knowledge graph and the association relations as edges connecting the nodes; and the corresponding knowledge graph is generated to manage the big data, thereby improving the quality of data management.

Example one

The embodiment provides a data management method based on a knowledge graph, as shown in fig. 1 and 5, including the following steps:

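step S10, the server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table;

step S20, the server acquires the big data to be managed and preprocesses it; the big data is basic data and atomized data of the business field;

step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data;

and step S40, the big data is managed based on the knowledge graph.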

A knowledge graph is an information management tool that uses a graph data structure to carry information and describes the relationships between entities and concepts. The basic unit of a knowledge graph is the node; two or more nodes connected to one another by edges form a graph. Typically, data in a knowledge graph is organized as a mix of (entity, attribute, value) and (entity, relationship, entity) triples and stored in the graph structure as (node, edge, node).
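
The triple organization described above can be illustrated with a short sketch. The `Triple` type, the helper `objects_of`, and the sample facts are hypothetical names chosen for illustration; the point is only that attribute triples and relationship triples share one (subject, predicate, object) shape.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts: attribute triples and relationship triples share one shape.
triples = [
    # (entity, attribute, value)
    Triple("arpu", "definition", "average revenue per user"),
    # (entity, relationship, entity)
    Triple("arpu", "depends_on", "revenue"),
    Triple("arpu", "depends_on", "active_users"),
]


def objects_of(facts, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in facts if t.subject == subject and t.predicate == predicate]


deps = objects_of(triples, "arpu", "depends_on")
```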

The step S10 specifically includes:

the server creates a warehouse table, synchronizes task information through ETL (extract, transform, load), parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table. The association relations are the generation relations, dependency relations and data categories among the data, and serve the data needs of application traceability. Metadata, also called intermediary data or relay data, is data that describes data; it mainly describes data attributes and supports functions such as indicating storage locations, recording historical data, resource searching and file recording.

The data categories can be divided in several ways: by the domain supported by the application-layer services, into four classes (customer domain, product domain, resource domain and channel domain); by service type, into three classes (individual, family and group customers); and by product type, into basic communication products and value-added communication products.

The step S20 specifically includes:

the server acquires the big data to be managed, performs word segmentation on the big data using a natural language processing technique within machine learning to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
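
The preprocessing in step S20 can be sketched roughly as below. A regular-expression tokenizer stands in for the natural language processing segmenter, and a small hand-written vocabulary stands in for whatever model or dictionary recognizes index names; both substitutions, along with `KNOWN_INDEX_TERMS` and the sample sentence, are assumptions made for illustration only.

```python
import re

# Hypothetical index vocabulary; a real system would learn or curate this.
KNOWN_INDEX_TERMS = {("active", "users"), ("arpu",), ("churn", "rate")}


def segment(text):
    """Split text into lowercase word tokens (regex stand-in for an NLP segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_index(text):
    """Return (index_name, definition) for the first known index term, else None."""
    tokens = segment(text)
    max_len = max(len(term) for term in KNOWN_INDEX_TERMS)
    for start in range(len(tokens)):
        # Prefer the longest vocabulary match starting at this position.
        for length in range(max_len, 0, -1):
            candidate = tuple(tokens[start:start + length])
            if candidate in KNOWN_INDEX_TERMS:
                name = "_".join(candidate)
                # Treat the remaining tokens as the index definition.
                definition = " ".join(tokens[start + length:])
                return name, definition
    return None


result = extract_index(
    "Churn rate: share of subscribers who left the network this month"
)
```

For Chinese-language source data, a dedicated word segmenter would replace `segment`, since whitespace and punctuation alone do not delimit words.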

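The step S30 specifically includes:

the server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.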

Example two

The embodiment provides a data management system based on knowledge graph, as shown in fig. 2 and fig. 5, including the following modules:

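the association relation analysis module is used by the server to create a warehouse table, parse task information to obtain the association relations among data, and store the association relations in the warehouse table;

the big data preprocessing module is used by the server to acquire the big data to be managed and preprocess it; the big data is basic data and atomized data of the business field;

the knowledge graph generation module is used by the server to read the association relations from the warehouse table and generate a corresponding knowledge graph from the association relations and the preprocessed big data;

and the big data management module is used to manage the big data based on the knowledge graph.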

The association relation analysis module, the big data preprocessing module and the knowledge graph generation module specifically perform the operations described for steps S10, S20 and S30 in Example one, respectively.

Based on the same inventive concept, the application provides an electronic device embodiment corresponding to the first embodiment, which is detailed in the third embodiment.

Example three

The embodiment provides a data management device based on a knowledge graph, as shown in fig. 3, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, any implementation of Example one can be realized.

Since the electronic device described in this embodiment is a device used for implementing the method in the first embodiment of the present application, based on the method described in the first embodiment of the present application, a specific implementation of the electronic device in this embodiment and various variations thereof can be understood by those skilled in the art, and therefore, how to implement the method in the first embodiment of the present application by the electronic device is not described in detail herein. The equipment used by those skilled in the art to implement the methods in the embodiments of the present application is within the scope of the present application.

Based on the same inventive concept, the application provides a storage medium corresponding to the first embodiment, which is described in detail in the fourth embodiment.

Example four

The embodiment provides a data management medium based on a knowledge graph, as shown in fig. 4, on which a computer program is stored; when the computer program is executed by a processor, any implementation of Example one can be realized.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims

1. A data management method based on knowledge graph is characterized in that: the method comprises the following steps:

and step S40, managing big data based on the knowledge graph.

2. A knowledge-graph based data management method according to claim 1, wherein: the step S10 specifically includes:

3. A knowledge-graph based data management method according to claim 1, wherein: the step S20 specifically includes:

4. A knowledge-graph based data management method according to claim 3, wherein: the step S30 specifically includes:

5. A data management system based on a knowledge graph, characterized by: the system comprises the following modules:

6. The knowledge-graph based data management system of claim 5, wherein: the incidence relation analysis module specifically comprises:

7. The knowledge-graph based data management system of claim 5, wherein: the big data preprocessing module is specifically as follows:

8. A knowledge-graph based data management system as claimed in claim 7, wherein: the knowledge graph generation module specifically comprises:

9. A knowledge-graph based data management apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4 when executing the program.

10. A knowledge-graph based data management medium having stored thereon a computer program, characterized in that the program, when being executed by a processor, carries out the method according to any one of claims 1 to 4.