CN112685405A - Data management method, system, equipment and medium based on knowledge graph - Google Patents

Info

Publication number
CN112685405A
Authority
CN
China
Prior art keywords
data
incidence relation
knowledge
big data
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011518155.6A
Other languages
Chinese (zh)
Inventor
陈翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujia Newland Software Engineering Co ltd
Original Assignee
Fujia Newland Software Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujia Newland Software Engineering Co ltd
Priority to CN202011518155.6A
Publication of CN112685405A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a knowledge-graph-based data management method, system, equipment and medium in the technical field of big data. The method comprises the following steps: step S10, a server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table; step S20, the server acquires the big data to be managed and preprocesses it; step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data; and step S40, the big data is managed based on the knowledge graph. The advantage of the invention is that the quality of data management is greatly improved.

Description

Data management method, system, equipment and medium based on knowledge graph
Technical Field
The invention relates to the technical field of big data, in particular to a data management method, a system, equipment and a medium based on a knowledge graph.
Background
As big data analysis services continue to mature, they drive changes in enterprise operational analysis, application modes, and architectures, and business analysis scenarios centered on dynamic, visual, and correlation analysis are gradually becoming the mainstream. This creates a need to manage the data indexes (business metrics) of big data.
Traditionally, data indexes are managed only at the level of the index specification definition; index relations and index calibers (the statistical scope and calculation rules behind an index) are not managed. In other words, only the definitions of the data indexes are managed, which leads to the following disadvantages: index calibers easily become inconsistent, the same name may carry different meanings, and different names may carry the same meaning. After the caliber of a data index is changed, either the workload of adjusting the applications that use it is huge, or the caliber actually implemented is never changed, so the documented caliber and the implemented caliber often diverge.
Therefore, how to provide a knowledge-graph-based data management method, system, equipment and medium that improves the quality of data management has become a problem to be solved urgently.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data management method, a system, equipment and a medium based on a knowledge graph, so that the quality of data management is improved.
In a first aspect, the present invention provides a data management method based on a knowledge graph, including the following steps:
step S10, the server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table;
step S20, the server acquires the big data to be managed and preprocesses the big data;
step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data;
and step S40, the big data is managed based on the knowledge graph.
Further, the step S10 is specifically:
the server creates a warehouse table, synchronizes task information through ETL, parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table; the association relations are the generation relations, dependency relations, and data categories among the data.
Further, the step S20 is specifically:
the server acquires the big data to be managed, performs word segmentation on the big data using a machine learning technique to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
Further, the step S30 is specifically:
the server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
In a second aspect, the present invention provides a data management system based on knowledge-graph, comprising the following modules:
the association relation analysis module, used for the server to create a warehouse table, parse task information to obtain the association relations among data, and store the association relations in the warehouse table;
the big data preprocessing module, used for the server to acquire the big data to be managed and preprocess the big data;
the knowledge graph generation module, used for the server to read the association relations from the warehouse table and generate a corresponding knowledge graph from the association relations and the preprocessed big data;
and the big data management module, used for managing the big data based on the knowledge graph.
Further, the association relation analysis module is specifically:
the server creates a warehouse table, synchronizes task information through ETL, parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table; the association relations are the generation relations, dependency relations, and data categories among the data.
Further, the big data preprocessing module is specifically:
the server acquires the big data to be managed, performs word segmentation on the big data using a machine learning technique to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
Further, the knowledge graph generation module is specifically:
the server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
In a third aspect, the present invention provides a knowledge-graph based data management apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect when executing the program.
In a fourth aspect, the present invention provides a knowledge-graph based data management medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
By parsing task information to obtain the association relations among data, preprocessing the big data to be managed to extract index names, taking the index names as nodes of a knowledge graph and the association relations as edges connecting the nodes to generate the corresponding knowledge graph, and finally managing the big data based on the knowledge graph, the caliber (association relation) of each index name is structured in advance and the knowledge graph is generated and updated from unified calibers. This avoids inconsistent calibers, the same name carrying different meanings, and different names carrying the same meaning; greatly reduces the workload of updating the knowledge graph; avoids mismatches between the documented caliber and the caliber actually implemented; and thereby greatly improves the quality of data management.
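The naming problems mentioned above (one name with several calibers, several names with one caliber) can be illustrated with a small check. This is a minimal sketch, not the patent's method: the function `find_caliber_conflicts`, the record format, and the sample index names are all assumptions made for illustration, and calibers are compared as plain strings.

```python
from collections import defaultdict


def find_caliber_conflicts(index_records):
    """Group (index_name, caliber) pairs and report naming conflicts.

    Returns two dicts:
      - names that map to more than one caliber (same name, different meaning)
      - calibers that map to more than one name (different names, same meaning)
    """
    calibers_by_name = defaultdict(set)
    names_by_caliber = defaultdict(set)
    for name, caliber in index_records:
        calibers_by_name[name].add(caliber)
        names_by_caliber[caliber].add(name)
    same_name = {n: sorted(c) for n, c in calibers_by_name.items() if len(c) > 1}
    same_caliber = {c: sorted(n) for c, n in names_by_caliber.items() if len(n) > 1}
    return same_name, same_caliber


# Hypothetical sample records, invented for this example.
records = [
    ("active_users", "count(distinct user_id) over 30 days"),
    ("active_users", "count(distinct user_id) over 7 days"),
    ("monthly_active", "count(distinct user_id) over 30 days"),
    ("arpu", "revenue / active_users"),
]
same_name, same_caliber = find_caliber_conflicts(records)
```

Here `same_name` flags `active_users` because it has two calibers, and `same_caliber` flags the 30-day caliber because two names share it.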
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
The invention will be further described with reference to the following examples with reference to the accompanying drawings.
FIG. 1 is a flow chart of a method of data management based on a knowledge-graph of the present invention.
FIG. 2 is a schematic diagram of a data management system based on knowledge-graph according to the present invention.
FIG. 3 is a schematic diagram of a data management device based on knowledge-graph according to the present invention.
FIG. 4 is a schematic diagram of a data management medium based on a knowledge-graph according to the present invention.
FIG. 5 is a schematic diagram of the structure of a knowledge-graph of the present invention.
Detailed Description
The embodiment of the application provides a data management method, a system, equipment and a medium based on a knowledge graph, so that the quality of data management is improved.
The general idea of the technical solution in the embodiments of the application is as follows: the association relations among data are obtained by parsing task information, index names are extracted by preprocessing the big data to be managed, the index names are taken as nodes of the knowledge graph and the association relations as edges connecting the nodes, and the resulting knowledge graph is used to manage the big data, thereby improving the quality of data management.
Example one
The embodiment provides a data management method based on a knowledge graph, as shown in fig. 1 and 5, including the following steps:
step S10, the server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table;
step S20, the server acquires the big data to be managed and preprocesses the big data; the big data is the basic data and atomic data of the business domain;
step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data;
and step S40, the big data is managed based on the knowledge graph.
The knowledge graph is an information management tool that uses a graph data structure to carry information and describes the relationships between entities and concepts. The basic unit of a knowledge graph is the node, and two or more nodes connected to each other by edges form the graph. Typically, data in a knowledge graph is organized as a mix of (entity, attribute, value) and (entity, relationship, entity) triples and stored in the graph structure as (node, edge, node).
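The two triple shapes described above can be sketched in a few lines. This is an illustrative sketch only: the `Triple` type, the `to_graph` helper, and the sample index names are assumptions, and the patent does not prescribe any particular storage layout.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical sample data mixing both triple shapes.
triples = [
    Triple("arpu", "definition", "average revenue per user"),  # (entity, attribute, value)
    Triple("arpu", "depends_on", "total_revenue"),             # (entity, relationship, entity)
    Triple("arpu", "depends_on", "active_users"),              # (entity, relationship, entity)
]


def to_graph(triples, relation_predicates):
    """Split triples into nodes, (node, edge, node) edges, and node attributes."""
    nodes, edges, attributes = set(), [], {}
    for t in triples:
        nodes.add(t.subject)
        if t.predicate in relation_predicates:
            # Relationship triple: the object is itself a node.
            nodes.add(t.obj)
            edges.append((t.subject, t.predicate, t.obj))
        else:
            # Attribute triple: the object is a plain value on the subject node.
            attributes.setdefault(t.subject, {})[t.predicate] = t.obj
    return nodes, edges, attributes


nodes, edges, attributes = to_graph(triples, {"depends_on"})
```

Which predicates count as relationships is passed in explicitly, since that distinction depends on the schema in use.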
The step S10 specifically includes:
The server creates a warehouse table, synchronizes task information through ETL (extract, transform, load; a data warehouse technique), parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table. The association relations are the generation relations, dependency relations, and data categories among the data, and serve the data requirements of tracing applications back to their sources. Metadata, also called intermediary data or relay data, is data that describes data; it mainly describes the attributes of data and supports functions such as indicating storage location, recording history, searching resources, and keeping file records.
The data categories can be divided along several dimensions: by the domain supported by application-layer services, into four classes (client domain, product domain, resource domain, and channel domain); by service type, into three classes (individual, family, and group customers); and by product type, into basic communication products and value-added communication products.
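Step S10 can be sketched with an in-memory SQLite database standing in for the warehouse table. Everything named here is an assumption for illustration: the table name `association_relation`, its columns, the shape of the task dictionaries, and the `parse_task` helper. The patent itself relies on a metadata management tool that is not specified.

```python
import sqlite3

# In-memory database as a stand-in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE association_relation (
        source_data   TEXT NOT NULL,
        target_data   TEXT NOT NULL,
        relation_type TEXT NOT NULL
            CHECK (relation_type IN ('generation', 'dependency')),
        data_category TEXT NOT NULL,
        PRIMARY KEY (source_data, target_data, relation_type)
    )
    """
)


def parse_task(task):
    """Derive relation rows from one task description (hypothetical format).

    Each input generates the output, and the output depends on each input.
    """
    for src in task["inputs"]:
        yield (src, task["output"], "generation", task["category"])
        yield (task["output"], src, "dependency", task["category"])


# Hypothetical task information, as it might look after ETL synchronization.
tasks = [
    {"inputs": ["billing_detail", "user_profile"], "output": "arpu",
     "category": "product domain"},
]
rows = [row for task in tasks for row in parse_task(task)]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO association_relation VALUES (?, ?, ?, ?)", rows)

dependencies = conn.execute(
    "SELECT source_data, target_data FROM association_relation "
    "WHERE relation_type = 'dependency' ORDER BY target_data"
).fetchall()
```

Storing both directions makes upstream and downstream lookups a single indexed query each, at the cost of doubling the row count.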
The step S20 specifically includes:
The server acquires the big data to be managed, performs word segmentation on the big data using natural language processing (a branch of machine learning) to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
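The extraction step can be illustrated with a deliberately simple stand-in. The patent calls for word segmentation by a machine learning technique; the sketch below swaps in a regular-expression tokenizer and assumes a hypothetical `Name: definition` line format, so it shows only the shape of the input and output, not the actual segmentation model.

```python
import re

# Rule-based tokenizer used here in place of a trained segmentation model.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def segment(text):
    """Split text into lowercase word segments."""
    return TOKEN_RE.findall(text.lower())


def extract_indexes(records):
    """Extract {index_name: index_definition} from 'Name: definition' lines.

    Lines without a colon are skipped. The line format is an assumption
    made for this example.
    """
    indexes = {}
    for line in records:
        name_part, sep, definition = line.partition(":")
        if not sep:
            continue
        name = "_".join(segment(name_part))
        if name:
            indexes[name] = definition.strip()
    return indexes


# Hypothetical raw records.
raw = [
    "Monthly Active Users: distinct users with a session in the last 30 days",
    "ARPU: total revenue divided by monthly active users",
    "free text without a definition",
]
indexes = extract_indexes(raw)
```

For Chinese source text, a dedicated segmentation library or model would replace `segment`, since whitespace and punctuation do not delimit words there.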
The step S30 specifically includes:
The server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
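Step S30 amounts to building an adjacency structure. The following is a minimal sketch under stated assumptions: `build_knowledge_graph` is a hypothetical helper, relations arrive as `(source, target, relation_type)` tuples, and relations that mention a name not extracted as an index are dropped (a design choice of this example, not something the patent states).

```python
from collections import defaultdict


def build_knowledge_graph(index_names, relations):
    """Build a graph with index names as nodes and relations as labeled edges.

    relations: iterable of (source, target, relation_type) tuples.
    Returns (nodes, edges), where edges maps a node to a list of
    (relation_type, target) pairs.
    """
    nodes = set(index_names)
    edges = defaultdict(list)
    for source, target, relation_type in relations:
        # Keep only edges whose endpoints are both known index names.
        if source in nodes and target in nodes:
            edges[source].append((relation_type, target))
    return nodes, dict(edges)


# Hypothetical inputs: names from preprocessing, relations from the warehouse table.
index_names = ["arpu", "total_revenue", "monthly_active_users"]
relations = [
    ("arpu", "total_revenue", "dependency"),
    ("arpu", "monthly_active_users", "dependency"),
    ("arpu", "unknown_table", "dependency"),  # dropped: not an extracted index
]
nodes, edges = build_knowledge_graph(index_names, relations)
```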
Example two
The embodiment provides a data management system based on knowledge graph, as shown in fig. 2 and fig. 5, including the following modules:
the association relation analysis module, used for the server to create a warehouse table, parse task information to obtain the association relations among data, and store the association relations in the warehouse table;
the big data preprocessing module, used for the server to acquire the big data to be managed and preprocess the big data; the big data is the basic data and atomic data of the business domain;
the knowledge graph generation module, used for the server to read the association relations from the warehouse table and generate a corresponding knowledge graph from the association relations and the preprocessed big data;
and the big data management module, used for managing the big data based on the knowledge graph.
The knowledge graph is an information management tool that uses a graph data structure to carry information and describes the relationships between entities and concepts. The basic unit of a knowledge graph is the node, and two or more nodes connected to each other by edges form the graph. Typically, data in a knowledge graph is organized as a mix of (entity, attribute, value) and (entity, relationship, entity) triples and stored in the graph structure as (node, edge, node).
The association relation analysis module is specifically:
The server creates a warehouse table, synchronizes task information through ETL (extract, transform, load; a data warehouse technique), parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table. The association relations are the generation relations, dependency relations, and data categories among the data, and serve the data requirements of tracing applications back to their sources. Metadata, also called intermediary data or relay data, is data that describes data; it mainly describes the attributes of data and supports functions such as indicating storage location, recording history, searching resources, and keeping file records.
The data categories can be divided along several dimensions: by the domain supported by application-layer services, into four classes (client domain, product domain, resource domain, and channel domain); by service type, into three classes (individual, family, and group customers); and by product type, into basic communication products and value-added communication products.
The big data preprocessing module is specifically:
The server acquires the big data to be managed, performs word segmentation on the big data using natural language processing (a branch of machine learning) to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
The knowledge graph generation module is specifically:
The server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
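The four modules of this embodiment can be pictured as four methods on one object. This is only a structural sketch: the class `KnowledgeGraphDataManager`, its method names, the in-memory list standing in for the warehouse table, and the input formats are all assumptions, and the management step is reduced to a single example query.

```python
class KnowledgeGraphDataManager:
    """Toy composition of the four modules; all names are hypothetical."""

    def __init__(self):
        self.warehouse_table = []  # stand-in for the warehouse table
        self.index_names = set()
        self.graph = {}

    def analyze_relations(self, tasks):
        """Association relation analysis module: parse tasks into relation rows."""
        for task in tasks:
            for src in task["inputs"]:
                self.warehouse_table.append((task["output"], "dependency", src))

    def preprocess(self, records):
        """Big data preprocessing module: keep the extracted index names."""
        for name, _definition in records:
            self.index_names.add(name)

    def generate_graph(self):
        """Knowledge graph generation module: names as nodes, relations as edges."""
        self.graph = {name: [] for name in self.index_names}
        for source, relation, target in self.warehouse_table:
            if source in self.graph and target in self.graph:
                self.graph[source].append((relation, target))

    def upstream_of(self, name):
        """Big data management module: one possible query over the graph."""
        return sorted(target for _rel, target in self.graph.get(name, []))


manager = KnowledgeGraphDataManager()
manager.analyze_relations([{"inputs": ["total_revenue", "active_users"], "output": "arpu"}])
manager.preprocess([
    ("arpu", "average revenue per user"),
    ("total_revenue", "sum of billed amounts"),
    ("active_users", "distinct users in period"),
])
manager.generate_graph()
upstream = manager.upstream_of("arpu")
```

The call order mirrors the module order in the text: relations first, then preprocessing, then graph generation, then queries.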
Based on the same inventive concept, the application provides an electronic device embodiment corresponding to the first embodiment, which is detailed in the third embodiment.
Example three
This embodiment provides a knowledge-graph-based data management device, as shown in fig. 3, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, any implementation of the first embodiment can be carried out.
Since the electronic device described in this embodiment is a device used for implementing the method in the first embodiment of the present application, based on the method described in the first embodiment of the present application, a specific implementation of the electronic device in this embodiment and various variations thereof can be understood by those skilled in the art, and therefore, how to implement the method in the first embodiment of the present application by the electronic device is not described in detail herein. The equipment used by those skilled in the art to implement the methods in the embodiments of the present application is within the scope of the present application.
Based on the same inventive concept, the application provides a storage medium corresponding to the first embodiment, which is described in detail in the fourth embodiment.
Example four
This embodiment provides a knowledge-graph-based data management medium, as shown in fig. 4, on which a computer program is stored; when the computer program is executed by a processor, any implementation of the first embodiment can be carried out.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
By parsing task information to obtain the association relations among data, preprocessing the big data to be managed to extract index names, taking the index names as nodes of a knowledge graph and the association relations as edges connecting the nodes to generate the corresponding knowledge graph, and finally managing the big data based on the knowledge graph, the caliber (association relation) of each index name is structured in advance and the knowledge graph is generated and updated from unified calibers. This avoids inconsistent calibers, the same name carrying different meanings, and different names carrying the same meaning; greatly reduces the workload of updating the knowledge graph; avoids mismatches between the documented caliber and the caliber actually implemented; and thereby greatly improves the quality of data management.
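One plausible reading of the reduced adjustment workload is that, once dependencies are edges in a graph, the indexes affected by a caliber change can be listed by traversal instead of by manual search. The sketch below assumes this reading; `affected_indexes` and the sample dependency pairs are hypothetical, and the patent does not describe a specific traversal.

```python
from collections import defaultdict, deque


def affected_indexes(dependencies, changed):
    """Return all indexes that directly or transitively depend on `changed`.

    dependencies: iterable of (index, depends_on) pairs.
    Uses breadth-first search over the reversed dependency edges.
    """
    dependents = defaultdict(set)
    for index, upstream in dependencies:
        dependents[upstream].add(index)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents[current]:
            if nxt not in seen and nxt != changed:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


# Hypothetical dependency edges read from the knowledge graph.
dependencies = [
    ("arpu", "total_revenue"),
    ("arpu", "monthly_active_users"),
    ("revenue_growth", "total_revenue"),
    ("arpu_trend", "arpu"),
]
impacted = affected_indexes(dependencies, "total_revenue")
```

The `seen` set also guards against cycles, so the traversal terminates even if the stored relations are not a strict hierarchy.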
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (10)

1. A data management method based on a knowledge graph, characterized in that the method comprises the following steps:
step S10, the server creates a warehouse table, parses task information to obtain the association relations among data, and stores the association relations in the warehouse table;
step S20, the server acquires the big data to be managed and preprocesses the big data;
step S30, the server reads the association relations from the warehouse table and generates a corresponding knowledge graph from the association relations and the preprocessed big data;
and step S40, the big data is managed based on the knowledge graph.
2. The knowledge-graph-based data management method according to claim 1, characterized in that the step S10 is specifically:
the server creates a warehouse table, synchronizes task information through ETL, parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table; the association relations are the generation relations, dependency relations, and data categories among the data.
3. The knowledge-graph-based data management method according to claim 1, characterized in that the step S20 is specifically:
the server acquires the big data to be managed, performs word segmentation on the big data using a machine learning technique to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
4. The knowledge-graph-based data management method according to claim 3, characterized in that the step S30 is specifically:
the server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
5. A data management system based on a knowledge graph, characterized in that the system comprises the following modules:
the association relation analysis module, used for the server to create a warehouse table, parse task information to obtain the association relations among data, and store the association relations in the warehouse table;
the big data preprocessing module, used for the server to acquire the big data to be managed and preprocess the big data;
the knowledge graph generation module, used for the server to read the association relations from the warehouse table and generate a corresponding knowledge graph from the association relations and the preprocessed big data;
and the big data management module, used for managing the big data based on the knowledge graph.
6. The knowledge-graph-based data management system according to claim 5, characterized in that the association relation analysis module is specifically:
the server creates a warehouse table, synchronizes task information through ETL, parses the task information with a metadata management tool to obtain the association relations among data, and stores the association relations in the warehouse table; the association relations are the generation relations, dependency relations, and data categories among the data.
7. The knowledge-graph-based data management system according to claim 5, characterized in that the big data preprocessing module is specifically:
the server acquires the big data to be managed, performs word segmentation on the big data using a machine learning technique to generate a plurality of word segments, and extracts index names and index definitions from the word segments.
8. The knowledge-graph-based data management system according to claim 7, characterized in that the knowledge graph generation module is specifically:
the server reads the association relations from the warehouse table, takes the index names as nodes of the knowledge graph and the association relations as edges connecting the nodes, and generates the corresponding knowledge graph from the nodes and edges.
9. A knowledge-graph-based data management device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 4 when executing the program.
10. A knowledge-graph-based data management medium having a computer program stored thereon, characterized in that the program, when executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202011518155.6A 2020-12-21 2020-12-21 Data management method, system, equipment and medium based on knowledge graph Pending CN112685405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011518155.6A CN112685405A (en) 2020-12-21 2020-12-21 Data management method, system, equipment and medium based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011518155.6A CN112685405A (en) 2020-12-21 2020-12-21 Data management method, system, equipment and medium based on knowledge graph

Publications (1)

Publication Number Publication Date
CN112685405A true CN112685405A (en) 2021-04-20

Family

ID=75449745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011518155.6A Pending CN112685405A (en) 2020-12-21 2020-12-21 Data management method, system, equipment and medium based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112685405A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018036239A1 (en) * 2016-08-24 2018-03-01 慧科讯业有限公司 Method, apparatus and system for monitoring internet media events based on industry knowledge mapping database
CN108197182A (en) * 2017-12-25 2018-06-22 百味云科技股份有限公司 A kind of data atlas analysis system and method
CN108446367A (en) * 2018-03-15 2018-08-24 湖南工业大学 A kind of the packaging industry data search method and equipment of knowledge based collection of illustrative plates
CN109670048A (en) * 2018-11-19 2019-04-23 平安科技(深圳)有限公司 Map construction method, apparatus and computer equipment based on air control management
CN110457482A (en) * 2019-06-06 2019-11-15 福建奇点时空数字科技有限公司 A kind of intelligent information service system of knowledge based map
CN111897808A (en) * 2020-07-15 2020-11-06 苏宁金融科技(南京)有限公司 Data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210420)