CN112699175A - Data management system and method thereof

Info

Publication number: CN112699175A
Application number: CN202110057150.6A
Authority: CN
Inventors: 黄晓雄; 赖伟; 李跃华; 郑博洪; 陈军; 虎清军; 陈文强; 刘铭; 吴杰; 张有为; 甘勇航; 李华
Original assignee: Guangzhou Teligen Communication Technology Co ltd
Current assignee: Guangzhou Teligen Communication Technology Co ltd
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2021-04-23
Anticipated expiration: 2041-01-15
Also published as: CN112699175B

Abstract

The application discloses a data governance system and a method thereof. The data access module reads multi-source heterogeneous data, ingests the data that has undergone data exploration and data definition into a big data center, and performs data reconciliation; the data processing module extracts data, converts the extracted data into the required format, corrects or eliminates abnormal data, and distributes the data to the corresponding data warehouse; the data organization module stores the distributed data in the corresponding libraries by category to obtain multiple types of metadata; the data governance module performs catalog integration and classification and grading on the metadata, determines the lineage and quality of the metadata, performs data operation and maintenance, and supports the use and servicing of the metadata; the data service module provides data to users. The application addresses the technical problems that data quality monitoring has no unified data quality evaluation standard or management standard, cannot achieve comprehensive quality monitoring, and lacks a closed-loop processing mechanism for data quality problems.

Description

Data management system and method thereof

Technical Field

The application relates to the technical field of data management, in particular to a data management system and a method thereof.

Background

Data is the foundation of system construction and a core asset of the client, so a complete data supervision scheme is urgently needed. The full data management process includes access, standardization, warehousing, monitoring, control and the like.

A data asset management platform needs three capabilities. First, it needs data resource integration: ETL (Extract-Transform-Load) is an important link in building a data warehouse, and the platform should let a user complete the whole process of data access and extraction, interactive conversion and standardization, and loading into the target warehouse. Second, it needs data quality monitoring, so that a data asset manager can clearly grasp the overall state of the data assets, understand where each data resource comes from and where it goes, and learn of data anomalies immediately. Finally, it needs data usage control, so that a data asset manager can classify and grade the data and control data usage permissions at a fine granularity.

Data integration builds on the data definitions made in the data access stage. Addressing the characteristics of big data (large scale, many types, high-speed circulation, complexity and variability, uneven quality and varying value density) and guided by data applications, it raises the value density of the data through standardized processing, thereby adding value to the data and preparing and abstracting data for intelligent data applications.

Data governance is the planning and design, process control and quality supervision of data resources over their whole life cycle. Standardized data governance makes data resources transparent, manageable and controllable, sorts out the data assets, improves the implementation of data standards, standardizes the data processing flow, improves data quality, guarantees the safe use of data, and promotes data circulation and value extraction.

Existing data integration ETL tools mostly adapt to only a single set of big data storage clusters rather than to a variety of data storage modes; they mostly rely on plug-ins or scripts, have unfriendly interfaces and overly simple processing modes, set a high entry threshold for users, and offer weak supervision, high cost and poor extensibility.

Current data asset management schemes lack a unified standard and a normative management method, draw only a blurry boundary between a data asset and a physical table, and cannot support resource-oriented, catalog-based and standardized asset management.

Current data security schemes are simplistic: they can control access only at the table level and the field level, and lack finer-grained combined security controls such as field classification and grading, record classification and grading, desensitization management and red list management.

In industry data governance at present, data quality monitoring has no unified data quality evaluation standard or management standard, comprehensive quality monitoring cannot be achieved, and there is no closed-loop processing mechanism for data quality problems.

In data applications, business applications are strongly coupled to data storage, so every change to business logic directly affects data reading; different business applications must each re-develop the same data reading process; developing an application system requires understanding both the business logic and the database technology; and organizing and managing data resources is time-consuming.

Disclosure of Invention

The application provides a data management system and a method thereof, which solve the technical problems that data quality monitoring does not have uniform data quality evaluation standard and management standard, omnibearing quality monitoring cannot be achieved, and a closed-loop processing mechanism is lacked for data quality problems.

In view of the above, a first aspect of the present application provides a data governance system, the system comprising:

the system comprises a data access module, a data processing module, a data organization module, a data governance module and a data service module;

the data access module is used for performing data reading, data exploration, data definition and data conversion on multi-source heterogeneous data; ingesting the multi-source heterogeneous data that has undergone data exploration and data definition into a big data center; and reconciling the data after data definition against the data of the data provider;

the data processing module is used for extracting data from a data source, converting the extracted data into the required format, correcting or eliminating abnormal data, and distributing the data to the corresponding data warehouse;

the data organization module is used for storing the distributed data by category in an original library, a resource library, a subject library, a business library, a knowledge base or a business element index library to obtain multiple types of metadata;

the data governance module is used for performing catalog integration and classification and grading on the metadata, determining the lineage of the metadata, determining the quality of the metadata, performing data operation and maintenance, and supporting the use and servicing of the metadata;

the data service module is used for providing data to different systems and users.

Optionally, the data access module includes:

the data probing module is used for probing the service meaning, the data structure, the field format, the value range, the statistical distribution and the data quality of the data to obtain a data probing result;

the data definition module is used for, according to the probing result, defining the data organization, registering the data resource catalog, and defining the data classification, data lineage, data quality detection rules, statistical strategies, data processing rules and data usage rules;

the data reading module is used for reading the data after data definition, checking whether the data after data definition conflicts in informational meaning, and reconciling the data after data definition against the data of the data provider;

and the data conversion module is used for decrypting and decompressing the data, recording the data ID, generating a data bill, and providing data support for the data processing module.

Optionally, the data processing module includes:

the data extraction module is used for extracting data from the source format data;

the data cleaning module is used for generating data meeting preset standards and quality requirements;

the data association module is used for associating the data with other knowledge data and service data and outputting associated information;

the data comparison module is used for performing identity comparison or similarity calculation on structured and unstructured data and outputting the data that meets preset rules;

the data identification module is used for carrying out comparison analysis and model calculation on the data by using a tag engine based on a tag knowledge base, and marking a tag on the data to provide support for upper-layer application;

and the data distribution module is used for distributing the data to the corresponding data warehouse according to different application scenes and a preset distribution strategy.

Optionally, the data organization module includes an original library, a resource library, a subject library, a business library, a knowledge base and a business element index library;

the original library is used for storing original data, that is, data sets reflecting the original business scenario, together with the standardized data, associated element information, label information and data classification information obtained by processing the source data;

the resource library is used for storing public data sets of the key elements integrated from various data resources and of the associations and relationships among those elements; the elements are identifying attributes of the data, including citizen identity numbers, license plate numbers, mobile phone numbers and MAC addresses;

the subject library is used for storing subject objects that can identify persons, places, cases, events, objects and organizations, and comprises a person subject library, a place subject library, an object subject library, a case subject library, an event subject library, an information subject library and an organization subject library;

the business library is used for storing the business data of each professional field, recording business processes and providing data support for business activities;

the knowledge base is used for storing the knowledge data and the sets of rules and methods shared across the public security field;

the business element index library is used for storing a global index built on the key elements of the business library, so as to resolve business association and business conflict problems.
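As an illustration only, the global index held by the business element index library can be pictured as a mapping from a key element to every business record that mentions it. The sketch below is not the patented implementation; the class name `ElementIndex`, the library names and the record IDs are hypothetical.

```python
from collections import defaultdict


class ElementIndex:
    """Hypothetical global index: (element type, value) -> set of (library, record id)."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, element_type, value, library, record_id):
        # Register that a record in some business library contains this element.
        self._index[(element_type, value)].add((library, record_id))

    def lookup(self, element_type, value):
        # Sorted so that the result order is deterministic.
        return sorted(self._index.get((element_type, value), set()))


index = ElementIndex()
index.add("phone", "13800000000", "case_library", "C-001")
index.add("phone", "13800000000", "vehicle_library", "V-042")
hits = index.lookup("phone", "13800000000")
```

A lookup that returns records from more than one business library is the kind of cross-business association the index is meant to surface.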

Optionally, the data service module includes:

the query retrieval service module is used for providing a query interface for querying data resources for a user;

the model analysis service module is used for carrying out statistics, analysis and prediction on data by using an analysis model according to the requirements of a business scene to obtain an analysis result, so that the analysis result meets the requirements of the business scene;

the data pushing service module is used for collecting the data resources from the lower data centers to the corresponding upper data centers according to the needs and sending the data resources from the upper data centers to the corresponding lower data centers according to the needs;

the data authentication service module is used for verifying access rights to data according to the preset access control rules of the data;

and the data operation service module is used for providing operation interface services for adding, modifying and deleting data.

Optionally, the data service module further includes:

the data security module is used for classifying data resources from a plurality of aspects, wherein the plurality of aspects comprise a data acquisition mode, a data type and a data field; grading the data resources according to the preset content sensitivity degrees of different data resources;

and the data governance service module is used for wrapping data-governance-related service capabilities as interfaces as required and providing services to other application systems and to other subsystems within the platform.
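A minimal sketch of how classification and grading might feed an access decision is given below. It assumes three sensitivity levels and reads the "data field" aspect as a subject domain; the level names, the `User` and `Resource` types and the `can_access` rule are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical grading scale; the application only says grading follows
    # a preset content sensitivity degree.
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Resource:
    name: str
    acquisition_mode: str  # classification aspect: how the data was acquired
    data_type: str         # classification aspect: kind of data
    domain: str            # classification aspect: "data field", read as a domain
    level: Sensitivity     # grading result


@dataclass(frozen=True)
class User:
    name: str
    clearance: Sensitivity
    allowed_domains: frozenset


def can_access(user, resource):
    # Access needs both a matching domain and sufficient clearance.
    return resource.domain in user.allowed_domains and user.clearance >= resource.level


plates = Resource("vehicle_plates", "batch", "structured", "traffic", Sensitivity.INTERNAL)
cases = Resource("case_files", "batch", "unstructured", "cases", Sensitivity.RESTRICTED)
analyst = User("analyst", Sensitivity.INTERNAL, frozenset({"traffic"}))
```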

Optionally, the data service module further includes:

the data service customization module is used for providing customized data services to a user, specifically through the following steps:

S1, selecting a data table in any data warehouse for data service customization;

S2, if a resource retrieval service is being customized, registering the data table as a resource and mounting the registered resource in the data resource catalog;

S3, publishing the registered resource so that it can be accessed externally by users;

S4, configuring access rights according to the classification and grading result of the registered resource;

S5, selecting a published resource for which access rights are held and deploying a retrieval service on it, so that the user can customize the resource retrieval service according to the condition columns and result columns of the resource obtained by the user;

S6, if an SQL query service is being customized, selecting any data table in the data warehouse, writing standard SQL, and customizing condition columns and result columns on the standard SQL, which completes deployment of the SQL query service;

S7, publishing the SQL query service so that it can be accessed externally by users;

and S8, applying access control to the permissions, traffic, frequency and bandwidth of the resource retrieval service and the SQL query service.
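The SQL query service of step S6 can be sketched as a wrapper that fixes a base SQL statement and exposes only the configured condition columns and result columns. The example below is a hedged illustration using Python's built-in `sqlite3` module as a stand-in for the data warehouse; the class `SqlQueryService`, the `vehicle` table and its columns are hypothetical, and the access control of step S8 is not modelled.

```python
import sqlite3


class SqlQueryService:
    """Hypothetical customized SQL query service (step S6 only)."""

    def __init__(self, conn, base_sql, condition_columns, result_columns):
        self.conn = conn
        self.base_sql = base_sql
        self.condition_columns = tuple(condition_columns)
        self.result_columns = tuple(result_columns)

    def query(self, **conditions):
        # Only columns configured at deployment time may be used as filters.
        unknown = set(conditions) - set(self.condition_columns)
        if unknown:
            raise ValueError(f"unsupported condition columns: {sorted(unknown)}")
        sql = f"SELECT {', '.join(self.result_columns)} FROM ({self.base_sql})"
        names = sorted(conditions)
        if names:
            sql += " WHERE " + " AND ".join(f"{name} = ?" for name in names)
        # Values are bound as parameters rather than interpolated into the SQL.
        return self.conn.execute(sql, [conditions[n] for n in names]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (plate TEXT, owner TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?, ?)",
    [("A123", "Li", "Guangzhou"), ("B456", "Wang", "Shenzhen")],
)
service = SqlQueryService(
    conn,
    base_sql="SELECT plate, owner, city FROM vehicle",
    condition_columns=["city"],
    result_columns=["plate", "owner"],
)
rows = service.query(city="Guangzhou")
```

Restricting callers to the declared condition columns keeps the published service decoupled from the underlying table, which is the point of customizing the columns on top of the standard SQL.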

Optionally, the data governance module includes:

the data operation and maintenance management module is used for carrying out early warning and disposal on abnormal states by acquiring state information of data access, processing, organization and service, so as to realize real-time monitoring and management on each task;

the data quality management module is used for timely discovering, positioning, monitoring and tracking various data quality problems by establishing a data quality evaluation standard and a management standard, and forming closed-loop processing of the data quality problems;

the model management module is used for managing the whole life cycle of the model;

the label management module is used for managing the whole life cycle of the label;

the data lineage management module is used for tracing the source of the data and tracking the processing the data has undergone;

the data classification module is used for supporting the formulation of open and sharing strategies for data resources by describing the multi-dimensional characteristics and content sensitivity of the data;

the data resource catalog module is used for supporting metadata management, the metadata comprising technical metadata, management metadata and business metadata; the technical metadata comprises data source information, data structure, data lineage and impact analysis, data cycle, data change history and data volume; the management metadata comprises the metadata obtained after data classification and grading; and the business metadata comprises the data catalog name, the data resource description, the unit holding authority over the data resource and the unit managing the data resource.
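Lineage tracing and impact analysis can both be viewed as traversals of a directed graph whose edges run from a source data set to the data set derived from it. The following is a minimal sketch under that assumption; the class `LineageGraph` and the data set names are hypothetical and the application does not prescribe this structure.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical lineage store: edges point from a source to a derived data set."""

    def __init__(self):
        self._upstream = defaultdict(set)
        self._downstream = defaultdict(set)

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        # Iterative depth-first traversal collecting every reachable node.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def sources_of(self, dataset):
        # Lineage: everything the data set was derived from.
        return self._walk(dataset, self._upstream)

    def impacted_by(self, dataset):
        # Impact analysis: everything derived from the data set.
        return self._walk(dataset, self._downstream)


graph = LineageGraph()
graph.add_edge("original.vehicle_raw", "resource.vehicle_elements")
graph.add_edge("resource.vehicle_elements", "subject.person")
graph.add_edge("original.phone_raw", "subject.person")
```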

A second aspect of the present application provides a data governance method, the method comprising:

acquiring a data access mode, a data updating period and a data storage period;

registering a data source;

formulating a data set standard, a data item standard, a data element standard, a qualifier standard and a named entity standard for the data source according to the data access mode, the data updating period and the data storage period;

performing data exploration on the access mode, data scale, business meaning, data set tables and fields of the data source, wherein the data exploration on a field covers the field's null values, conformance to specification, value range and problem data;

defining an ETL standardization process according to the data exploration result, wherein the data exploration result comprises the null values, conformance to specification, value range and problem data of each field;

judging whether operators meeting the requirements exist in the ETL or not;

if the ETL does not have operators meeting the requirements, customizing the operators meeting the requirements through tool operators and scalar operators built in the system;

if operators meeting the requirements exist in the ETL, cleaning and converting data by adopting the ETL through defining an ETL standardization processing process, and distributing the data after the data cleaning and the data conversion into a library in a data organization module;

after the data is distributed and put in storage, resource registration is carried out on the standardized data, and metadata of the data are enriched, wherein the metadata comprise technical metadata, management metadata and business metadata.
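The final registration step can be pictured as attaching the three kinds of metadata to a warehoused table and recording it in a catalog. The sketch below is an assumption-laden illustration: the dataclass names, their fields (a subset of those the application lists) and the in-memory `catalog` dictionary are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    source: str
    structure: dict          # column name -> type
    row_count: int
    upstream: list = field(default_factory=list)  # lineage inputs


@dataclass
class ManagementMetadata:
    classification: str
    grade: int


@dataclass
class BusinessMetadata:
    catalog_name: str
    description: str
    authority_unit: str
    management_unit: str


@dataclass
class RegisteredResource:
    table: str
    technical: TechnicalMetadata
    management: ManagementMetadata
    business: BusinessMetadata


catalog = {}


def register(resource):
    # A table may be registered as a resource only once.
    if resource.table in catalog:
        raise ValueError(f"{resource.table} is already registered")
    catalog[resource.table] = resource
    return resource


vehicle = register(RegisteredResource(
    table="resource.vehicle",
    technical=TechnicalMetadata("mysql://source", {"plate": "TEXT"}, 2,
                                ["original.vehicle_raw"]),
    management=ManagementMetadata("traffic", 2),
    business=BusinessMetadata("Vehicles", "Registered vehicles", "Unit A", "Unit B"),
))
```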

A third aspect of the present application provides a data governance device, the device comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to execute the steps of the data governance method according to the instructions in the program code.

As can be seen from the above technical solutions, the present application has the following advantages:

in this application, a data governance system is provided, including: the data access module, which is used for performing data reading, data exploration, data definition and data conversion on multi-source heterogeneous data, ingesting the multi-source heterogeneous data that has undergone data exploration and data definition into a big data center, and reconciling the data after data definition against the data of the data provider; the data processing module, which is used for extracting data from a data source, converting the extracted data into the required format, correcting or eliminating abnormal data, and distributing the data to the corresponding data warehouse; the data organization module, which is used for storing the distributed data by category in an original library, a resource library, a subject library, a business library, a knowledge base or a business element index library to obtain multiple types of metadata; the data governance module, which is used for performing catalog integration and classification and grading on the metadata, determining the lineage of the metadata, determining the quality of the metadata, performing data operation and maintenance, and supporting the use and servicing of the metadata; and the data service module, which is used for providing data to different systems and users.

The data are distributed to multiple kinds of libraries, so the data warehouse is more diversified, supporting the various big data clusters and relational databases commonly used in the industry as well as unstructured data storage. The data governance module performs catalog integration and classification and grading on the metadata, determines the lineage and quality of the metadata, performs data operation and maintenance, and supports the use and servicing of the metadata, so that data resource management is more unified; management metadata and business metadata are provided in addition to technical metadata, lineage analysis and impact analysis are supported, and resource-oriented, catalog-based and standardized asset management is achieved.

Drawings

FIG. 1 is a system architecture diagram of one embodiment of a data governance system of the present application;

FIG. 2 is a schematic diagram of the operation of a data access module and a data processing module in an embodiment of a data governance system of the present application;

FIG. 3 is a method flow diagram of one embodiment of a data governance method of the present application;

FIG. 4 is a schematic diagram of a data resource management architecture in an embodiment of a data governance system according to the present application;

FIG. 5 is a schematic diagram of a data services management architecture in an embodiment of a data governance system.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to FIG. 1, which is a system architecture diagram of an embodiment of a data governance system according to the present application, the system includes:

the system comprises a data access module 101, a data processing module 102, a data organization module 103, a data governance module 104 and a data service module 105;

the data access module 101 is used for performing data reading, data exploration, data definition and data conversion on multi-source heterogeneous data; accessing multi-source heterogeneous data subjected to data exploration and data definition to a big data center, and performing data reconciliation on the data subjected to data definition and data of a data provider;

It should be noted that the data access module 101 may be used to define the flows, methods and circulation mechanisms of the data reading, data processing, data governance, data organization and data service stages; to ingest the read multi-source heterogeneous data into the big data center according to the data exploration and data definition; and to complete data reconciliation with the data provider after the data is read. The multi-source heterogeneous data, shown schematically in FIG. 2, includes: national-level, provincial-level, city-level and district-level data, divided vertically by source; data from social industries, government departments and the Internet of Things, online data, offline data, domestic data, overseas data, data acquired by technical means, management data and the like, divided horizontally by source; structured, semi-structured and unstructured data sets and text, picture, audio and video data, divided by data format; and network file systems, distributed file systems, relational databases and message buses, divided by storage structure. A data provider is an organization or program that provides data.

Specifically, the data access module 101 further includes a data probing module 1011, a data defining module 1012, a data reading module 1013, and a data converting module 1014;

the data probing module 1011 is configured to probe the service meaning, the data structure, the field format, the value range, the statistical distribution, and the data quality of the data, so as to obtain a data probing result.

It should be noted that the data exploration module 1011 may be configured to explore the service meaning, data structure, field format, value range, statistical distribution, data quality, and the like of the read data, so that the system identifies the content of the data in multiple dimensions, and provides a basis for data processing.
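As a rough illustration of field-level probing, the sketch below profiles one field for its null rate, distinct values, most common values and numeric value range. The function name `probe_field` and the choice of statistics are assumptions; the application does not specify how the probing result is computed.

```python
from collections import Counter


def probe_field(values):
    """Hypothetical field profile: null rate, distinct count, top values, range."""
    # Treat None and the empty string as null values.
    non_null = [v for v in values if v is not None and v != ""]
    profile = {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    if numeric:
        # The value range helps spot problem data such as an implausible age.
        profile["min"], profile["max"] = min(numeric), max(numeric)
    return profile


ages = [34, 27, None, 34, 151, ""]
profile = probe_field(ages)
```

Here the maximum of 151 would stand out against any plausible value range for an age field, which is the kind of finding that later drives the cleaning rules.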

The data definition module 1012 is used for, according to the probing result, defining the data organization, registering the data resource catalog, and defining the data classification, data lineage, data quality detection rules, statistical strategies, data processing rules and data usage rules.

It should be noted that the data definition module 1012 may be used to define the ETL standardization process according to the data exploration result, that is, according to how well the data conforms to specification and what problem data it contains. This specifically includes defining the data organization, registering the data resource catalog, and defining the data classification, data lineage, data quality detection rules, statistical strategies, data processing rules and data usage rules.

A data reading module 1013, configured to read data after data definition, and check whether the data after data definition has a conflict in information meaning; and carrying out data reconciliation on the data after the data definition and the data of the data provider.

It should be noted that the data reading module 1013 may be used to read the data after data definition; to check whether the values of the data conflict in informational meaning, that is, whether the data is consistent; and to reconcile the read data against the data of the data provider.

And the data conversion module 1014 is used for carrying out data decryption on the data, decompressing the data, recording the data ID, generating a data bill and providing data support for the data processing module.

It should be noted that the data conversion module is used for performing data decryption and data decompression on data, recording a data ID, generating a data bill, and providing data support for data processing.
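One plausible way to reconcile received data with a data provider is to compare a record count and a content checksum. The sketch below assumes, purely for illustration, that a data bill carries a batch ID, a count and a SHA-256 checksum; the application does not state the contents of the data bill, and the function names are hypothetical.

```python
import hashlib


def make_bill(batch_id, records):
    """Build a hypothetical data bill: batch ID, record count and checksum."""
    digest = hashlib.sha256()
    # Sort first so the checksum does not depend on delivery order.
    for record in sorted(records):
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return {"batch_id": batch_id, "count": len(records), "checksum": digest.hexdigest()}


def reconcile(provider_bill, received_records):
    # Recompute the bill locally and compare it with the provider's bill.
    local = make_bill(provider_bill["batch_id"], received_records)
    return {
        "count_matches": local["count"] == provider_bill["count"],
        "checksum_matches": local["checksum"] == provider_bill["checksum"],
    }


sent = ["a,1", "b,2", "c,3"]
bill = make_bill("batch-001", sent)
complete = reconcile(bill, ["c,3", "a,1", "b,2"])
incomplete = reconcile(bill, ["a,1", "b,2"])
```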

The data processing module 102 is configured to extract data from a data source, convert the extracted data into a desired format, correct or clear anomalous data, and distribute the data to a corresponding data warehouse.

It should be noted that the data processing module 102 builds on the data definitions made in the data access stage. Addressing the characteristics of big data (large scale, many types, high-speed circulation, complexity and variability, uneven quality and varying value density) and guided by data applications, it raises the value density of the data through standardized processing, thereby adding value to the data and preparing and abstracting data for intelligent data applications. Data processing combines the capabilities of data exploration, the data warehouse (DW) and the ETL tool, provides functions such as data extraction, data cleaning, data association, data comparison, data identification and data distribution, and implements the following processes: extraction (Extract), the process of extracting data from a data source; conversion (Transform), the process of changing data from one form of representation to another; cleaning (Cleaning), the final procedure for finding and correcting identifiable errors in the data, including checking data consistency and handling invalid and missing values; and loading (Loading), the process of storing the converted data in the data warehouse.
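The four processes described above can be sketched as a chain of generator stages. This is a toy illustration rather than the ETL tool of the application: the record layout, the phone normalization and the assumed 11-digit validity rule are hypothetical.

```python
def extract(rows):
    # Extract: pull records out of the source, copying so the source is untouched.
    for row in rows:
        yield dict(row)


def transform(rows):
    # Transform: change the representation, here keeping only the digits of a phone.
    for row in rows:
        row["phone"] = "".join(ch for ch in str(row.get("phone", "")) if ch.isdigit())
        yield row


def clean(rows):
    # Clean: drop invalid values (assumed rule: 11 digits) and duplicates.
    seen = set()
    for row in rows:
        if len(row["phone"]) != 11 or row["phone"] in seen:
            continue
        seen.add(row["phone"])
        yield row


def load(rows, warehouse):
    # Load: store the converted records in the target warehouse.
    for row in rows:
        warehouse.append(row)
    return len(warehouse)


source = [
    {"name": "Li", "phone": "138-0000-0000"},
    {"name": "Wang", "phone": "n/a"},
    {"name": "Li", "phone": "13800000000"},
]
warehouse = []
loaded = load(clean(transform(extract(source))), warehouse)
```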

In a specific embodiment, the data processing module 102 comprises the following modules:

the data extraction module is used for extracting data from the source-format data;

the data cleaning module is used for generating data that meets the standard and quality requirements;

the data association module is used for associating the data with other knowledge data, business data and the like, and outputting the associated information;

the data comparison module is used for performing identity comparison or similarity calculation on structured and unstructured data and outputting the data that hits the rules;

the data identification module is used for performing comparison analysis and model calculation on the data using a tag engine based on a tag knowledge base, and tagging the data to support upper-layer applications;

and the data distribution module is used for distributing the data to the corresponding data warehouse according to the application scenario and the distribution strategy.
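For the data comparison module, identity comparison and similarity calculation on text values can be sketched with the standard library's `difflib.SequenceMatcher`. The function `compare`, the 0.8 threshold and the sample values are assumptions for illustration; the application does not name a similarity measure.

```python
from difflib import SequenceMatcher


def compare(a, b, threshold=0.8):
    """Hypothetical rule: identical values hit; otherwise hit when similar enough."""
    if a == b:
        return {"hit": True, "kind": "identical", "score": 1.0}
    # ratio() returns a similarity in [0, 1] based on matching subsequences.
    score = SequenceMatcher(None, a, b).ratio()
    return {"hit": score >= threshold, "kind": "similarity", "score": score}


same = compare("A12345", "A12345")
close = compare("Zhang Wei", "Zhang Wie")
different = compare("abc", "xyz")
```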

Specifically, the data processing module in the present application adopts the ETL data processing tool together with D3 technology, so that the intelligent configuration process is quantifiable and controllable. With the ETL configuration tool, a user can quickly build a data processing flow chart, making the whole data processing procedure quantifiable and controllable. The functions of the visual ETL tool include user-defined graphical processing flows, free dragging and linking of nodes, adaptive graphical layout, simplified operator configuration, real-time display of the amount of data processed, real-time monitoring of warehouse states, real-time monitoring of running states and the like. The system also provides rich warehouse reading capability, supporting the relational databases commonly used in the industry, the Hive, HBase, ES, Kafka and HDFS big data clusters of the major vendors, and structured and unstructured files such as text files and compressed packages. It further provides rich operator capability, covering commonly used operators such as character, date, business, dictionary association, ID, encoding, numerical, mapping and higher-order operators, which gives strong support to different application scenarios and lets data processing be put into practice more quickly and effectively. However, even a rich operator set may fail to meet the changing needs of the industry, which places very high demands on the extensibility of the ETL tool's operators. The layered operator architecture of the ETL tool provides common tool operators and scalar operators as the foundation for highly extensible operator customization; on this basis, a user can write custom operators in the Kotlin language or as scripts and register them in the ETL tool for use.
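The idea of building a custom operator from built-in operators and registering it for later use can be sketched as follows. The application mentions Kotlin or scripts for custom operators; Python is used here only for illustration, and the registry `OPERATORS`, the decorator `register` and the operator names are hypothetical.

```python
OPERATORS = {}


def register(name):
    """Decorator that records an operator in the hypothetical registry."""
    def decorator(func):
        OPERATORS[name] = func
        return func
    return decorator


@register("trim")
def trim(value):
    # Built-in scalar operator: strip surrounding whitespace.
    return value.strip()


@register("upper")
def upper(value):
    # Built-in scalar operator: convert to upper case.
    return value.upper()


def compose(*names):
    # Build a new operator by chaining registered operators in order.
    funcs = [OPERATORS[name] for name in names]

    def operator(value):
        for func in funcs:
            value = func(value)
        return value

    return operator


def get_or_define(name, *building_blocks):
    # Use an existing operator if one meets the need; otherwise customize one.
    if name not in OPERATORS:
        OPERATORS[name] = compose(*building_blocks)
    return OPERATORS[name]


normalize_plate = get_or_define("normalize_plate", "trim", "upper")
result = normalize_plate("  a12345 ")
```

The `get_or_define` helper mirrors the check described in the method: an existing operator is reused when one is available, and a new one is assembled from built-in operators only when none is.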

The data organization module 103 is configured to store the distributed data into an original library, a resource library, a subject library, a business library, a knowledge base, or a business element index library in a classified manner, so as to obtain multiple types of metadata.

It should be noted that the data organization module 103 includes an original library, a resource library, a subject library, a business library, a knowledge base and a business element index library, which may be used to store various types of metadata.

In a specific embodiment, one part of the original library stores the original data, that is, data sets that reflect the original business scenario; on this basis, it is supplemented with the standardized data, associated element information, label information and data classification information generated after the various source data undergo a series of processing steps;

the resource library holds the public data sets that integrate the key elements established from the various data resources (identifying attributes such as citizen identity numbers, license plate numbers, mobile phone numbers and MAC (media access control) addresses) together with the associations and relationships among those elements; it mainly comprises an element association library, an element relation library, an element key behavior library, an element key content library and an element distribution library;

the subject library is used for storing subject objects that identify persons, places, cases, events, articles, organizations and the like, and comprises a person subject library, a place subject library, an article subject library, a case subject library, an event subject library, an information subject library, an organization subject library and the like;

the business library is a database for the business of each professional field; it holds the data of each professional field's business, records business processes, and provides data support for each business activity;

the knowledge base is used for storing the knowledge data and the sets of rules and methods shared in the public security field, including the knowledge data required by data access, processing, governance, organization and service; the various sets of rules, methods and processes; and the knowledge data and general algorithms required by the general models of the various professional fields; it mainly comprises a basic knowledge base, a basic algorithm base, an intelligent information processing knowledge base, a rule base and the like;

the business element index library is used for storing a global index built on the key elements of the business library, so as to resolve business association and business conflict problems.

The data governance module 104 is used for performing catalog integration and grading and classification on the metadata, determining the data lineage (blood relationship) of the metadata, determining the quality of the metadata, performing data operation and maintenance, and using and serving the metadata.

It should be noted that the data governance module 104 may be configured to perform catalog integration, grading and classification, lineage determination, quality determination, data operation and maintenance, and use and service of the metadata.

Specifically, the data governance module 104 includes a data operation and maintenance management module 1041, a data quality management module 1042, a model management module 1043, a tag management module 1044, a data lineage management module 1045, a data grading and classification module 1046, and a data resource catalog module 1047.

It should be noted that the data operation and maintenance management module 1041 may be configured to perform interface encapsulation on data governance related service capabilities as needed, and provide services for other application systems and other subsystems in the platform.

The data quality management module 1042 is configured to ensure the quality of data.

It should be noted that the data quality management module 1042 may be used to ensure the quality of data. Indicators for evaluating data quality include, but are not limited to, integrity (whether data is missing), normalization (whether data is stored according to a required rule), consistency (whether values of data conflict in the meaning of information), accuracy (whether data is erroneous), uniqueness (whether data is duplicated), and timeliness (whether data is uploaded according to a time requirement). Data quality is an indicator describing the value content of the data.
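Three of the indicators listed above lend themselves to simple ratios. The sketch below is illustrative only: the function names, the record layout, and the choice to score each indicator as a share between 0 and 1 are assumptions, not definitions taken from the present application.

```python
from datetime import datetime
from typing import Any, Dict, List

Record = Dict[str, Any]


def _present(value: Any) -> bool:
    # A value counts as present when it is neither None nor an empty string.
    return value is not None and value != ""


def completeness(records: List[Record], field: str) -> float:
    """Integrity: share of records in which the field is present."""
    if not records:
        return 1.0
    return sum(1 for r in records if _present(r.get(field))) / len(records)


def uniqueness(records: List[Record], field: str) -> float:
    """Uniqueness: share of distinct values among the present values."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def timeliness(records: List[Record], field: str, deadline: datetime) -> float:
    """Timeliness: share of records uploaded on or before the deadline."""
    if not records:
        return 1.0
    on_time = sum(
        1 for r in records if r.get(field) is not None and r[field] <= deadline
    )
    return on_time / len(records)


# Made-up sample records for illustration.
records = [
    {"id": "1", "phone": "13800000001", "uploaded_at": datetime(2021, 1, 15, 8)},
    {"id": "2", "phone": "", "uploaded_at": datetime(2021, 1, 15, 9)},
    {"id": "3", "phone": "13800000001", "uploaded_at": datetime(2021, 1, 16, 9)},
    {"id": "4", "phone": "13800000004", "uploaded_at": datetime(2021, 1, 15, 23)},
]
deadline = datetime(2021, 1, 16)

report = {
    "completeness": completeness(records, "phone"),               # 3 of 4 present
    "uniqueness": uniqueness(records, "phone"),                   # 2 distinct of 3
    "timeliness": timeliness(records, "uploaded_at", deadline),   # 3 of 4 on time
}
```

Normalization, consistency and accuracy need rule sets or reference data and are therefore left out of this sketch.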

a model management module 1043 for full life cycle management of models;

a tag management module 1044 for full lifecycle management of tags;

a data lineage management module 1045 for tracking the source of the data and tracking the processing procedure of the data;

It should be noted that the data lineage management module 1045 may be configured to track the source of data and the processing procedure applied to it. On the one hand, data lineage (blood relationship) makes the data governance process transparent; on the other hand, when a problem appears in the data finally provided and used, the source of the problem data can be traced back quickly through the lineage. Data lineage provides lineage version management at the table level, the field level and the rule level. Impact analysis, which is also an expression of lineage, shows the range of resources affected by a change. Data audience: on the lineage graph, the data nodes on the right represent the audience, namely the data demanders (impact analysis), and the more demanders there are, the higher the value of the data. Data magnitude: on the lineage graph, the thicker the line of a data flow, the larger the data volume, which reflects the value of the data resource to a certain extent. If data has no audience, it loses its use value; in that case the main node has no data node to its right on the lineage graph, and this can be used to evaluate whether the main node resource may be archived or deregistered.
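Source tracing, impact analysis and the "no audience" check can all be expressed as walks over a directed graph of data flows. The following is a minimal table-level sketch; the `LineageGraph` class, its method names and the table names are hypothetical, and field-level or rule-level lineage and version management are not modeled.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


class LineageGraph:
    """Hypothetical table-level lineage graph; an edge means data flows source -> target."""

    def __init__(self) -> None:
        self._downstream: Dict[str, Set[str]] = defaultdict(set)
        self._upstream: Dict[str, Set[str]] = defaultdict(set)

    def add_flow(self, source: str, target: str) -> None:
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        # Breadth-first walk collecting every node reachable from start.
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_sources(self, node: str) -> Set[str]:
        """All upstream nodes: where problem data may have come from."""
        return self._walk(node, self._upstream)

    def impact(self, node: str) -> Set[str]:
        """All downstream nodes: what a change to this node can affect."""
        return self._walk(node, self._downstream)

    def without_audience(self, tables: Iterable[str]) -> Set[str]:
        """Tables with no downstream demander: candidates for archiving review."""
        return {t for t in tables if not self._downstream.get(t)}


graph = LineageGraph()
graph.add_flow("raw_orders", "std_orders")
graph.add_flow("std_orders", "subject_person")
graph.add_flow("std_orders", "app_daily_report")
graph.add_flow("raw_logs", "std_logs")  # std_logs has no demander

sources = graph.trace_sources("app_daily_report")
affected = graph.impact("raw_orders")
idle = graph.without_audience(["std_orders", "std_logs"])
```

In this sketch `idle` contains only `std_logs`, the one table that nothing reads from. Data magnitude (the thickness of a flow line) could be added by storing a volume on each edge.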

The data grading and classification module 1046 is configured to support the formulation of open-sharing policies for data resources by describing the multidimensional features and content sensitivity of the data.

A data resource catalog module 1047, configured to support metadata management, where the metadata comprise technical metadata, management metadata and business metadata; the technical metadata comprise data source information, data structure, data lineage and impact analysis, data period, data change history and data volume; the management metadata comprise the metadata obtained from data grading and classification; and the business metadata comprise the data catalog name, the data resource description, the unit holding authority over the data resource, and the data resource management unit.

It should be noted that the data resource catalog module 1047 can be used to support metadata management (Meta Data Management), which is an important foundation of data asset management and denotes the planning, implementation and control activities carried out to obtain high-quality, integrated metadata. Metadata can be divided into technical metadata, management metadata and business metadata: the technical metadata include data source information, data structure, data lineage and impact analysis, data period, data change history, data volume and the like; the management metadata include management-class metadata such as data grading and classification; and the business metadata include the data catalog name, the data resource description, the unit holding authority over the data resource, the data resource management unit and the like. By sorting out the data resources of the various data sources and data processing stages of the big data platform, the data resource catalog module also takes inventory of the data assets and forms a standardized and unified data resource catalog; combined with graded and classified management of user access permissions, this allows the data resources to be opened and shared in a scientific, orderly and secure manner. The module further provides management functions for the whole life cycle of data resources, including registration, updating, enabling, disabling, cancellation, query, aggregation and synchronization.
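The life cycle functions named above (registration, enabling, disabling, cancellation) can be read as a small state machine over catalog entries. The sketch below is one possible reading; the state names, the `ALLOWED` transition table and the `CatalogEntry` class are assumptions, since the present application lists the functions but does not define which transitions are permitted.

```python
from enum import Enum
from typing import Dict, List, Set


class ResourceState(Enum):
    REGISTERED = "registered"
    ENABLED = "enabled"
    DISABLED = "disabled"
    CANCELLED = "cancelled"


# Assumed transition table; not specified by the source text.
ALLOWED: Dict[ResourceState, Set[ResourceState]] = {
    ResourceState.REGISTERED: {ResourceState.ENABLED, ResourceState.CANCELLED},
    ResourceState.ENABLED: {ResourceState.DISABLED},
    ResourceState.DISABLED: {ResourceState.ENABLED, ResourceState.CANCELLED},
    ResourceState.CANCELLED: set(),
}


class CatalogEntry:
    """Hypothetical data resource catalog entry with an audited life cycle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = ResourceState.REGISTERED
        self.history: List[ResourceState] = [ResourceState.REGISTERED]

    def transition(self, target: ResourceState) -> None:
        # Reject any move that the transition table does not list.
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.state.value} -> {target.value} is not allowed"
            )
        self.state = target
        self.history.append(target)


entry = CatalogEntry("vehicle_registration")
entry.transition(ResourceState.ENABLED)
entry.transition(ResourceState.DISABLED)
entry.transition(ResourceState.CANCELLED)
```

Recording the history on each entry keeps the change record that the technical metadata are said to include.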

The data service module 105 is used to provide data to different systems and users.

It should be noted that the data service module 105 includes:

a query retrieval service module 1051, configured to provide users with a query interface for querying data resources.

It should be noted that the query retrieval service module 1051 may be configured to provide users with a query interface for data resource conditions and a query interface for structured data; it supports multiple query modes such as exact/fuzzy, classified, combined and batch queries. The service provides basic functions such as data resource query, general data query and general extended query.

The model analysis service module 1052 is configured to perform statistics, analysis and prediction on data with an analysis model according to the needs of the business scenario, so as to obtain an analysis result that meets those needs.

It should be noted that the model analysis service module 1052 may be configured to perform statistics, analysis, pattern discovery, prediction and the like on data with an analysis model according to data service and business needs, and to return the analysis result, so as to meet the complex and changeable business scenarios of the application layer.

The data push service module 1053 is configured to aggregate data resources, as required, from lower-level data centers to the corresponding upper-level data centers, and to deliver data resources, as required, from upper-level data centers to the corresponding lower-level data centers.

It should be noted that the data push service module 1053 is used for data aggregation and data delivery. It provides the basic core capability of the big data cloud platform for data exchange and information push among nodes at all levels, and between the internal network and other departments outside the network. Data aggregation means gathering data resources, as required, from a prefecture-level data center (a lower-level data center) to a provincial or department-level data center (an upper-level data center); data resources may also be imported one-way from outside the network and gathered into the corresponding lower-level data center. Data delivery means issuing data resources, as required, from a provincial data center (upper level) to a lower-level data center.

The data authentication service module 1054 is configured to authenticate access rights to the data according to preset data access control rules.

It should be noted that the data authentication service module 1054 is configured to authenticate access rights to the data according to the data access control rules. The access control rules control resource permissions along four dimensions: content sensitivity, data source, data type, and field and field relationship. Each request is authenticated against the user's data resource permissions, and access control over data resources is enforced through the data authentication service.
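A four-dimension check of this kind can be sketched as a conjunction of per-dimension tests. The example below is an assumption-laden illustration: `ResourceTag`, `UserGrant`, `is_authorized` and the numeric sensitivity scale are hypothetical, and field relationships are reduced to plain field names for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ResourceTag:
    """Labels attached to one field of a data resource (hypothetical layout)."""
    sensitivity: int   # assumed scale: higher means more sensitive
    source: str
    data_type: str
    field_name: str


@dataclass(frozen=True)
class UserGrant:
    """Data resource permissions held by a user (hypothetical layout)."""
    max_sensitivity: int
    sources: FrozenSet[str]
    data_types: FrozenSet[str]
    fields: FrozenSet[str]


def is_authorized(grant: UserGrant, tag: ResourceTag) -> bool:
    # Access is granted only when every one of the four dimensions passes.
    return (
        tag.sensitivity <= grant.max_sensitivity
        and tag.source in grant.sources
        and tag.data_type in grant.data_types
        and tag.field_name in grant.fields
    )


analyst = UserGrant(
    max_sensitivity=2,
    sources=frozenset({"traffic"}),
    data_types=frozenset({"vehicle"}),
    fields=frozenset({"plate_number", "color"}),
)

plate = ResourceTag(2, "traffic", "vehicle", "plate_number")
owner_id = ResourceTag(3, "traffic", "vehicle", "owner_id")

can_read_plate = is_authorized(analyst, plate)        # all four dimensions pass
can_read_owner = is_authorized(analyst, owner_id)     # too sensitive, field not granted
```

Requiring all four dimensions to pass is a conservative design choice; a real rule set might combine the dimensions differently.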

The data operation service module 1055 is used for providing operation interface services for adding, modifying and deleting data.

It should be noted that the data operation service module 1055 may be configured to provide operation interface services for adding, modifying, deleting, and the like of data.

The data security module 1056 is used for classifying data resources from multiple aspects including data acquisition mode, data type, and data field; and grading the data resources according to the preset content sensitivity degrees of different data resources.

It should be noted that the data security module 1056 supports the service capability of data resources by describing the multidimensional characteristics and content sensitivity of the data (i.e. data classification and data grading). Data classification categorizes data resources along multiple dimensions, such as data acquisition mode, data type and field, controls the scope of use of data resources according to their category, and allows multi-level categories within a given dimension. Data grading assigns levels to records according to the content sensitivity of the different resources.

The data management service module 1057 is used for interface packaging of data governance related service capabilities according to needs, and providing services for other application systems and other subsystems in the platform.

The data service customizing module 1058 is configured to provide a customized data service for a user, and specifically includes:

S2, if a resource retrieval service is customized, registering the data table as a resource and mounting the registered resource in the data resource catalog;

S3, publishing the registered resource so that users can access it externally;

S6, if an SQL query service is customized, selecting any data table in the data warehouse, writing standard SQL, and defining the condition columns and result columns on that SQL, which completes deployment of the SQL query service;

S7, publishing the SQL query service so that users can access it externally;
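The SQL query service of steps S6 and S7 can be sketched as a base SQL statement wrapped with a fixed set of condition columns and result columns. The following is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for the data warehouse; the `SqlQueryService` class, the `vehicle` table and its columns are hypothetical.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


class SqlQueryService:
    """Hypothetical customized SQL query service (steps S6 and S7)."""

    def __init__(
        self,
        conn: sqlite3.Connection,
        base_sql: str,
        condition_columns: Sequence[str],
        result_columns: Sequence[str],
    ) -> None:
        self._conn = conn
        self._base_sql = base_sql
        self._condition_columns = tuple(condition_columns)
        self._result_columns = tuple(result_columns)

    def query(self, **conditions: Any) -> List[Tuple[Any, ...]]:
        # Only the condition columns chosen at deployment time may be filtered on.
        unknown = set(conditions) - set(self._condition_columns)
        if unknown:
            raise ValueError(f"not a condition column: {sorted(unknown)}")
        # Column names come from the service definition; values are bound parameters.
        sql = (
            f"SELECT {', '.join(self._result_columns)} "
            f"FROM ({self._base_sql}) AS base"
        )
        params: List[Any] = []
        if conditions:
            clauses = []
            for column, value in conditions.items():
                clauses.append(f"{column} = ?")
                params.append(value)
            sql += " WHERE " + " AND ".join(clauses)
        return self._conn.execute(sql, params).fetchall()


# In-memory stand-in for a data warehouse table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (plate TEXT, owner TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?, ?)",
    [("A123", "Li", "Guangzhou"), ("B456", "Chen", "Shenzhen")],
)

service = SqlQueryService(
    conn,
    base_sql="SELECT plate, owner, city FROM vehicle",
    condition_columns=["city"],
    result_columns=["plate", "owner"],
)
rows = service.query(city="Guangzhou")
```

Callers supply values only for the declared condition columns and receive only the declared result columns, so the underlying table structure is not exposed by the published service.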

The above is an embodiment of the system of the present application. The present application further provides an embodiment of a data governance method which, as shown in Fig. 3, includes:

301. acquiring a data access mode, a data updating period and a data storage period;

302. registering a data source;

303. establishing a data set standard, a data item standard, a data element standard, a qualifier standard and a named entity standard for a data source according to a data access mode, a data updating period and a data storage period;

304. performing data exploration on the access mode, data scale, business meaning, data set tables and fields of the data source, wherein data exploration of a field includes ascertaining its null-value status, standard-conformance status, value range and problem data;

305. defining an ETL standardization process according to the data exploration result, wherein the data exploration result comprises the null-value status, standard-conformance status, value range and problem data of each field;

306. determining whether the ETL tool has an operator that meets the requirements;

307. if the ETL tool has no operator that meets the requirements, customizing such an operator from the tool operators and scalar operators built into the system;

308. if the ETL tool has an operator that meets the requirements, cleaning and converting the data with the ETL tool according to the defined ETL standardization process, and distributing the cleaned and converted data into the libraries of the data organization module;

309. after the data has been distributed and stored, registering the standardized data as resources and enriching the metadata of the data, wherein the metadata comprise technical metadata, management metadata and business metadata.
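Steps 304 and 305 can be illustrated by profiling one field and deriving processing steps from the profile. The sketch below makes several assumptions: `profile_field`, `plan_steps`, the step names and the date pattern are hypothetical, and a real exploration would cover many fields and richer rules.

```python
import re
from typing import Any, Dict, List, Optional


def profile_field(values: List[Optional[str]], pattern: str) -> Dict[str, Any]:
    """Step 304 sketch: null-value status, conformance, value range, problem data."""
    non_null = [v for v in values if v is not None and v != ""]
    conforming = [v for v in non_null if re.fullmatch(pattern, v)]
    problems = [v for v in non_null if not re.fullmatch(pattern, v)]
    null_rate = (len(values) - len(non_null)) / len(values) if values else 0.0
    return {
        "null_rate": null_rate,
        "problem_values": problems,
        # Value range over conforming values only, so bad data cannot skew it.
        "min": min(conforming) if conforming else None,
        "max": max(conforming) if conforming else None,
    }


def plan_steps(profile: Dict[str, Any]) -> List[str]:
    """Step 305 sketch: choose standardization steps from the exploration result."""
    steps: List[str] = []
    if profile["null_rate"] > 0:
        steps.append("handle_nulls")
    if profile["problem_values"]:
        steps.append("normalize_format")
    steps.append("distribute")
    return steps


# Made-up sample values for a date field expected in YYYY-MM-DD form.
values = ["2021-01-15", None, "15/01/2021", "2021-04-23"]
profile = profile_field(values, r"\d{4}-\d{2}-\d{2}")
steps = plan_steps(profile)
```

For this sample one value in four is null and one value does not match the pattern, so the derived plan contains both a null-handling step and a format-normalization step before distribution.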

It should be noted that catalog integration, grading and classification, data lineage determination, data quality determination, data operation and maintenance, and data use and service may be performed on the obtained metadata; these metadata integration and management functions are implemented by the data governance module.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The terms "comprises," "comprising," and "having," and any variations thereof, in this application are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A is present, that only B is present, or that both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.

In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions are possible in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A data governance system, comprising:

2. The data governance system of claim 1, wherein the data access module comprises:

3. The data governance system of claim 1, wherein the data processing module comprises:

4. The data governance system of claim 1, wherein the data organization module comprises a raw repository, a resource repository, a subject repository, a business repository, a knowledge repository, and a business element index repository;

5. The data governance system of claim 1, wherein the data service module comprises:

6. The data governance system of claim 5, wherein the data service module further comprises:

7. The data governance system of claim 5, wherein the data service module further comprises:

S3, publishing the registered resource so that users can access it externally;

S7, publishing the SQL query service so that users can access it externally;

8. The data governance system of claim 1, wherein the data governance module comprises:

9. A data governance method, comprising:

acquiring a data access mode, a data updating period and a data storage period;

registering a data source;

judging whether operators meeting the requirements exist in the ETL or not;

10. A data governance device, the device comprising a processor and a memory:

the processor is configured to execute the data governance method of claim 9 according to instructions in the program code.