CN112699175B - Data management system and method thereof - Google Patents

Data management system and method thereof

Publication number
CN112699175B
Authority
CN
China
Prior art keywords
data
module
metadata
service
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110057150.6A
Other languages
Chinese (zh)
Other versions
CN112699175A (en)
Inventor
黄晓雄
赖伟
李跃华
郑博洪
陈军
虎清军
陈文强
刘铭
吴杰
张有为
甘勇航
李华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Teligen Communication Technology Co ltd
Original Assignee
Guangzhou Teligen Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Teligen Communication Technology Co ltd
Priority to CN202110057150.6A
Publication of CN112699175A
Application granted
Publication of CN112699175B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data governance system and a method thereof. The system comprises a data access module, a data processing module, a data organization module, a data governance module and a data service module. The data access module reads multi-source heterogeneous data, brings the data that has undergone data exploration and data definition into a big data center, and performs data reconciliation. The data processing module extracts data, converts the extracted data into the required format, corrects or removes abnormal data, and distributes the data to the corresponding data warehouse. The data organization module stores the distributed data, by category, in the corresponding library to obtain various types of metadata. The data governance module performs catalog integration and classification and grading of the metadata, determines the lineage and quality of the metadata, carries out data operation and maintenance, and manages the use and servicing of the metadata. The data service module provides data to users. The application addresses the technical problems that data quality monitoring lacks unified data quality evaluation standards and management specifications, that comprehensive quality monitoring cannot be achieved, and that data quality problems lack a closed-loop handling mechanism.

Description

Data management system and method thereof
Technical Field
The application relates to the technical field of data management, in particular to a data management system and a method thereof.
Background
Data is the foundation of system construction and a core asset of the customer, and a complete data supervision scheme is urgently needed. The full data management process covers access, standardization, warehousing, monitoring, control and so on.
A data asset management platform needs three capabilities. First, ETL (Extract-Transform-Load) capability: ETL is a key part of building a data warehouse, and the platform should let a user complete the whole process of data access and extraction, interactive conversion and standardization, and loading into a target warehouse. Second, data quality monitoring capability: a data asset manager should be able to grasp the overall state of the data assets, understand the origin and lineage of each data resource, and learn of data anomalies as soon as they occur. Finally, data usage control capability: the data asset manager should be able to classify and grade the data and control data usage permissions at a fine granularity.
Data integration follows the data definitions made in the data access stage. Targeting the characteristics of big data (huge scale, many types, high-speed circulation, complexity and variability, uneven quality and varying value density), and guided by data applications, it raises the value density of the data through standardized processing and achieves data value enhancement, data preparation and data abstraction for intelligent data applications.
Data governance is the planning and design, process control and quality supervision of the whole life cycle of data resources. Standardized data governance makes data resources transparent, manageable and controllable; it takes stock of data assets, puts data standards into practice, standardizes data processing flows, improves data quality, ensures that data is used safely, and promotes data circulation and value extraction.
Existing data integration ETL tools adapt poorly to the variety of data storage modes: most are limited to a single big data storage cluster and rely on plug-ins or scripts. Their interfaces are unfriendly, their processing modes are overly simple, and they demand a high skill level from users, while offering weak supervision, high cost and poor extensibility.
Current data asset management schemes lack unified standards and standardized management methods. The boundary between a data asset and a physical table is too vague, so asset management based on resources, catalogs and standards cannot be achieved.
Existing data security schemes are limited: control reaches only the table level and the field level, and there is no finer-grained combined security control such as field classification level, record classification level, desensitization management and red list management.
In industry data governance at present, data quality monitoring has no unified data quality evaluation standards or management specifications, comprehensive quality monitoring cannot be achieved, and data quality problems lack a closed-loop handling mechanism.
In data applications, business applications are tightly coupled to data storage, so every change to business logic directly affects data reading; different business applications must redevelop the same data reading flow; developing an application system requires understanding both the business logic and database technology; and organizing and managing data resources takes a great deal of time.
Disclosure of Invention
The application provides a data governance system and a method thereof, which address the technical problems that data quality monitoring lacks unified data quality evaluation standards and management specifications, that comprehensive quality monitoring cannot be achieved, and that data quality problems lack a closed-loop handling mechanism.
In view of this, a first aspect of the present application provides a data governance system, the system comprising:
a data access module, a data processing module, a data organization module, a data service module and a data governance module;
the data access module is used for performing data reading, data exploration, data definition and data conversion on multi-source heterogeneous data; bringing the multi-source heterogeneous data that has undergone data exploration and data definition into a big data center; and reconciling the defined data with the data of the data provider;
the data processing module is used for extracting data from a data source, converting the extracted data into the required format, correcting or removing abnormal data, and distributing the data to the corresponding data warehouse;
the data organization module is used for storing the distributed data, by category, in an original library, a resource library, a topic library, a business library, a knowledge library or a business element index library to obtain various types of metadata;
the data governance module is used for performing catalog integration and classification and grading of the metadata, determining the lineage of the metadata, determining the quality of the metadata, carrying out data operation and maintenance, and managing the use and servicing of the metadata;
the data service module is used for providing data to different systems and users.
Optionally, the data access module includes:
the data exploration module is used for exploring the business meaning, data structure, field format, value range, statistical distribution and data quality of the data to obtain an exploration result;
the data definition module is used for defining the data organization, registering the data resource catalog, defining the data classification, defining the data lineage, defining data quality detection rules, defining statistical strategies, defining data processing rules and defining data use rules according to the exploration result;
the data reading module is used for reading the defined data and checking whether the defined data conflicts in information meaning; and reconciling the defined data with the data of the data provider;
and the data conversion module is used for decrypting the data, decompressing the data, recording the data ID, generating a data bill, and providing data support for the data processing module.
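The exploration step described above can be illustrated with a minimal Python sketch. It assumes the field values have already been read into memory as a list; the function name `explore_field` and the statistics it reports are illustrative assumptions, not part of the application.

```python
from collections import Counter
from typing import Any


def explore_field(values: list[Any]) -> dict[str, Any]:
    """Profile one field: null rate, value range and most common values."""
    # Treat None and the empty string as null values (an assumption).
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "total": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "top_values": counts.most_common(3),
    }


# Hypothetical sample field with two null entries.
ages = [34, 51, None, 34, "", 28]
profile = explore_field(ages)
```

A profile of this kind is the sort of exploration result that the data definition module could consume when defining quality detection rules and processing rules.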
Optionally, the data processing module includes:
the data extraction module is used for extracting data from the source format data;
the data cleaning module is used for generating data meeting preset standards and quality requirements;
the data association module is used for associating the data with other knowledge data and business data and outputting association information;
the data comparison module is used for performing identity comparison or similarity calculation on structured data and unstructured data and outputting the data that satisfies the preset rule;
the data identification module is used for comparing and analyzing the data and running model calculations on it by using a label engine based on a label knowledge base, labeling the data, and providing support for upper-layer applications;
and the data distribution module is used for distributing the data to the corresponding data warehouse according to different application scenes and preset distribution strategies.
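As a rough illustration of the data comparison module, the sketch below performs an identity comparison first and falls back to a similarity score. The use of `difflib.SequenceMatcher`, the function name `compare_values` and the 0.8 threshold are assumptions chosen for the example; the application does not specify a similarity measure or a rule format.

```python
from difflib import SequenceMatcher


def compare_values(value: str, reference: str, threshold: float = 0.8) -> dict:
    """Compare two strings; report whether they hit the (assumed) preset rule."""
    exact = value == reference
    # ratio() returns a similarity in [0, 1]; identical strings score 1.0.
    score = 1.0 if exact else SequenceMatcher(None, value, reference).ratio()
    return {"exact": exact, "score": score, "hit": score >= threshold}


same = compare_values("13800138000", "13800138000")
near = compare_values("13800138000", "13800138001")
far = compare_values("abc", "xyz")
```

Only records whose result has `hit` set would be passed on as output under this sketch.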
Optionally, the data organization module comprises an original library, a resource library, a topic library, a business library, a knowledge library and a business element index library;
the original library is used for storing original data, that is, data sets that reflect the original business scenario; the source data is processed to obtain standardized data, associated element information, label information and data classification information;
the resource library is a public data set that integrates the key elements established from the various data resources together with the associations and relations among those elements; the elements are identifying attributes of the data, including citizen identity numbers, license plate numbers, mobile phone numbers and MAC addresses;
the topic library is used for storing topic objects that identify people, places, cases, events, objects and organizations, and comprises a personnel topic library, a place topic library, an object topic library, a case topic library, an event topic library, an information topic library and an organization topic library;
the business library is used for storing the business data of each professional field, recording business processes and providing data support for business activities;
the knowledge library is used for storing knowledge data and the sets of rules and methods shared in the public security field;
the business element index library is used for storing global indexes built from the key elements of the business library, so as to solve the problems of business association and business conflict.
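One way to picture the business element index library is as a global index from a key element (for example a license plate number) to every business record that mentions it. The sketch below is a minimal in-memory version; the class name `ElementIndex`, the library names and the record IDs are hypothetical.

```python
from collections import defaultdict


class ElementIndex:
    """Global index: (element type, value) -> set of (library, record id)."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)

    def add(self, element_type: str, value: str, library: str, record_id: str) -> None:
        self._index[(element_type, value)].add((library, record_id))

    def lookup(self, element_type: str, value: str) -> list[tuple[str, str]]:
        # .get() avoids creating empty entries for unknown elements.
        return sorted(self._index.get((element_type, value), ()))


index = ElementIndex()
index.add("plate", "A123", "traffic_business", "rec-1")
index.add("plate", "A123", "case_business", "rec-9")
related = index.lookup("plate", "A123")
```

Looking up one element returns records from several business libraries at once, which is how such an index could surface business associations or conflicts.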
Optionally, the data service module includes:
the query retrieval service module is used for providing a query interface for querying data resources for a user;
the model analysis service module is used for carrying out statistics, analysis and prediction on the data by utilizing an analysis model according to the requirements of the service scene to obtain an analysis result, so that the analysis result meets the requirements of the service scene;
the data pushing service module is used for aggregating data resources from lower-level data centers to the corresponding upper-level data center as required, and for sending data resources from the upper-level data center down to the corresponding lower-level data centers as required;
the data authentication service module is used for authenticating the access authority of the data according to the preset access control rule of the data;
and the data operation service module is used for providing operation interface services of adding, modifying and deleting data.
Optionally, the data service module further includes:
the data security module is used for classifying the data resources along several aspects, including data acquisition mode, data type and data field; and for grading the data resources according to the preset content sensitivity of the different data resources;
and the data governance service module is used for encapsulating data governance related service capabilities as interfaces as required, and providing services to other application systems and to other subsystems in the platform.
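The classification and grading described above, combined with the access authentication performed by the data authentication service module, might be sketched as follows. The four-level sensitivity scale, the `ResourceLabel` fields and the clearance comparison are assumptions made for illustration; the application does not define a concrete scale or rule.

```python
from dataclasses import dataclass

# Hypothetical sensitivity grades; higher means more sensitive.
PUBLIC, INTERNAL, SENSITIVE, RESTRICTED = 1, 2, 3, 4


@dataclass(frozen=True)
class ResourceLabel:
    acquisition_mode: str  # classification aspect: how the data was acquired
    data_type: str         # classification aspect: kind of data
    data_field: str        # classification aspect: domain the data belongs to
    grade: int             # grading result based on content sensitivity


def is_access_allowed(user_clearance: int, label: ResourceLabel) -> bool:
    """Allow access only when the user's clearance reaches the resource grade."""
    return user_clearance >= label.grade


label = ResourceLabel("batch_import", "structured", "vehicle", SENSITIVE)
analyst_ok = is_access_allowed(INTERNAL, label)
auditor_ok = is_access_allowed(RESTRICTED, label)
```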
Optionally, the data service module further includes:
the data service customizing module is used for providing customized data services for the user, specifically through the following steps:
S1, selecting a data table in any data warehouse for data service customization;
S2, if a resource retrieval service is being customized, registering the data table as a resource and mounting the registered resource in the data resource catalog;
S3, publishing the registered resource so that users can access it externally;
S4, configuring access rights according to the classification and grading results of the registered resource;
S5, selecting a published resource for which access rights exist and deploying a retrieval service on it, so that a user can customize the resource retrieval service by the condition columns and result columns of the resource to be acquired;
S6, if an SQL query service is being customized, selecting any data table in the data warehouse, writing standard SQL, and customizing condition columns and result columns on the standard SQL to complete deployment of the SQL query service;
S7, publishing the SQL query service so that users can access it externally;
S8, performing access control over permissions, traffic, frequency and bandwidth for the resource retrieval service and the SQL query service.
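Steps S6 and S7 can be sketched with Python's standard `sqlite3` module standing in for the data warehouse. The class `SqlQueryService`, the `vehicle` table and its columns are hypothetical; the access control of step S8 is not covered.

```python
import sqlite3


class SqlQueryService:
    """A deployed query: fixed standard SQL plus allowed condition/result columns."""

    def __init__(self, conn, base_sql, condition_columns, result_columns):
        self.conn = conn
        self.base_sql = base_sql
        self.condition_columns = tuple(condition_columns)
        self.result_columns = tuple(result_columns)

    def query(self, **conditions):
        # Callers may filter only on the columns chosen at deployment time.
        unknown = set(conditions) - set(self.condition_columns)
        if unknown:
            raise ValueError(f"not a condition column: {sorted(unknown)}")
        # Column names come from the deployment config; values are bound as parameters.
        sql = f"SELECT {', '.join(self.result_columns)} FROM ({self.base_sql})"
        if conditions:
            sql += " WHERE " + " AND ".join(f"{c} = ?" for c in conditions)
        return self.conn.execute(sql, tuple(conditions.values())).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (plate TEXT, owner TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?, ?)",
    [("A123", "owner-1", "Guangzhou"), ("B456", "owner-2", "Shenzhen")],
)
service = SqlQueryService(
    conn,
    "SELECT plate, owner, city FROM vehicle",
    condition_columns=["city"],
    result_columns=["plate", "owner"],
)
rows = service.query(city="Guangzhou")
```

Restricting callers to the declared condition and result columns keeps the published service decoupled from the underlying table layout.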
Optionally, the data governance module includes:
the data operation and maintenance management module is used for collecting state information of data access, processing, organization and service, and for giving early warning of and handling abnormal states, so as to realize real-time monitoring and management of each task;
the data quality management module is used for establishing data quality evaluation standards and management specifications so that various data quality problems are found, located, monitored, tracked and resolved in time, forming closed-loop handling of data quality problems;
the model management module is used for managing the full life cycle of the model;
the label management module is used for managing the full life cycle of the label;
the data lineage management module is used for tracing the source of data and tracking the processing of the data;
the data classification module is used for describing the multidimensional characteristics and content sensitivity of the data to support the formulation of opening and sharing strategies for data resources;
the data resource catalog module is used for supporting metadata management, which covers technical metadata, management metadata and business metadata; the technical metadata comprises data source information, data structure, data lineage and impact analysis, data period, data change history and data volume; the management metadata comprises the metadata produced by classifying and grading the data; and the business metadata comprises the data catalog name, the data resource description, the data resource authority unit and the data resource management unit.
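Lineage tracing and impact analysis, as mentioned for the data lineage management module and the technical metadata, can be sketched as traversals over a directed graph of datasets. The class `LineageGraph` and the dataset names are hypothetical, and this in-memory graph is only one possible representation.

```python
from collections import defaultdict


class LineageGraph:
    """Directed edges source -> target record that target was derived from source."""

    def __init__(self) -> None:
        self.upstream: dict[str, set[str]] = defaultdict(set)
        self.downstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self.upstream[target].add(source)
        self.downstream[source].add(target)

    @staticmethod
    def _walk(start: str, edges: dict[str, set[str]]) -> set[str]:
        seen: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def sources(self, dataset: str) -> set[str]:
        """Lineage: every dataset this one was derived from."""
        return self._walk(dataset, self.upstream)

    def impact(self, dataset: str) -> set[str]:
        """Impact analysis: every dataset affected by a change to this one."""
        return self._walk(dataset, self.downstream)


graph = LineageGraph()
graph.add_edge("original.person", "resource.person")
graph.add_edge("resource.person", "topic.person")
graph.add_edge("original.vehicle", "topic.person")
```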
A second aspect of the present application provides a data governance method, the method comprising:
acquiring a data access mode, a data updating period and a data storage period;
registering a data source;
establishing a data set standard, a data item standard, a data element standard, a qualifier standard and a named entity standard for the data source according to the data access mode, the data updating period and the data storage period;
performing data exploration on the access mode, data scale, business meaning, data set tables and fields of the data source, wherein the data exploration of the fields comprises determining the null-value status, conformance status, value range status and problem-data status of the fields;
defining an ETL standardized processing process according to the data exploration result, wherein the data exploration result comprises the null-value status, conformance status, value range status and problem-data status of the fields;
judging whether the ETL has operators that meet the requirements;
if the ETL has no operator that meets the requirements, customizing such an operator from the tool operators and scalar operators built into the system;
if the ETL has operators that meet the requirements, performing data cleaning and data conversion on the data according to the defined ETL standardized processing process, and distributing the cleaned and converted data into a library in the data organization module;
after distribution and warehousing, registering the standardized data as resources and enriching the metadata of the data, wherein the metadata comprises technical metadata, management metadata and business metadata.
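The decision in the method above (derive the processing steps from the exploration result, then check whether each required operator exists) can be sketched as follows. The operator names, the shape of the exploration report and the mapping from findings to steps are all assumptions for illustration.

```python
# Hypothetical set of operators already available in the ETL tool.
BUILT_IN_OPERATORS = {"fill_null", "trim", "to_upper"}


def plan_standardization(exploration: dict) -> list[str]:
    """Turn a field exploration result into an ordered list of operator names."""
    steps = []
    if exploration["null_rate"] > 0:
        steps.append("fill_null")
    if not exploration["conforms_to_standard"]:
        steps.append("normalize_plate")
    if exploration["problem_rows"] > 0:
        steps.append("drop_problem_rows")
    return steps


def missing_operators(steps: list[str], available=BUILT_IN_OPERATORS) -> list[str]:
    """Operators the plan needs that the ETL tool does not yet provide."""
    return [step for step in steps if step not in available]


report = {"null_rate": 0.1, "conforms_to_standard": False, "problem_rows": 0}
steps = plan_standardization(report)
to_customize = missing_operators(steps)
```

Any name left in `to_customize` corresponds to the branch in which an operator must be customized from the built-in tool and scalar operators before cleaning and conversion can run.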
A third aspect of the present application provides a data governance device, the device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the steps of the data governance method as described in the second aspect above according to instructions in the program code.
From the above technical scheme, the application has the following advantages:
The application provides a data governance system comprising: a data access module, used for performing data reading, data exploration, data definition and data conversion on multi-source heterogeneous data, bringing the multi-source heterogeneous data that has undergone data exploration and data definition into a big data center, and reconciling the defined data with the data of the data provider; a data processing module, used for extracting data from a data source, converting the extracted data into the required format, correcting or removing abnormal data, and distributing the data to the corresponding data warehouse; a data organization module, used for storing the distributed data, by category, in an original library, a resource library, a topic library, a business library, a knowledge library or a business element index library to obtain various types of metadata; a data governance module, used for performing catalog integration and classification and grading of the metadata, determining the lineage of the metadata, determining the quality of the metadata, carrying out data operation and maintenance, and managing the use and servicing of the metadata; and a data service module, used for providing data to different systems and users.
The application distributes data to a variety of libraries, so the data warehouses are more diverse; the various big data clusters and relational databases commonly used in the industry, as well as unstructured data storage, are supported. The data governance module performs catalog integration and classification and grading of the metadata, determines the lineage and quality of the metadata, carries out data operation and maintenance, and manages the use and servicing of the metadata, so that data resource management is more unified. In addition to technical metadata, management metadata and business metadata are provided, and lineage analysis and impact analysis are supported, thereby achieving asset management based on resources, catalogs and standards.
Drawings
FIG. 1 is a system architecture diagram of one embodiment of a data management system of the present application;
FIG. 2 is a schematic diagram illustrating the operation of a data access module and a data processing module in one embodiment of a data management system of the present application;
FIG. 3 is a method flow diagram of one embodiment of a data governance method of the present application;
FIG. 4 is a schematic diagram of a data resource management architecture in one embodiment of a data management system of the present application;
FIG. 5 is a schematic diagram of a management architecture for data services in one embodiment of a data management system.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Referring to fig. 1, fig. 1 is a system architecture diagram of an embodiment of a data management system according to the present application. As shown in fig. 1, the system includes:
a data access module 101, a data processing module 102, a data organization module 103, a data governance module 104 and a data service module 105;
the data access module 101 is used for performing data reading, data exploration, data definition and data conversion on multi-source heterogeneous data; bringing the multi-source heterogeneous data that has undergone data exploration and data definition into a big data center; and reconciling the defined data with the data of the data provider.
It should be noted that the data access module 101 may be used to define the flow, methods and circulation mechanism of stages such as data reading, data processing, data governance, data organization and data service; to bring the read multi-source heterogeneous data into the big data center according to data exploration and data definition; and to complete data reconciliation with the data provider after the data is read. For the multi-source heterogeneous data, refer to the schematic diagram in fig. 2. Divided vertically by source, it includes national-level, provincial-level, city-level and county-level data. Divided horizontally by source, it includes data from social industries, government departments and the Internet of Things, online data, offline data, domestic data, overseas data, data acquired by technical means, management data and so on. Divided by data format, it includes structured, semi-structured and unstructured data sets, text, pictures, audio and video. Divided by storage structure, it includes network file systems, distributed file systems, relational databases and message buses. A data provider refers to an organization or program that provides data.
Specifically, the data access module 101 further includes a data exploration module 1011, a data definition module 1012, a data reading module 1013 and a data conversion module 1014.
The data exploration module 1011 is used for exploring the business meaning, data structure, field format, value range, statistical distribution and data quality of the data to obtain a data exploration result.
It should be noted that the data exploration module 1011 may be used to explore the business meaning, data structure, field format, value range, statistical distribution, data quality and so on of the read data, so that the system identifies the data content along multiple dimensions, which provides a basis for data processing.
The data definition module 1012 is used for defining the data organization, registering the data resource catalog, defining the data classification, defining the data lineage, defining data quality detection rules, defining statistical strategies, defining data processing rules and defining data use rules according to the exploration result.
It should be noted that the data definition module 1012 may be used to define the ETL standardized processing process according to the data exploration result, that is, according to how well the data conforms to the standards and what problem data it contains. This specifically includes defining the data organization, registering the data resource catalog, defining the data classification, defining the data lineage, defining data quality detection rules, defining statistical strategies, defining data processing rules and defining data use rules.
The data reading module 1013 is used for reading the defined data and checking whether the defined data conflicts in information meaning, and for reconciling the defined data with the data of the data provider.
It should be noted that the data reading module 1013 may be used to read the defined data; to check whether the data values conflict in information meaning, that is, whether the data is consistent; and to reconcile the read data with the data of the data provider.
The data conversion module 1014 is used for decrypting the data, decompressing the data, recording the data ID, generating a data bill, and providing data support for the data processing module.
It should be noted that the data bill generated by the data conversion module 1014 supports the reconciliation with the data provider, and the decrypted and decompressed data is what the data processing module consumes.
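The conversion and reconciliation steps can be sketched as below. The sketch assumes gzip-compressed, newline-delimited text and omits decryption; the bill fields (`data_id`, `record_count`, `sha256`) are illustrative assumptions, since the application does not specify what a data bill contains.

```python
import gzip
import hashlib
import uuid


def convert(payload: bytes) -> tuple[list[str], dict]:
    """Decompress a delivery, assign a data ID and build a data bill for it."""
    lines = gzip.decompress(payload).decode("utf-8").splitlines()
    bill = {
        "data_id": uuid.uuid4().hex,
        "record_count": len(lines),
        "sha256": hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest(),
    }
    return lines, bill


def reconcile(bill: dict, provider_bill: dict) -> bool:
    """Data reconciliation: counts and checksums must match the provider's bill."""
    return (
        bill["record_count"] == provider_bill["record_count"]
        and bill["sha256"] == provider_bill["sha256"]
    )


raw = b"row-1\nrow-2\nrow-3"
records, bill = convert(gzip.compress(raw))
provider_bill = {"record_count": 3, "sha256": hashlib.sha256(raw).hexdigest()}
matched = reconcile(bill, provider_bill)
```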
The data processing module 102 is used for extracting data from a data source, converting the extracted data into the required format, correcting or removing abnormal data, and distributing the data to the corresponding data warehouse.
It should be noted that the data processing module 102 follows the data definitions made in the data access stage. Targeting the characteristics of big data (huge scale, many types, high-speed circulation, complexity and variability, uneven quality and varying value density), and guided by data applications, it raises the value density of the data through standardized processing and achieves data value enhancement, data preparation and data abstraction for intelligent data applications. Data processing combines capabilities such as data exploration, the data warehouse (DW) and the ETL data acquisition tool, and provides functions such as data extraction, data cleaning, data association, data comparison, data identification and data distribution. It thereby implements the processes of extraction (Extract: extracting data from a data source), transformation (Transform: converting data from one representation to another), cleaning (Cleaning: the final procedure for finding and correcting identifiable errors in the data, including checking data consistency and handling invalid and missing values) and loading (Loading: storing the converted data in the data warehouse).
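The extract, transform, clean and load stages named above can be sketched over in-memory records. The sample rows, the field names and the rule of rejecting rows with a missing name or an unparseable date are assumptions for the example; a list stands in for the data warehouse.

```python
from datetime import datetime

# Hypothetical source rows: one valid, one with a missing value, one invalid.
RAW = [
    {"id": "1", "name": " Person A ", "registered": "2021/01/15"},
    {"id": "2", "name": "", "registered": "2021/01/16"},
    {"id": "3", "name": "Person C", "registered": "not a date"},
]


def extract(source):
    """Extract: pull rows out of the data source."""
    yield from source


def transform(row):
    """Transform: convert types and representations (date to ISO format)."""
    out = {"id": int(row["id"]), "name": row["name"].strip()}
    try:
        parsed = datetime.strptime(row["registered"], "%Y/%m/%d")
        out["registered"] = parsed.date().isoformat()
    except ValueError:
        out["registered"] = None  # mark invalid values for the cleaning stage
    return out


def clean(row):
    """Clean: reject rows with missing or invalid values."""
    return row if row["name"] and row["registered"] else None


def run_etl(source, warehouse):
    """Run all stages; load valid rows and return the number rejected."""
    rejected = 0
    for row in map(transform, extract(source)):
        cleaned = clean(row)
        if cleaned is None:
            rejected += 1
        else:
            warehouse.append(cleaned)  # Load
    return rejected


warehouse = []
rejected = run_etl(RAW, warehouse)
```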
In one specific embodiment of the present invention:
the data extraction module is used for extracting data from source-format data;
the data cleaning module is used for generating data that meets the standards and quality requirements;
the data association module is used for associating the data with other knowledge data, business data and the like, and outputting association information;
the data comparison module is used for performing identity comparison or similarity calculation on structured and unstructured data, and outputting the data that hit the rules;
the data identification module is used for performing comparison analysis and model computation on the data with a label engine based on a label knowledge base, labeling the data, and providing support for upper-layer applications;
and the data distribution module is used for distributing the data to the corresponding data warehouse according to the distribution strategy of each application scenario.
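The division of work among these sub-modules can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the `name,phone` record layout, the tag rule and the routing strategy are all hypothetical, and the association and comparison stages are omitted for brevity.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def extract(raw_lines: List[str]) -> List[Record]:
    """Extraction: parse hypothetical 'name,phone' source lines into records."""
    records = []
    for line in raw_lines:
        name, _, phone = line.partition(",")
        records.append({"name": name.strip() or None, "phone": phone.strip() or None})
    return records


def clean(records: List[Record]) -> List[Record]:
    """Cleaning: drop records with a missing phone and keep only the digits."""
    cleaned = []
    for rec in records:
        if not rec["phone"]:
            continue
        digits = "".join(ch for ch in rec["phone"] if ch.isdigit())
        cleaned.append({**rec, "phone": digits})
    return cleaned


def identify(records: List[Record],
             tag_rules: Dict[str, Callable[[Record], bool]]) -> List[Record]:
    """Identification: label each record with every tag whose rule matches."""
    return [
        {**rec, "tags": [tag for tag, rule in tag_rules.items() if rule(rec)]}
        for rec in records
    ]


def distribute(records: List[Record],
               strategy: Callable[[Record], str]) -> Dict[str, List[Record]]:
    """Distribution: route each record to the warehouse chosen by the strategy."""
    warehouses: Dict[str, List[Record]] = {}
    for rec in records:
        warehouses.setdefault(strategy(rec), []).append(rec)
    return warehouses


def run_pipeline(raw_lines, tag_rules, strategy):
    """Chain the stages: extract, clean, identify, distribute."""
    return distribute(identify(clean(extract(raw_lines)), tag_rules), strategy)
```

Keeping each stage a plain function over a list of records makes it easy to insert or reorder stages, which is one way a configurable processing flow could be realized.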
Specifically, the data processing module in the present application adopts an ETL data processing tool built with D3 technology, making the processing flow configurable, intelligent, quantifiable and controllable. With the ETL configuration tool, a user can quickly construct a data processing flow chart, and the whole data processing procedure is quantified and controllable. The functions of the visual ETL tool include user-defined graphical processing flows, freely draggable linked nodes, adaptive graph layout, simplified operator configuration, real-time display of the processed data volume, real-time monitoring of warehouse status, real-time monitoring of running status, and the like. The tool also provides rich warehouse reading capability: it supports the relational databases commonly used in the industry; the Hive, HBase, ES, Kafka and HDFS big data clusters of various vendors; and various unstructured and structured files such as text files and compressed packages. The tool further provides a rich set of operators covering common categories such as character, date, business, dictionary association, ID, code, numeric, mapping and higher-order operators, which gives strong support to different application scenarios and allows data processing to be put into practice more quickly and effectively. However rich the operator set is, it will at times fail to satisfy changing industry requirements, which places high demands on the operator extensibility of the ETL tool. The ETL tool has a layered operator architecture that provides common tool operators and scalar operators as the foundation for highly extensible operator customization: on the basis of these operators, users may define custom operators in the Kotlin language or in script form and register them in the ETL tool for use.
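The operator extension mechanism described above can be sketched as a small registry. The patent names Kotlin or scripts as the customization languages; Python is used here purely for illustration, and `OperatorRegistry`, the built-in operator names `trim` and `upper`, and the custom operator `normalize_plate` are hypothetical.

```python
from typing import Callable, Dict, List

Operator = Callable[[str], str]


class OperatorRegistry:
    """Holds built-in operators and lets users register custom ones."""

    def __init__(self) -> None:
        self._ops: Dict[str, Operator] = {}

    def register(self, name: str, op: Operator) -> None:
        if name in self._ops:
            raise ValueError(f"operator already registered: {name}")
        self._ops[name] = op

    def has(self, name: str) -> bool:
        return name in self._ops

    def compose(self, names: List[str]) -> Operator:
        """Build a new operator that applies existing operators in order."""
        ops = [self._ops[n] for n in names]  # KeyError if a name is unknown

        def composed(value: str) -> str:
            for op in ops:
                value = op(value)
            return value

        return composed

    def apply(self, name: str, value: str) -> str:
        return self._ops[name](value)


def build_default_registry() -> OperatorRegistry:
    """Registry preloaded with two hypothetical built-in tool operators."""
    registry = OperatorRegistry()
    registry.register("trim", str.strip)
    registry.register("upper", str.upper)
    return registry
```

A custom operator is then assembled from the built-ins and registered under a new name, for example `registry.register("normalize_plate", registry.compose(["trim", "upper"]))`.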
The data organization module 103 is configured to store the distributed data, by category, in an original library, a resource library, a topic library, a business library, a knowledge library or a business element index library, so as to obtain multiple types of metadata.
It should be noted that the data organization module 103 includes an original library, a resource library, a topic library, a business library, a knowledge library and a business element index library, which may be used to store various types of metadata.
In a specific embodiment, one part of the original library stores the original data, i.e., a data set reflecting the original business scenario; on this basis, it is supplemented with the standardized data, associated element information, label information and data classification information generated after a series of processing steps on the various source data;
the resource library is a public data set that integrates the key elements established from various data resources (identification attributes such as citizen identity numbers, license plate numbers, mobile phone numbers and MAC (media access control) addresses) together with the associations and relationships among those elements; it mainly comprises an element association library, an element relation library, an element key behavior library, an element key content library and an element distribution library;
the topic library is used for storing topic objects for identifying people, places, cases, events, objects, organizations and the like, and comprises a personnel topic library, a place topic library, an article topic library, a case topic library, an event topic library, an information topic library, an organization topic library and the like;
The business library is a database of business in each professional field, supports data of the business in each professional field, records business processes, provides data support for each business activity and the like;
the knowledge base is used for storing knowledge data and rule method sets shared in the public safety field, wherein the knowledge data comprise knowledge data required by data access, processing, management, organization and service, various rules, methods and process sets, and knowledge data and general algorithms required by various general models in various professional fields; the system mainly comprises a basic knowledge base, a basic algorithm base, an intelligent information processing knowledge base, a rule base and the like;
the business element index library is used for storing global indexes established by key elements of the business library so as to solve the problems of business association and business conflict.
The data governance module 104 is configured to perform catalog integration and classification and grading of the metadata, determine the data lineage (literally "blood edge") and the quality of the metadata, perform data operation and maintenance, and support the use and servicing of the metadata.
Specifically, the data governance module 104 includes a data operation and maintenance management module 1041, a data quality management module 1042, a model management module 1043, a label management module 1044, a data lineage management module 1045, a data classification module 1046, and a data resource catalog module 1047.
It should be noted that the data operation and maintenance management module 1041 may be configured to collect state information on data access, processing, organization and service, and to raise early warnings about and handle abnormal states, so as to monitor and manage each task in real time.
The data quality management module 1042 is used for guaranteeing the quality of the data.
It should be noted that the indicators for evaluating data quality include, but are not limited to, integrity (whether data is missing), conformity (whether data is stored according to the required rules), consistency (whether data values conflict in information meaning), accuracy (whether data is wrong), uniqueness (whether data is duplicated), and timeliness (whether data is uploaded within the required time). Data quality is an indicator describing the value content of the data.
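Two of these indicators can be computed with a few lines of code. The sketch below assumes records are dictionaries and defines integrity as the share of filled required fields and uniqueness as the share of distinct key values; both definitions and the function names are illustrative assumptions, not formulas given in the application.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def integrity(records: List[Record], required_fields: Sequence[str]) -> float:
    """Share of required field slots that are filled (not None and not '')."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for rec in records
        for field in required_fields
        if rec.get(field) not in (None, "")
    )
    return filled / total


def uniqueness(records: List[Record], key: str) -> float:
    """Share of records that carry a distinct value for the given key."""
    if not records:
        return 1.0
    return len({rec.get(key) for rec in records}) / len(records)
```

A quality rule could then compare each score with a threshold and open a quality issue when the score falls below it.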
The model management module 1043 is used for full life cycle management of models;
The tag management module 1044 is used for managing the full life cycle of the tag;
the data lineage management module 1045 is used for tracing the source of data and tracking the processing of the data.
It should be noted that the data lineage management module 1045 may be used to trace the source of data and track how the data has been processed. On the one hand, data lineage (literally "blood edge") makes the data governance process transparent; on the other hand, when a problem occurs in data that is finally delivered for use, the source of the problem data can be traced quickly through the lineage. Data lineage provides lineage version management at the table level, field level and rule level. Impact analysis, which is also an expression of lineage, gives the scope affected by a change to a resource. On the lineage graph, the data nodes on the right represent the audience, i.e., the data consumers (impact analysis); the more consumers there are, the greater the value of the data. In a lineage graph annotated with data volumes, the thicker the line of a data flow, the larger the data volume, which to some extent reflects the value of the data resource. If data has no audience, it has lost its use value: when the lineage graph shows no data node to the right of a resource, that resource can be evaluated for archiving or logging off.
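Source tracing, impact analysis and the "no audience" check all reduce to traversals of a directed graph. The following is a minimal sketch under that assumption; the class name `LineageGraph` and its methods are hypothetical, and table-level nodes are used for simplicity.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


class LineageGraph:
    """Directed graph whose edges point from a data source to its consumer."""

    def __init__(self) -> None:
        self._downstream: Dict[str, Set[str]] = {}
        self._upstream: Dict[str, Set[str]] = {}

    def add_flow(self, source: str, target: str) -> None:
        self._downstream.setdefault(source, set()).add(target)
        self._upstream.setdefault(target, set()).add(source)

    @staticmethod
    def _reach(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        """Breadth-first search collecting every node reachable from start."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_sources(self, node: str) -> Set[str]:
        """Source tracing: everything the node was derived from."""
        return self._reach(node, self._upstream)

    def impact(self, node: str) -> Set[str]:
        """Impact analysis: everything affected by a change to the node."""
        return self._reach(node, self._downstream)

    def without_audience(self, resources: Iterable[str]) -> List[str]:
        """Resources with no consumer, i.e. candidates for archiving."""
        return sorted(r for r in resources if not self._downstream.get(r))
```

Field-level or rule-level lineage could reuse the same structure by encoding nodes as `table.field` or rule identifiers.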
The data classification module 1046 is configured to provide support for formulating an open and shared policy of the data resource by describing multidimensional features and content sensitivity of the data.
The data resource catalog module 1047 is used for supporting metadata management, the metadata including technical metadata, management metadata and business metadata; the technical metadata comprises data source information, data structures, data lineage and impact analysis, data periods, data change history and data volume; the management metadata comprises the metadata produced by classifying and grading the data; and the business metadata comprises the data catalog name, the data resource description, the data resource authority unit and the data resource management unit.
It should be noted that the data resource catalog module 1047 may be used to support metadata management (Meta Data Management), which is an important basis for data asset management and consists of the planning, implementation and control activities carried out to obtain high-quality, integrated metadata. Metadata may be divided into technical metadata, management metadata and business metadata. Technical metadata includes data source information, data structures, data lineage and impact analysis, data periods, data change history, data volume and the like; management metadata includes management-type metadata such as data classification and grading; business metadata includes the data catalog name, the data resource description, the data resource authority unit, the data resource management unit and the like. The data resource catalog module is also used for taking inventory of data assets: by sorting out the various data sources of the big data platform and the data resources of every data processing stage, it forms a standardized, normalized and unified data resource catalog. Access rights are managed by classification in combination with user classification, so that data resources are opened and shared in a scientific, orderly and secure manner. The module also provides management functions covering the whole life cycle of a data resource, including registering, updating, enabling, disabling, logging off, querying, aggregating and synchronizing.
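A catalog entry that carries the three kinds of metadata and moves through the life cycle states named above might look like the following sketch. The allowed state transitions are an assumption made for illustration (the application lists the functions but not the transition rules), and `ResourceCatalog` and `State` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List, Optional


class State(Enum):
    REGISTERED = "registered"
    ENABLED = "enabled"
    DISABLED = "disabled"
    LOGGED_OFF = "logged_off"  # logging off, i.e. deregistration


# Assumed transition rules; the source text does not specify them.
_ALLOWED = {
    State.REGISTERED: {State.ENABLED, State.LOGGED_OFF},
    State.ENABLED: {State.DISABLED},
    State.DISABLED: {State.ENABLED, State.LOGGED_OFF},
    State.LOGGED_OFF: set(),
}


class ResourceCatalog:
    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def register(self, name: str, technical: dict, management: dict,
                 business: dict) -> None:
        """Register a resource with its three categories of metadata."""
        if name in self._entries:
            raise ValueError(f"resource already registered: {name}")
        self._entries[name] = {
            "state": State.REGISTERED,
            "technical": dict(technical),
            "management": dict(management),
            "business": dict(business),
        }

    def transition(self, name: str, new_state: State) -> None:
        """Enable, disable or log off a resource if the move is allowed."""
        entry = self._entries[name]
        if new_state not in _ALLOWED[entry["state"]]:
            raise ValueError(f"{entry['state'].value} -> {new_state.value} not allowed")
        entry["state"] = new_state

    def query(self, state: Optional[State] = None) -> List[str]:
        """List resource names, optionally filtered by life cycle state."""
        return sorted(
            name for name, entry in self._entries.items()
            if state is None or entry["state"] is state
        )
```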
The data service module 105 is used to provide data to different systems and users.
The data service module 105 includes:
the query retrieval service module 1051 is configured to provide a query interface for a user to query data resources.
It should be noted that the query retrieval service module 1051 may be configured to provide the user with a query interface for the status of data resources and a query interface for structured data. The query retrieval service module 1051 supports multiple query modes, such as exact/fuzzy, classified, combined and batch queries. The service provides basic functions such as data resource query, general data query and general extended query.
The model analysis service module 1052 is configured to perform statistics, analysis and prediction on data by using an analysis model according to the requirement of the service scenario, so as to obtain an analysis result, so that the analysis result meets the requirement of the service scenario.
It should be noted that, the model analysis service module 1052 may be configured to perform statistics, analysis, regular exploration, prediction, and the like on data by using an analysis model according to data service and service requirements, and return an analysis result, so as to meet the requirements of complex and variable service scenarios of an application layer.
The data push service module 1053 is configured to aggregate data resources from lower-level data centers to the corresponding upper-level data centers as required, and to issue data resources from upper-level data centers to the corresponding lower-level data centers as required.
It should be noted that the data push service module 1053 is used for data aggregation and data issuing. This is a basic core capability for data exchange and information push among the nodes at every level of the big data cloud platform and with other departments inside and outside the network. Data aggregation means that data resources are collected, as required, from a municipal data center (lower-level data center) to a provincial data center (upper-level data center) and to a department-level data center; data may also be imported one-way from outside the network and collected in the corresponding lower-level data center. Data issuing means that data resources are issued, as required, from the provincial data center (upper level) to the lower-level data centers.
The data authentication service module 1054 is configured to authenticate access rights of data according to preset access control rules of the data.
It should be noted that the access control rules govern resource permissions along four dimensions: content sensitivity, data source, data type, and field and field-relationship classification. Resource authentication uses the data authentication service to check the user's data resource permissions and thereby controls access to the data resources.
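A check along the four dimensions can be sketched as follows. The representation is an assumption: sensitivity is modelled as an integer level where a higher number means more sensitive, the other three dimensions as sets of permitted values, and `ResourceLabel`, `UserGrant` and `is_authorized` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ResourceLabel:
    """Classification of a data resource along the four dimensions."""
    sensitivity: int      # assumed: higher means more sensitive
    source: str
    data_type: str
    field_class: str


@dataclass(frozen=True)
class UserGrant:
    """Data resource permissions held by a user."""
    max_sensitivity: int
    sources: FrozenSet[str]
    data_types: FrozenSet[str]
    field_classes: FrozenSet[str]


def is_authorized(grant: UserGrant, label: ResourceLabel) -> bool:
    """Access is allowed only if every one of the four dimensions passes."""
    return (
        label.sensitivity <= grant.max_sensitivity
        and label.source in grant.sources
        and label.data_type in grant.data_types
        and label.field_class in grant.field_classes
    )
```

Requiring all four dimensions to pass gives a deny-by-default policy: a resource with any label the user has not been granted is refused.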
The data operation service module 1055 is configured to provide operation interface services for adding, modifying, and deleting data.
It should be noted that the data operation service module 1055 may be used to provide operation interface services such as adding, modifying, deleting, and the like, of data.
The data security module 1056 is configured to classify data resources from a plurality of aspects, including data acquisition mode, data type, data field; and grading the data resources according to the preset content sensitivity degrees of different data resources.
It should be noted that the data security module 1056 is configured to provide support for formulating the service capability of the data resources by describing the multidimensional characteristics and content sensitivity of the data (i.e., data classification and data grading). Data classification classifies the data resources along multiple dimensions, such as data acquisition mode, data type and field; the scope of use of the data resources is controlled according to the data class, and within a given dimension the data resources may be classified at multiple levels. Data grading supports recording a grade according to the content sensitivity of each resource.
The data management service module 1057 is configured to interface and encapsulate the data management related service capability as required, and provide services for other application systems and other subsystems in the platform.
The data service customization module 1058 is configured to provide customized data services for users, and specifically includes:
S1, selecting a data table in any data warehouse to customize data service;
S2, if the resource retrieval service is customized, registering the data table as resources, and mounting the registered resources to a data resource catalog;
S3, publishing the registered resources so that the user can access the resources externally;
S4, configuring the access rights according to the classification and grading results of the registered resources;
S5, selecting the published resources with access rights to perform retrieval service deployment, so that a user performs resource retrieval service customization according to a condition column and a result column of the resources required to be acquired;
S6, if the SQL query service is customized, selecting any data table in the data warehouse, writing standard SQL, and customizing a condition column and a result column on the standard SQL to finish the deployment of the SQL query service;
S7, publishing the SQL query service to enable the user to perform external access;
and S8, performing access control on authority, flow, frequency and bandwidth on the resource retrieval service and the SQL query service.
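Step S6 can be illustrated with a small query service in which the service definition fixes the table, the condition columns and the result columns, and callers supply only condition values. This is a sketch under assumptions: SQLite stands in for the data warehouse, `SqlQueryService` is a hypothetical name, and the access control of step S8 is not shown.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


class SqlQueryService:
    """A customized SQL query service with fixed condition and result columns."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 condition_columns: Sequence[str],
                 result_columns: Sequence[str]) -> None:
        # Identifiers come from the service definition, never from callers.
        for ident in (table, *condition_columns, *result_columns):
            if not ident.isidentifier():
                raise ValueError(f"invalid identifier: {ident!r}")
        self._conn = conn
        self._table = table
        self._condition_columns = tuple(condition_columns)
        self._result_columns = tuple(result_columns)

    def query(self, conditions: Dict[str, object]) -> List[Tuple]:
        """Run the query; only declared condition columns may be filtered on."""
        unknown = set(conditions) - set(self._condition_columns)
        if unknown:
            raise ValueError(f"not a condition column: {sorted(unknown)}")
        # Values are bound as named parameters rather than spliced into SQL.
        where = " AND ".join(f"{col} = :{col}" for col in conditions) or "1 = 1"
        sql = (
            f"SELECT {', '.join(self._result_columns)} "
            f"FROM {self._table} WHERE {where}"
        )
        return self._conn.execute(sql, dict(conditions)).fetchall()
```

Restricting callers to declared condition columns and binding their values as parameters keeps the published service from exposing columns or SQL that the service owner did not intend.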
The system distributes data to a variety of libraries, so the data warehouses are more diverse: the big data clusters and relational databases commonly used in the industry, as well as unstructured data storage, are supported. The data governance module performs catalog integration and classification and grading of the metadata, determines the lineage and quality of the metadata, performs data operation and maintenance, and supports the use and servicing of the metadata, so that data resource management is more unified. In addition to technical metadata, management metadata and business metadata are provided, and lineage analysis and impact analysis are supported, realizing asset management that is resource-oriented, cataloged and standardized.
The foregoing is an embodiment of the system of the present application. The present application further provides an embodiment of a data management method which, as shown in Fig. 3, includes:
301. acquiring a data access mode, a data updating period and a data storage period;
302. registering a data source;
303. according to the data access mode, the data updating period and the data storage period, a data set standard, a data item standard, a data element standard, a qualifier standard and a named entity standard are established for the data source;
304. performing data exploration on the access mode, data scale, business meaning, data set tables and fields of the data source, wherein data exploration of a field includes determining the null-value status, conformity status, value-range status and problem-data status of the field;
305. defining an ETL standardized processing procedure according to the data exploration result, wherein the data exploration result comprises the null-value status, conformity status, value-range status and problem-data status of the fields;
306. judging whether operators meeting the requirements exist in the ETL;
307. if the ETL has no operators meeting the requirements, customizing operators meeting the requirements through tool operators and scalar operators built in the system;
308. if the ETL has operators meeting the requirements, carrying out data cleaning and data conversion on the data according to the defined ETL standardized processing procedure, and distributing the cleaned and converted data into a library in the data organization module;
309. after distribution and warehousing, performing resource registration on the standardized data and enriching the metadata of the data, wherein the metadata comprise technical metadata, management metadata and business metadata.
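The field-level exploration of step 304 can be sketched as a profiling function that reports the four conditions for one field. The regular expression standing in for the field standard and the function name `profile_field` are illustrative assumptions.

```python
import re
from typing import Dict, List, Optional


def profile_field(values: List[Optional[str]], pattern: str) -> Dict[str, object]:
    """Report null rate, conformity rate, value range and problem values."""
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    conforming = [v for v in non_null if re.fullmatch(pattern, v)]
    problems = [v for v in non_null if not re.fullmatch(pattern, v)]
    return {
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "conformity_rate": len(conforming) / len(non_null) if non_null else 1.0,
        # String comparison; meaningful here because conforming values
        # share one fixed-width format.
        "value_range": (min(conforming), max(conforming)) if conforming else None,
        "problem_values": problems,
    }
```

The resulting profile is the kind of input step 305 would use, for example choosing a default-fill operator when the null rate is high or a format-repair operator when problem values share a common defect.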
It should be noted that catalog integration, classification and grading, data lineage determination, data quality determination, data operation and maintenance, and data use and servicing can be performed on the obtained metadata; the specific metadata integration and management functions are implemented by the data governance module.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of these items, including any combination of single items or plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (9)

1. A data management system, comprising:
a data access module, a data processing module, a data organization module, a data service module and a data governance module;
the data access module is used for carrying out data reading, data exploration, data definition and data conversion on the multi-source heterogeneous data; accessing the multi-source heterogeneous data subjected to data exploration and data definition into a big data center, and performing data reconciliation between the data subjected to data definition and the data of a data provider;
the data processing module is used for extracting data from a data source, converting the extracted data into a required format, correcting or eliminating abnormal data and distributing the data to a corresponding data warehouse;
the data organization module is used for storing the distributed data, by category, in an original library, a resource library, a topic library, a business library, a knowledge library or a business element index library to obtain various types of metadata;
the data governance module is used for carrying out catalog integration and classification and grading on the metadata, determining the lineage of the metadata, determining the quality of the metadata, carrying out data operation and maintenance, and supporting the use and servicing of the metadata;
the data service module is used for providing data to different systems and users;
the data access module comprises:
the data exploration module is used for exploration of business meaning, data structure, field format, value range, statistical distribution and data quality of the data to obtain an exploration result of the data;
the data definition module is used for defining data organization, registering a data resource catalog, defining data classification, defining data blood edges, defining data quality detection rules, defining statistical strategies, defining data processing rules and defining data use rules according to the exploration result;
the data reading module is used for reading the data that has undergone data definition and checking whether that data conflicts in information meaning; and reconciling the data that has undergone data definition with the data of the data provider;
And the data conversion module is used for decrypting the data, decompressing the data, recording the data ID, generating a data bill and providing data support for the data processing module.
2. The data management system of claim 1, wherein the data processing module comprises:
the data extraction module is used for extracting data from the source format data;
the data cleaning module is used for generating data meeting preset standards and quality requirements;
the data association module is used for associating the data with other knowledge data and business data and outputting association information;
the data comparison module is used for carrying out the same comparison or similarity calculation on the structured data and the unstructured data and outputting the data meeting the preset rule;
the data identification module is used for comparing, analyzing and calculating a model of data by utilizing a label engine based on a label knowledge base, labeling the data and providing support for upper-layer application;
and the data distribution module is used for distributing the data to the corresponding data warehouse according to different application scenes and preset distribution strategies.
3. The data management system of claim 1, wherein the data organization module comprises an original library, a resource library, a topic library, a business library, a knowledge library, and a business element index library;
the original library is used for storing original data, namely a data set reflecting the original business scenario, together with the standardized data, associated element information, label information and data classification information obtained by processing the source data;
the resource library is used for integrating the key elements established from various data resources and a public data set of the associations and relationships among the elements; the elements are identification-type attributes of data, including citizen identity numbers, license plate numbers, mobile phone numbers and MAC addresses;
the topic library is used for storing topic objects for identifying people, places, cases, events, objects and organizations, and comprises a personnel topic library, a place topic library, an object topic library, a case topic library, an event topic library, an information topic library and an organization topic library;
the business library is used for storing data of business in each professional field, recording business processes and providing data support for business activities;
the knowledge base is used for storing knowledge data and rule method sets shared in the public security field;
the business element index library is used for storing global indexes established by key elements of the business library so as to solve the problems of business association and business conflict.
4. The data management system of claim 1, wherein the data service module comprises:
The query retrieval service module is used for providing a query interface for querying data resources for a user;
the model analysis service module is used for carrying out statistics, analysis and prediction on the data by utilizing an analysis model according to the requirements of the service scene to obtain an analysis result, so that the analysis result meets the requirements of the service scene;
the data pushing service module is used for aggregating data resources from lower-level data centers to the corresponding upper-level data centers as required, and for issuing data resources from upper-level data centers to the corresponding lower-level data centers as required;
the data authentication service module is used for authenticating the access authority of the data according to the preset access control rule of the data;
and the data operation service module is used for providing operation interface services of adding, modifying and deleting data.
5. The data management system of claim 4, wherein the data service module further comprises:
the data security module is used for classifying the data resources from a plurality of aspects, wherein the aspects comprise a data acquisition mode, a data type and a data field; grading the data resources according to preset content sensitivity degrees of different data resources;
And the data management service module is used for carrying out interface encapsulation on the data management related service capacity according to the requirement and providing services for other application systems and other subsystems in the platform.
6. The data management system of claim 4, wherein the data service module further comprises:
the data service customizing module is used for providing customized data service for the user, and specifically comprises the following steps:
S1, selecting a data table in any data warehouse to customize data service;
S2, if the resource retrieval service is customized, registering the data table as resources, and mounting the registered resources to a data resource catalog;
S3, publishing the registered resources so that the user can access the resources externally;
S4, configuring the access rights according to the classification and grading results of the registered resources;
S5, selecting the published resources with access rights to perform retrieval service deployment, so that a user performs resource retrieval service customization according to a condition column and a result column of the resources required to be acquired;
S6, if the SQL query service is customized, selecting any data table in the data warehouse, writing standard SQL, and customizing a condition column and a result column on the standard SQL to finish the deployment of the SQL query service;
S7, publishing the SQL query service to enable the user to perform external access;
and S8, performing access control on authority, flow, frequency and bandwidth on the resource retrieval service and the SQL query service.
7. The data management system of claim 1, wherein the data governance module comprises:
the data operation and maintenance management module is used for carrying out early warning and treatment on abnormal states by collecting state information of data access, processing, organization and service, so as to realize real-time monitoring and management of each task;
the data quality management module is used for timely finding, positioning, monitoring, tracking and solving various data quality problems by establishing a data quality evaluation standard and a management standard to form closed loop processing of the data quality problems;
the model management module is used for managing the full life cycle of the model;
the label management module is used for managing the whole life cycle of the label;
the data lineage management module is used for tracing the source of data and tracking the processing of the data;
the data classification module is used for supporting the formulation of an opening and sharing strategy for data resources by describing the multidimensional characteristics and content sensitivity of the data;
the data resource catalog module is used for supporting metadata management, wherein the metadata comprises technical metadata, management metadata and business metadata; the technical metadata comprises data source information, a data structure, data lineage and impact analysis, a data period, a data change history and a data volume; the management metadata comprises metadata produced by classifying and grading the data; and the business metadata comprises a data directory name, a data resource description, a data resource authority unit and a data resource management unit.
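One possible in-memory shape for a catalog entry carrying the three kinds of metadata is sketched below in Python. All class and field names are hypothetical, only some of the listed technical metadata items are modeled, and the `downstream_of` helper is a simplified take on impact analysis over lineage rather than the patented mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    source: str                                  # data source information
    structure: dict                              # column name -> column type
    lineage: list = field(default_factory=list)  # names of upstream tables
    row_count: int = 0                           # data volume


@dataclass
class ManagementMetadata:
    classification: tuple  # result of classifying the data
    grade: int             # result of grading the data


@dataclass
class BusinessMetadata:
    catalog_name: str
    description: str
    authority_unit: str
    management_unit: str


@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    management: ManagementMetadata
    business: BusinessMetadata


def downstream_of(entries, table):
    """Impact analysis: catalog names whose lineage includes `table`."""
    return [e.business.catalog_name for e in entries if table in e.technical.lineage]
```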
8. A method of data management comprising:
acquiring a data access mode, a data updating period and a data storage period;
registering a data source;
establishing, according to the data access mode, the data updating period and the data storage period, a data set standard, a data item standard, a data element standard, a qualifier standard and a named entity standard for the data source;
performing data exploration on the access mode, the data scale, the business meaning, the data set tables and the fields of the data source, wherein the data exploration on a field comprises determining the null value condition, the standard conformance condition, the value range condition and the problem data condition of the field;
defining an ETL standardized processing process according to a data exploration result, wherein the data exploration result comprises the null value condition, the standard conformance condition, the value range condition and the problem data condition of the fields;
judging whether operators meeting the requirements exist in the ETL;
if the ETL has no operators meeting the requirements, customizing operators meeting the requirements through tool operators and scalar operators built in the system;
if the ETL has operators meeting the requirements, carrying out data cleaning and data conversion on the data by means of the defined ETL standardized processing process, and distributing the cleaned and converted data for storage into a library in a data organization module;
after distribution and storage, performing resource registration on the standardized data and enriching the metadata of the data, wherein the metadata comprises technical metadata, management metadata and business metadata.
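The field exploration and operator selection steps of the method can be sketched in Python as follows. This is an illustration under stated assumptions: `explore_field`, `BUILT_IN_OPERATORS`, `compose`, `resolve_operator` and `run_etl` are hypothetical names, the field is assumed to hold integer-like strings, and a custom operator is modeled simply as a chain of built-in ones.

```python
import re


def explore_field(values, pattern, low, high):
    """Profile one field: nulls, format conformance, value range, problem rows."""
    non_null = [v for v in values if v is not None and v != ""]
    conforming = [v for v in non_null if re.fullmatch(pattern, v)]
    in_range = [v for v in conforming if low <= int(v) <= high]
    return {
        "null_count": len(values) - len(non_null),
        "nonconforming_count": len(non_null) - len(conforming),
        "out_of_range_count": len(conforming) - len(in_range),
        "problem_count": len(values) - len(in_range),
    }


# Hypothetical built-in operators shipped with the ETL tool.
BUILT_IN_OPERATORS = {"strip": str.strip, "upper": str.upper}


def compose(*operators):
    """Build a custom operator by chaining operators, applied left to right."""
    def custom(value):
        for op in operators:
            value = op(value)
        return value
    return custom


def resolve_operator(name, fallback_parts):
    """Use the built-in operator if it exists, otherwise compose a custom one."""
    if name in BUILT_IN_OPERATORS:
        return BUILT_IN_OPERATORS[name]
    return compose(*(BUILT_IN_OPERATORS[p] for p in fallback_parts))


def run_etl(rows, column, operator):
    """Clean and convert one column; rows with a null in that column are dropped."""
    return [{**r, column: operator(r[column])} for r in rows if r.get(column) is not None]
```

For instance, `resolve_operator("normalize_code", ["strip", "upper"])` finds no built-in operator of that name and therefore returns a composed operator that strips whitespace and then upper-cases the value.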
9. A data management device, the device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the data management method of claim 8 in accordance with instructions in the program code.
CN202110057150.6A 2021-01-15 2021-01-15 Data management system and method thereof Active CN112699175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110057150.6A CN112699175B (en) 2021-01-15 2021-01-15 Data management system and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110057150.6A CN112699175B (en) 2021-01-15 2021-01-15 Data management system and method thereof

Publications (2)

Publication Number Publication Date
CN112699175A CN112699175A (en) 2021-04-23
CN112699175B CN112699175B (en) 2024-02-13

Family

ID=75515369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110057150.6A Active CN112699175B (en) 2021-01-15 2021-01-15 Data management system and method thereof

Country Status (1)

Country Link
CN (1) CN112699175B (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051254A (en) * 2021-04-25 2021-06-29 中航机载系统共性技术有限公司 System file data structuring method based on database and Internet
CN112988850A (en) * 2021-04-27 2021-06-18 明品云(北京)数据科技有限公司 Article information analysis management method, system, equipment and medium
CN113312416B (en) * 2021-05-20 2022-09-09 成都美尔贝科技股份有限公司 Cross-data-center ETL tool
CN113297252A (en) * 2021-05-28 2021-08-24 北京信息科技大学 Data query service method with mode being unaware
CN113553425A (en) * 2021-06-28 2021-10-26 北京来也网络科技有限公司 Data aggregation method, device, equipment and storage medium based on RPA and AI
CN113468257A (en) * 2021-07-05 2021-10-01 乐融致新电子科技(天津)有限公司 Data quality monitoring method and device based on data warehouse
CN113392076A (en) * 2021-07-08 2021-09-14 网银在线(北京)科技有限公司 Method, device, electronic equipment and medium for acquiring metadata quality information
CN113342786A (en) * 2021-08-02 2021-09-03 浩鲸云计算科技股份有限公司 Model management and control-based online data management and management method and system
CN113535707B (en) * 2021-08-05 2022-04-15 南京华飞数据技术有限公司 Method for managing personnel information data based on big data
CN113656370B (en) * 2021-08-16 2024-04-30 南方电网数字电网集团有限公司 Data processing method and device for electric power measurement system and computer equipment
CN113656608B (en) * 2021-08-18 2023-10-24 中国科学院软件研究所 Big data system and automatic data processing method for software defined satellite
CN113448951B (en) * 2021-09-02 2021-12-21 深圳市信润富联数字科技有限公司 Data processing method, device, equipment and computer readable storage medium
CN113506098A (en) * 2021-09-10 2021-10-15 国能信控互联技术有限公司 Power plant metadata management system and method based on multi-source data
CN113778967B (en) * 2021-09-14 2024-03-12 中国环境科学研究院 Yangtze river basin data acquisition processing and resource sharing system
CN113626447B (en) * 2021-10-12 2022-02-22 民航成都信息技术有限公司 Civil aviation data management platform and method
CN113641663B (en) * 2021-10-19 2022-01-18 北京金鸿睿信息科技有限公司 Big data management method and system based on DAMA theory
CN113871018A (en) * 2021-10-21 2021-12-31 卫宁健康科技集团股份有限公司 Medical data management method, system and computer equipment based on metadata model
CN114022114B (en) * 2021-11-03 2022-07-15 广州智算信息技术有限公司 Data management system and method based on telecommunication industry
CN113901042A (en) * 2021-12-10 2022-01-07 西安中电环通数字科技有限公司 Ecological environment data dynamic activity level library and terminal
CN114254081B (en) * 2021-12-22 2024-06-04 中冶赛迪信息技术(重庆)有限公司 Enterprise big data search system, method and electronic equipment
CN114298550A (en) * 2021-12-28 2022-04-08 安徽海螺信息技术工程有限责任公司 Method for treating cement production operation data
CN114297283B (en) * 2021-12-29 2024-07-12 厦门安胜网络科技有限公司 Metadata-driven data security management method and system
CN114416714B (en) * 2022-01-18 2022-09-02 军事科学院系统工程研究院后勤科学与技术研究所 Data management system
CN114925045B (en) * 2022-04-11 2024-05-03 杭州半云科技有限公司 PaaS platform for big data integration and management
CN114925048A (en) * 2022-04-25 2022-08-19 上海杰狮信息技术有限公司 Natural resource full life cycle management method based on natural resource code and storage medium
CN116821104B (en) * 2022-08-18 2024-07-16 钟漍标 Industrial Internet data processing method and system based on big data
CN116226894B (en) * 2023-05-10 2023-08-04 杭州比智科技有限公司 Data security treatment system and method based on meta bin
CN116303408A (en) * 2023-05-24 2023-06-23 中数通信息有限公司 DAMA data frame-based data governance process management method and system
CN116450620B (en) * 2023-06-12 2023-09-12 中国科学院空天信息创新研究院 Database design method and system for multi-source multi-domain space-time reference data
CN116932515A (en) * 2023-08-01 2023-10-24 北京健康在线技术开发有限公司 Data management method, device, equipment and medium for realizing data decoupling of production system
CN116821428B (en) * 2023-08-29 2023-11-07 成都智慧锦城大数据有限公司 Intelligent business data storage method and system based on data center
CN117033449B (en) * 2023-10-09 2023-12-15 北京中科闻歌科技股份有限公司 Data processing method based on kafka stream, electronic equipment and storage medium
CN117762954A (en) * 2023-11-17 2024-03-26 深圳市前海数据服务有限公司 Automatic data management method
CN117573759A (en) * 2023-12-11 2024-02-20 中国电子投资控股有限公司 Multi-source heterogeneous data management system, management device and management method
CN117520352A (en) * 2024-01-02 2024-02-06 贵州航天云网科技有限公司 Multi-source heterogeneous data management system and method for complex industrial process
CN117707026B (en) * 2024-02-05 2024-06-07 中铁四局集团有限公司 Scene linkage platform based on multi-source heterogeneous system and construction method thereof
CN117785983A (en) * 2024-02-20 2024-03-29 四川大学华西医院 Target object evaluation method, system, electronic device and storage medium
CN118174971B (en) * 2024-05-15 2024-07-19 中国信息通信研究院 Multi-source heterogeneous data management method and system for network threat

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765337A (en) * 2019-11-15 2020-02-07 中科院计算技术研究所大数据研究院 Service providing method based on internet big data
CN112199433A (en) * 2020-10-28 2021-01-08 云赛智联股份有限公司 Data management system for city-level data middling station

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10459881B2 (en) * 2015-02-27 2019-10-29 Podium Data, Inc. Data management platform using metadata repository

Also Published As

Publication number Publication date
CN112699175A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN112699175B (en) Data management system and method thereof
CN110019176B (en) Data management control system for improving success rate of data management service
CN109522312B (en) Data processing method, device, server and storage medium
Stvilia et al. A framework for information quality assessment
US9594823B2 (en) Data relationships storage platform
US7623675B2 (en) Video data management using encapsulation assets
CN110119395B (en) Method for realizing association processing of data standard and data quality based on metadata in big data management
CN112199433A (en) Data management system for city-level data middling station
CN110414802A (en) Conglomerate Analysis of Policy Making flight deck system
CN115617776A (en) Data management system and method
CN115222374A (en) Government affair data service system based on big data processing
Jianmin et al. An improved join‐free snowflake schema for ETL and OLAP of data warehouse
CN104820700B (en) Processing method of unstructured data of transformer substation
Bhuyan et al. Crime predictive model using big data analytics
CN111538720A (en) Method and system for cleaning basic data in power industry
CN116910023A (en) Data management system
CN113495978A (en) Data retrieval method and device
Burgard et al. Data warehouse and business intelligence systems in the context of e-HRM
Oreščanin et al. Managing Personal Identifiable Information in Data Lakes
Ramesh et al. A comparative study of data mining tools and techniques for business intelligence
Kvet et al. Temporal context manager
Wei et al. A method and application for constructing a authentic data space
CN111858598A (en) Mass data comprehensive management system and method
Zgolli et al. Metadata in data lake ecosystems
WO2021034329A1 (en) Data set signatures for data impact driven storage management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant