CN111917887A - System for realizing data governance under big data environment - Google Patents

Publication number
CN111917887A
Authority
CN
China
Prior art keywords
data
service
engine
management
sharing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010825054.7A
Other languages
Chinese (zh)
Inventor
徐明明
顾伟
丁雅娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Primeton Information Technology Co ltd
Original Assignee
Primeton Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Primeton Information Technology Co ltd
Priority to CN202010825054.7A
Publication of CN111917887A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5041 Network service management, e.g. ensuring proper service fulfilment according to agreements characterised by the time relationship between creation and deployment of a service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Abstract

The invention relates to a system for realizing data governance in a big data environment. The system comprises an environment initialization module, a data sharing design module and a data governance application module. The environment initialization module provides the basic operating conditions that construction of the data sharing platform must satisfy, namely the data sharing platform product components, security and the hardware environment. The data sharing design module comprises an architecture layer, core scenarios and a service sharing and publishing unit, and provides services externally through the architecture layer. The data governance application module establishes an integrated data sharing platform that integrates the functions of resources, job management, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration. By establishing this integrated data sharing platform beneath the individual business systems, the platform can connect and manage the data of each business system, so that data exchange and sharing among the business systems of governments and enterprises is achieved, the goal of data governance is realized, and the value of the data is increased.

Description

System for realizing data governance under big data environment
Technical Field
The invention relates to the field of data asset management, in particular to data governance, and specifically to a system for realizing data governance in a big data environment.
Background
Against the current background of rapidly growing data volumes, governments and enterprises urgently need data services and data governance. Data management across headquarters, branches and subsidiaries is loose; internal systems acquire data in too many different ways; management approaches are numerous; there is no unified standard; and data cannot be shared. In particular, there remain the problems of how to provide data governance and data services within business systems and how to use data efficiently and with high value.
The data service sharing platform is positioned as a sharing channel that connects enterprise data resources vertically and horizontally, making it a data factory for enterprises, organizations and departments. It addresses a series of problems such as security control, service management, service consumption and service development efficiency, and provides a unified platform for unified service control over multi-source, multi-type data, helping enterprises use data more effectively and reliably. A data sharing platform is an important way of handling data exchange: it is a management approach for planning, organizing and providing data that covers the entire flow and direction of the data and realizes data value through effective management.
At present, after years of informatization construction and operation, governments and enterprises have established mature business application systems that effectively support the innovation and development of their core businesses. However, customer data is scattered and is not truly shared among subsidiaries. Internal systems obtain customer data from dispersed source systems in a variety of ways that are difficult to manage; the timeliness of obtaining customer data is low; data standards are not uniform; and a unified customer data service platform is lacking. As the number of application systems grows, data volumes and data application environments grow as well, and unreasonable and inconsistent practices gradually appear in the use of data. Many enterprises currently lack unified data asset standards; data quality is uneven across business systems; information silos exist; data with the same name may have different meanings in different departments, and the same data may have different names, all of which hinders effective interaction and sharing of data.
As data accumulates, some system data is not updated in a timely manner, core business data cannot be retrieved, and the accuracy and timeliness of data are low. Almost all existing reports must be modeled again from scratch; manual involvement is excessive and the hierarchy is complex; processes and indicators cannot be monitored and analyzed accurately and efficiently; and both data utilization and model reuse are low. Business personnel cannot fully and deeply understand the company's complex systems, especially at the technical level. To give business departments better control from data structure through to data quality, sorting out the structural relationships between business systems and databases has become one of the problems most urgently in need of a solution; doing so also reduces resource waste to some extent.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a system for realizing data governance in a big data environment that is efficient, simple to operate and widely applicable.
To achieve this purpose, the system for realizing data governance in the big data environment is as follows:
the system for realizing data governance in the big data environment is mainly characterized in that it comprises:
an environment initialization module, used for providing the basic operating conditions that the data sharing platform must satisfy, namely the data sharing platform product components, security and the hardware environment;
a data sharing design module, connected with the environment initialization module and comprising an architecture layer, core scenarios and a service sharing and publishing unit, each of which is connected with the environment initialization module, used for providing services externally through the architecture layer;
and a data governance application module, connected with the data sharing design module, used for establishing an integrated data sharing platform and achieving data exchange and increased data value by integrating the functions of resources, job management, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration.
Preferably, the data sharing platform product components include a management and control platform, a service engine, service state monitoring, a visual development tool, a transmission agent, a scheduling engine, a data publishing engine and a resource collection client.
The management and control platform is used for providing resource catalog, data development, data use, service engine management, scheduling plan, statistical monitoring and service permission configuration functions, and for unified registration and management of data resources and service resources;
the service engine is used for parsing batch job models, executing batch jobs and transferring files, and provides multi-protocol and multi-data-source adaptation support;
the service state monitoring is used for providing log analysis and monitoring capabilities, and provides back-end support for early warning before an event, alarms during an event and statistical analysis after an event;
the visual development tool is used for providing visual batch job model definition and debugging, visual model performance monitoring, metadata management and data processing model deployment;
the transmission agent is used for supporting data extraction from data sources such as databases and big data stores, supporting one-to-one and one-to-many transmission of files of any size between any nodes, and supporting file transmission strategies;
the scheduling engine is used for supporting serial and parallel scheduling of jobs and job flows, providing scheduling under various rules, and providing diversified scheduling modes for running jobs and job flows;
the data publishing engine is used for providing data service publishing and data access capabilities, and supports publishing real-time interface services in the form of a single table or a result set;
and the resource collection client is used for collecting, in real time, the CPU, memory, disk and network metrics of the physical machine on which each engine runs.
Preferably, the environment initialization module ensures security through service access control, data encryption and desensitization, and access based on security protocols.
Preferably, under the service access control, the data service is accessed through Spring Cloud Gateway and is controlled in sequence by multiple interceptors for token authorization, IP whitelist, access frequency and access traffic, thereby ensuring the security of service access.
Preferably, the data encryption and desensitization apply data encryption, data desensitization and row- and column-level permission control, thereby ensuring the security of data access.
Preferably, the architecture layer comprises a functional architecture and a technical architecture, the functional architecture comprises a support engine and a functional module, and the technical architecture comprises a resource layer, an access layer, a logic layer, a service providing layer and a presentation layer.
Preferably, the sharing and publishing provides services externally through web services, file services and JDBC services, and the services provided include data service publishing, fault alarms, data quality checking and data service monitoring.
Preferably, the support engine comprises a real-time service engine, a batch service engine, a scheduling engine and a log engine.
Preferably, the functional modules include a resource catalog, data service publishing, data use and security, and data service monitoring.
Preferably, the data service publishing comprises real-time service publishing and batch service publishing.
Preferably, the fault alarm checks events against defined rules and raises an alarm when a rule is satisfied, the alarms comprising service engine alarms, service state alarms and service quality alarms.
Preferably, the data service monitoring provides the capability of checking the quality of data resources: quality checks are performed before, during and after the event to guarantee data quality, and checking according to custom rules is supported.
The system for realizing data governance in the big data environment of the invention addresses the data silo problem in governments and enterprises by realizing shared data governance. It provides unified data service capabilities based on a big data architecture and serves as a sharing channel that opens enterprise data resources to the outside, so that the value of the data resources can be realized. By establishing an integrated data sharing platform beneath the individual business systems, the platform can connect and manage the data of each business system, thereby achieving data exchange and sharing among the business systems of governments and enterprises, realizing the goal of data governance and increasing the value of the data.
Drawings
FIG. 1 is a schematic structural diagram of a system for implementing data governance in a big data environment according to the present invention.
FIG. 2 is a schematic diagram of an embodiment of a system for implementing data governance in a big data environment according to the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, the following further description is given in conjunction with specific embodiments.
The system for realizing data governance in the big data environment comprises the following components:
an environment initialization module, used for providing the basic operating conditions that the data sharing platform must satisfy, namely the data sharing platform product components, security and the hardware environment;
a data sharing design module, connected with the environment initialization module and comprising an architecture layer, core scenarios and a service sharing and publishing unit, each of which is connected with the environment initialization module, used for providing services externally through the architecture layer;
and a data governance application module, connected with the data sharing design module, used for establishing an integrated data sharing platform and achieving data exchange and increased data value by integrating the functions of resources, job management, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration.
As a preferred embodiment of the present invention, the data sharing platform product components include a management and control platform, a service engine, service state monitoring, a visual development tool, a transmission agent, a scheduling engine, a data publishing engine and a resource collection client.
The management and control platform is used for providing resource catalog, data development, data use, service engine management, scheduling plan, statistical monitoring and service permission configuration functions, and for unified registration and management of data resources and service resources;
the service engine is used for parsing batch job models, executing batch jobs and transferring files, and provides multi-protocol and multi-data-source adaptation support;
the service state monitoring is used for providing log analysis and monitoring capabilities, and provides back-end support for early warning before an event, alarms during an event and statistical analysis after an event;
the visual development tool is used for providing visual batch job model definition and debugging, visual model performance monitoring, metadata management and data processing model deployment;
the transmission agent is used for supporting data extraction from data sources such as databases and big data stores, supporting one-to-one and one-to-many transmission of files of any size between any nodes, and supporting file transmission strategies;
the scheduling engine is used for supporting serial and parallel scheduling of jobs and job flows, providing scheduling under various rules, and providing diversified scheduling modes for running jobs and job flows;
the data publishing engine is used for providing data service publishing and data access capabilities, and supports publishing real-time interface services in the form of a single table or a result set;
and the resource collection client is used for collecting, in real time, the CPU, memory, disk and network metrics of the physical machine on which each engine runs.
As a preferred embodiment of the invention, the environment initialization module ensures security through service access control, data encryption and desensitization, and access based on security protocols.
As a preferred embodiment of the present invention, under the service access control, the data service is accessed through Spring Cloud Gateway and is controlled in sequence by multiple interceptors for token authorization, IP whitelist, access frequency and access traffic, thereby ensuring the security of service access.
As a preferred embodiment of the invention, the data encryption and desensitization apply data encryption, data desensitization and row- and column-level permission control, thereby ensuring the security of data access.
As a preferred embodiment of the present invention, the architecture layer includes a functional architecture and a technical architecture, the functional architecture includes a support engine and a functional module, and the technical architecture includes a resource layer, an access layer, a logic layer, a service providing layer and a presentation layer.
As a preferred embodiment of the present invention, the sharing and publishing provides services externally through web services, file services and JDBC services, and the services provided include data service publishing, fault alarms, data quality checking and data service monitoring.
As a preferred embodiment of the present invention, the support engine comprises a real-time service engine, a batch service engine, a scheduling engine and a log engine.
As a preferred embodiment of the present invention, the functional modules include a resource catalog, data service publishing, data use and security, and data service monitoring.
As a preferred embodiment of the present invention, the data service publishing comprises real-time service publishing and batch service publishing.
As a preferred embodiment of the present invention, the fault alarm checks events against defined rules and raises an alarm when a rule is satisfied, the alarms comprising service engine alarms, service state alarms and service quality alarms.
As a preferred embodiment of the present invention, the data service monitoring provides the capability of checking the quality of data resources: quality checks are performed before, during and after the event to guarantee data quality, and checking according to custom rules is supported.
In a specific embodiment of the invention, the environment initialization module provides the basic operating conditions that must be satisfied to build the data sharing platform, mainly the data sharing components, security and the hardware environment. The data sharing platform components are the related applications; the security of data service sharing and publishing is considered mainly from three aspects, namely service access control, data encryption and desensitization, and access based on security protocols; and the hardware environment mainly refers to the operating system, database, browser and so on in use. The data sharing design module comprises the technical architecture, core scenarios and service sharing and publishing. The architecture layer is described from the perspectives of the functional architecture and the technical architecture: the functional architecture mainly refers to the support engines and functional modules, and the technical architecture is described in terms of a resource layer, an access layer, a logic layer, a service providing layer and a presentation layer. Sharing and publishing is the capability of exposing data resources as external services in the form of web services, file services, JDBC services and the like. The data governance application module performs the actual data governance on the basis of the environment initialization module and the data sharing design module, achieving the purpose of data exchange.
The structure of the invention is shown in FIG. 1. The environment initialization module, the data sharing design module and the data governance application module are described in detail below:
First, the environment initialization module:
the environment initialization module provides the basic operating conditions that must be satisfied to build the data sharing platform, mainly the data sharing components, security and the hardware environment. The data sharing platform components are the related applications; the security of data service sharing and publishing is considered mainly from three aspects, namely service access control, data encryption and desensitization, and access based on security protocols; and the hardware environment mainly refers to the operating system, database, browser and so on in use.
The shared components are described in detail as follows:
the data sharing platform product components mainly comprise a management and control platform, a service engine, service state monitoring, a visual development tool, a transmission agent, a scheduling engine, a data publishing engine and a resource collection client.
The management and control platform is the unified management and control platform for data service sharing. It provides a series of functions such as resource catalog, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration, and performs unified registration and management of data resources and service resources.
The service engine is the engine for batch job services and file transfer services. It is responsible for parsing batch job models, executing batch jobs and providing file transfer services; it provides multi-protocol and multi-data-source adaptation support; and it provides a high-performance, highly reliable runtime environment for running services.
The service state monitoring provides log analysis and monitoring capabilities, and provides back-end support for functions such as early warning before an event, alarms during an event and statistical analysis after an event.
The visual development tool is a visual batch job designer that provides functions such as visual batch job model definition and debugging, visual model performance monitoring, metadata management and data processing model deployment.
The transmission agent is a data extraction and file transfer agent. It supports data extraction from data sources such as databases and big data stores, supports one-to-one and one-to-many transmission of files of any size between any nodes, and supports various file transmission strategies such as compression and encryption.
The scheduling engine is a job scheduling engine. It supports serial and parallel scheduling of jobs and job flows, provides scheduling under various rules such as calendar, frequency and event, and provides diversified scheduling modes for running jobs and job flows.
The data publishing engine provides data service publishing and data access capabilities based on a Spring Boot architecture and supports publishing real-time interface services in forms such as a single table or a result set.
The resource collection client collects, in real time, metrics such as CPU, memory, disk and network usage of the physical machine on which each engine runs, and displays them through Governor.
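The serial and parallel scheduling of jobs and job flows attributed to the scheduling engine can be illustrated with a minimal Python sketch. It is not the patented engine: the job names, the `run_serial` and `run_parallel` helpers and the example flow are all hypothetical, and calendar, frequency and event triggers are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A job is modelled as a zero-argument callable returning a status string
# (an assumption made for this sketch only).
Job = Callable[[], str]


def run_serial(jobs: List[Job]) -> List[str]:
    """Run jobs one after another, in list order."""
    return [job() for job in jobs]


def run_parallel(jobs: List[Job], max_workers: int = 4) -> List[str]:
    """Run jobs concurrently; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]


def make_job(name: str) -> Job:
    """Build a hypothetical job that only reports its own completion."""
    return lambda: f"{name} done"


def run_flow() -> List[str]:
    """Hypothetical job flow: one extract, two parallel loads, one report."""
    results = run_serial([make_job("extract")])
    results += run_parallel([make_job("load_a"), make_job("load_b")])
    results += run_serial([make_job("report")])
    return results
```

A job flow is then just a sequence of serial and parallel stages; a real engine would add dependency tracking, retries and trigger rules on top of this.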
The security of data service sharing and publishing is considered mainly from three aspects, namely service access control, data encryption and desensitization, and access based on security protocols, as described below:
service access control means that a consumer system accesses the data service through Spring Cloud Gateway and passes in sequence through multiple interceptors for token authorization, IP whitelist, access frequency and access traffic, thereby ensuring the security of service access.
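The order of the four checks can be sketched in a few lines of Python. This is only an illustration of the sequence described above, not Spring Cloud Gateway code: the `Request` and `GatewayPolicy` classes, the sliding-window accounting and all limits are assumptions introduced for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Request:
    token: str
    ip: str
    size_bytes: int


@dataclass
class GatewayPolicy:
    """Hypothetical policy object; none of these names come from the patent."""
    valid_tokens: Set[str]
    ip_whitelist: Set[str]
    max_requests_per_window: int
    max_bytes_per_window: int
    window_seconds: float = 60.0
    # Per-token history of (timestamp, size) for accepted requests.
    _hits: Dict[str, List[Tuple[float, int]]] = field(
        default_factory=dict, init=False
    )

    def _recent(self, token: str, now: float) -> List[Tuple[float, int]]:
        """Drop history entries older than the window and return the rest."""
        cutoff = now - self.window_seconds
        hits = [h for h in self._hits.get(token, []) if h[0] > cutoff]
        self._hits[token] = hits
        return hits

    def check(self, req: Request, now: Optional[float] = None) -> str:
        """Apply the four interceptors in order; stop at the first failure."""
        now = time.monotonic() if now is None else now
        if req.token not in self.valid_tokens:      # 1. token authorization
            return "rejected: token"
        if req.ip not in self.ip_whitelist:         # 2. IP whitelist
            return "rejected: ip"
        hits = self._recent(req.token, now)
        if len(hits) + 1 > self.max_requests_per_window:   # 3. access frequency
            return "rejected: frequency"
        used = sum(size for _, size in hits)
        if used + req.size_bytes > self.max_bytes_per_window:  # 4. access traffic
            return "rejected: traffic"
        hits.append((now, req.size_bytes))
        return "allowed"
```

Only accepted requests are recorded, so a rejected call does not consume frequency or traffic quota in this sketch; a production gateway might choose differently.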
Data encryption and desensitization are security components provided on the data service engine side to ensure the security of data access. They exercise control mainly in the following respects:
data encryption: providing multiple data encryption methods such as MD5, DES, AES and RSA;
data desensitization: desensitizing any data in a field according to desensitization rules, such as conventional replacement and encrypted replacement;
row- and column-level permissions: providing row-level and column-level data permission control for consumer systems.
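Desensitization by replacement and row- and column-level permission control can be sketched as follows. The function names, the masking rule and the sample fields are hypothetical, and encryption is deliberately left out because the source does not say how keys or algorithms are configured.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]


def mask_middle(value: str, keep: int = 3, fill: str = "*") -> str:
    """Conventional replacement: keep `keep` characters at each end."""
    if len(value) <= 2 * keep:
        return fill * len(value)
    return value[:keep] + fill * (len(value) - 2 * keep) + value[-keep:]


def apply_policy(
    rows: Iterable[Row],
    allowed_columns: Sequence[str],
    row_filter: Callable[[Row], bool],
    mask_rules: Dict[str, Callable[[str], str]],
) -> List[Row]:
    """Return only permitted rows and columns, masking configured fields."""
    visible_rows: List[Row] = []
    for row in rows:
        if not row_filter(row):          # row-level permission
            continue
        visible: Row = {}
        for column in allowed_columns:   # column-level permission
            if column not in row:
                continue
            value = row[column]
            rule = mask_rules.get(column)
            visible[column] = rule(str(value)) if rule else value
        visible_rows.append(visible)
    return visible_rows
```

For instance, a consumer system limited to one region and to the `name` and `phone` columns would receive each phone number with its middle digits replaced by asterisks.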
Access based on security protocols means that data is transmitted over the HTTPS and SFTP security protocols during data access, which prevents the data from being stolen or tampered with in transit and ensures data integrity.
The hardware environment mainly refers to the operating system, database, browser and the like, as described below:
the hardware environment must meet certain performance requirements, mainly in terms of the system's CPU, memory, hard disk, operating system, JDK, database and browser.
Second, the data sharing design module:
the data sharing design module comprises the technical architecture, core scenarios and service sharing and publishing. The architecture layer is described from the perspectives of the functional architecture and the technical architecture: the functional architecture mainly refers to the support engines and functional modules, and the technical architecture is described in terms of a resource layer, an access layer, a logic layer, a service providing layer and a presentation layer. Sharing and publishing is the capability of exposing data resources as external services in the form of web services, file services, JDBC services and the like, and is considered mainly from four aspects: data service publishing, fault alarms, data quality checking and data service monitoring.
The functional architecture mainly comprises four support engines and four functional modules.
The four support engines:
real-time service engine: provides real-time service publishing and access in a RESTful manner;
batch service engine: provides batch service publishing and access in the form of files;
scheduling engine: provides scheduling for batch services;
log engine: collects logs for metric analysis.
The four functional modules:
resource catalog: provides the views used by data consumers and data developers;
data service publishing: manages the publishing of real-time services and batch services;
data use and security: defines the data application and usage process and data security management;
data service monitoring: performs full-link monitoring over the life cycle of data services.
The technology stack involved in the development of the data service sharing platform is divided into five layers:
resource layer: performs automatic collection of technical metadata and management of business metadata for various data sources, such as mainstream relational databases (Oracle, SQL Server and MySQL), big data stores (HBase and Hive) and files;
access layer: implements access to and interaction with the resource layer based on communication protocols such as JDBC, HTTP, RPC and SFTP;
logic layer: provides adapters for the different data sources of the resource layer, and provides reusable security, monitoring, scheduling and logging components;
service providing layer: provides services externally as a microservice architecture based on Spring Boot and Spring Cloud;
presentation layer: front-end pages and visual effects are implemented with Vue + iView + ES6 + axios + ECharts.
Sharing and publishing is the capability of exposing data resources as external services in the form of web services, file services, JDBC services and the like, and is considered mainly from four aspects: data service publishing, fault alarms, data quality checking and data service monitoring.
Data service publishing provides two types of service publishing, real-time and batch:
real-time service publishing: DB, HBase and File data are published as real-time services and provided in a RESTful manner;
batch service publishing: DB and Hive data are published as batch services and provided as files.
Three types of data service are supported:
data source service: the whole data source is published as a service, based on the data resource catalog;
single table service: a selected table and its selected fields are published as a service, based on the data resource catalog;
result set service: multiple selected tables and fields are assembled into a new, user-defined result set, based on the data resource catalog, and the result set is published as a service.
Fault alarms are event-driven: each event is checked against the defined rules, and an alarm is raised when a rule is met. Notification by in-site message, e-mail and SMS is supported.
service engine alarm: the CPU and memory metrics of the service engine are monitored, and an alarm is raised when a metric reaches its threshold;
service state alarm: the service state is monitored in real time, and an alarm is raised promptly when a service stops running;
service quality alarm: access exceptions and service response time are monitored, and an alarm is raised automatically when an access exception occurs or the response time reaches the configured threshold.
Data quality checking provides the capability of checking the quality of data resources across the whole link of data service sharing and publishing. Checks are carried out before, during and after the process to ensure data quality, and checks based on custom rules are supported.
before: primary and foreign keys, the timestamp field, data types and the like are checked;
during: non-empty constraints and duplicate records are checked;
after: timeliness, consistency and the like are checked.
Data service monitoring is considered mainly from the aspects of asynchronous log persistence, log reading and metric parsing, metric storage, fault handling and the like.
Third, data governance application module
The data governance application module performs the actual data governance on the basis of the environment initialization module and the data sharing design module. A data sharing integrated platform is established beneath the business systems so that their data can be connected and governed, and the data governance sharing platform integrates functions such as resources, job management, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration, in order to achieve data exchange and increase the value of data.
The invention is described below through a specific embodiment: a data governance implementation in the electronic information industry.
In a data implementation case at a company in the electronic information industry, the data was identified through business research. Because the customer's production data is subject to confidentiality requirements, only part of the data was extracted and then processed; no real customer data appears, and the data shown is processed data used to explain the method.
After the data tables are obtained, the method of the invention is implemented as shown in Fig. 2:
First, environment initialization module
The environment initialization module covers the basic operating conditions that must be met to set up the data sharing platform: the data sharing platform components, security and the hardware environment. The data sharing platform components are the related applications. Security for data service sharing and publishing considers three aspects: service access control, data encryption and desensitization, and access based on a security protocol. The hardware environment mainly refers to the operating system, database, browser and the like on which the platform runs.
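The service access control aspect can be pictured as a chain of checks applied in order. The claims name token authorization, an IP whitelist, access frequency and access traffic as the checks applied in sequence at a Spring Cloud Gateway. The following is only a minimal sketch of that ordering in plain Python, not gateway code: the class names `Request` and `Gateway`, the per-token counters and the limit values are all assumptions made for illustration, and the frequency check has no time window.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    token: str
    ip: str
    size: int  # bytes of data the call would return


@dataclass
class Gateway:
    valid_tokens: set
    ip_whitelist: set
    max_calls: int   # access-frequency limit per token (no time window in this sketch)
    max_bytes: int   # access-traffic limit per token
    calls: dict = field(default_factory=dict)
    traffic: dict = field(default_factory=dict)

    def check(self, req: Request) -> str:
        """Run the interceptors in order; the first failing check rejects the call."""
        if req.token not in self.valid_tokens:
            return "rejected: token"
        if req.ip not in self.ip_whitelist:
            return "rejected: ip"
        if self.calls.get(req.token, 0) + 1 > self.max_calls:
            return "rejected: frequency"
        if self.traffic.get(req.token, 0) + req.size > self.max_bytes:
            return "rejected: traffic"
        # All interceptors passed: record the call against the caller's quotas.
        self.calls[req.token] = self.calls.get(req.token, 0) + 1
        self.traffic[req.token] = self.traffic.get(req.token, 0) + req.size
        return "allowed"


# Hypothetical token, address and limits.
gateway = Gateway(valid_tokens={"t1"}, ip_whitelist={"10.0.0.5"}, max_calls=2, max_bytes=1000)
results = [
    gateway.check(Request("bad", "10.0.0.5", 10)),
    gateway.check(Request("t1", "8.8.8.8", 10)),
    gateway.check(Request("t1", "10.0.0.5", 600)),
    gateway.check(Request("t1", "10.0.0.5", 600)),
    gateway.check(Request("t1", "10.0.0.5", 100)),
    gateway.check(Request("t1", "10.0.0.5", 100)),
]
print(results)
```

Because the checks run in a fixed order, a call with an invalid token is rejected before its address or quotas are examined, and only calls that pass every check consume quota.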
During implementation, communication between the applications must be set up. This is done mainly through configuration and involves each application's own configuration file.
The applications can be deployed on the same machine or on different machines, depending on the conditions at the customer site; the deployment used in this embodiment is described here. The actual IP addresses of the machines are not given and are replaced with a, b and c.
Deployment environment:
Figure BDA0002635844010000091
The hardware environment information of the three machines a, b and c is as follows:
Figure BDA0002635844010000101
The data sharing platform product components mainly comprise: a management and control platform, a service engine, service state monitoring, a visual development tool, a transmission agent, a scheduling engine, a data publishing engine and a resource acquisition client. The management and control platform is the unified management and control platform for data service sharing. It provides functions such as the resource catalog, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration, and performs unified registration management of data resources and service resources. Once communication between the applications has been configured, the service engine, service state monitoring, visual development tool, transmission agent, scheduling engine, data publishing engine and resource acquisition client can operate after a normal start. The resource catalog on the management and control platform then provides views of technical metadata, business metadata and service metadata, which makes resources easier to discover, and provides systematic catalog management by partition and node, which protects data security and speeds up data lookup. Metadata is obtained through automatic collection and parsing, and technical, business and service metadata are registered to indicate where data goes.
The nodes can be named after specific businesses and added as business needs require. The user nodes are as follows:
Beijing, Shanghai, Guangzhou, Shenzhen, Shanxi, Shandong, …
Data managers and developers can register, modify and delete database resources for the source regions, front-end regions, sharing regions and consumers through the corresponding nodes on the platform. Database resource management supports connections to Oracle, SQLServer and MySQL relational databases. Database resource nodes in the resource catalog can be expanded, database tables and fields are displayed in a layered preview, and business descriptions can be added.
Data managers classify and manage database, big data, Web service and file resources by business theme and business entity (classifications and names can be defined as needed). The main functions include:
business theme management, including adding, modifying and deleting business themes;
business entity management, including adding, modifying and deleting business entities;
business entity resource allocation and resource viewing, where resource allocation can associate databases, files, interfaces and big data;
retrieval of classified resources;
rapid service publishing for the resources allocated to a business entity.
Data service development on the data service sharing platform is divided into online development and offline development. Offline development means developing the data model and the Web service solely through Studio; an offline-developed model only needs to be deployed through the platform and is not described further here.
The job types used in online job development are all generated from templates. Template development means developing dbr or dsr template files through Studio. The process is the same as for an ETL model, except that a template has additional label definitions for data sources and tables: for example, the data source is filled in with the label %{source} and the table with the label [sourcetablename]. When the template is used in a development scenario, the data source and table variables can then be mapped. The current base templates fall into the following categories: full synchronization; full-text comparison synchronization; timestamp synchronization; database to file; data desensitization; file-to-database synchronization; HBase-to-file synchronization; trigger synchronization; CanalToHBase real-time synchronization; DBTOHBASE; MONGLEDBTOASE; monoglycaeseclle; DBTOHBASEDESENSITIZATION; and so on. Beyond these templates, data exchange for other database types can be developed according to the actual requirements of a project. Developing online from templates lowers the difficulty of use, since professional developers are not needed to develop and design the model. Because the system's functionality is extensive, only the core data exchange types for online jobs are described below.
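The mapping of data source and table variables can be pictured as a plain text substitution. The following is a minimal sketch, not the platform's template engine: the function name `render_template`, the sample template text and the exact label spellings `%{source}` and `[sourcetablename]` are assumptions made for illustration.

```python
import re


def render_template(template: str, source: str, table: str) -> str:
    """Fill the data-source and table labels of a job template.

    Assumed label syntax: %{source} for the data source and
    [sourcetablename] for the table.
    """
    filled = template.replace("%{source}", source)
    filled = filled.replace("[sourcetablename]", table)
    # Fail loudly if a %{...} label was left unmapped.
    leftover = re.findall(r"%\{\w+\}", filled)
    if leftover:
        raise ValueError(f"unmapped labels: {leftover}")
    return filled


# Hypothetical template body for a full synchronization job.
TEMPLATE = "read from %{source}: SELECT * FROM [sourcetablename]"
job = render_template(TEMPLATE, source="crm_oracle", table="customer")
print(job)
```

Keeping the labels in the template and binding them only when a job is created is what lets one template serve many data sources and tables.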
Full synchronization: full extraction generally suits statistical analysis or business needs in which the data is not updated afterwards. The business system's data source is extracted directly, once or several times, without any further processing, which makes full extraction the simplest, most direct and fastest mode. Using the collection component of the system, all the data in the database can be collected in one pass without adding filter conditions. Full collection suits business needs with small data volumes. It cannot perform incremental synchronization and is not suitable for synchronizing large data volumes.
Timestamp synchronization: the precondition for incremental extraction in this mode is that both the source and the target database have a timestamp field. The maximum time in the target database is read first, and then all data later than that time is read from the source database, using the time as a parameter. The timestamp-based approach requires every table in the application system concerned to have a timestamp field that records the modification time. This approach does not affect the efficiency of the original application, but if a table has no timestamp field, the original system needs a substantial adjustment, and data changes not made through the application system cannot be captured. Advantages: processing is fast and the data processing logic is relatively simple. Disadvantages: the table structure of a source database without a timestamp field must be changed, and the source database has to maintain the timestamp field; deleted data cannot be detected through the timestamp field, so deletions are not synchronized.
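The two-step read described for timestamp synchronization can be sketched with two in-memory SQLite databases. This is an illustration only: the table name `orders`, its columns, the helper `sync_by_timestamp` and the use of SQLite are assumptions, not part of the described platform. Timestamps are stored as ISO 8601 text so that string comparison follows time order.

```python
import sqlite3


def sync_by_timestamp(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy rows that are newer than the latest row already in the target."""
    # Step 1: read the maximum timestamp present in the target ('' if it is empty).
    (last_seen,) = target.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders"
    ).fetchone()
    # Step 2: read every source row later than that time, passed as a parameter.
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_seen,),
    ).fetchall()
    # Step 3: insert new rows and overwrite changed ones in the target.
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return len(rows)


DDL = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(DDL)
target.execute(DDL)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2020-08-01T09:00:00"), (2, 20.0, "2020-08-02T09:00:00")],
)

first = sync_by_timestamp(source, target)   # initial run copies both rows
source.execute(
    "UPDATE orders SET amount = 25.0, updated_at = '2020-08-03T09:00:00' WHERE id = 2"
)
source.execute("DELETE FROM orders WHERE id = 1")
second = sync_by_timestamp(source, target)  # copies only the updated row
remaining = target.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the second run the target still holds the row that was deleted from the source, which is the limitation stated in the text: a deletion leaves no timestamp behind to be read.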
Each data exchange template was developed according to current market and customer requirements and suits a different scenario; the most suitable template can be selected according to the actual project scenario and data requirements. Several unstructured data exchange templates are to be added next to meet current market and customer requirements.
Second, data sharing design module
The data sharing design module comprises the technical framework, the core scenes, and service sharing and publishing. The framework layer is explained from the perspectives of the functional architecture and the technical architecture: the functional architecture mainly refers to the support engines and functional modules, and the technical architecture is explained in terms of the resource layer, access layer, logic layer, service providing layer and display layer. Sharing and publishing is the capability of exposing data resources externally as web services, file services, JDBC services and the like, and is considered from four aspects: data service publishing, fault alarms, data quality checking and data service monitoring.
The functional architecture mainly comprises four support engines and four functional modules.
The four support engines:
real-time service engine: provides real-time service publishing and access in a RESTful manner;
In the management and control platform, the unified, standardized data collected in the resource management platform is published through the service publishing engine and made available to third parties. Data is used under security management standards: from application through approval, together with row-level and column-level permission control and service monitoring, secure management of the data and full-link monitoring of the data service life cycle are achieved.
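Row-level and column-level permission control, combined with desensitization, can be pictured as three filters applied to each record before it leaves the platform. The sketch below is illustrative only: the function names, the phone-masking rule and the sample records are assumptions, not the platform's actual rules.

```python
def mask_phone(value: str) -> str:
    """Desensitize a phone number, keeping the first 3 and last 4 characters."""
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def apply_permissions(rows, allowed_columns, row_filter, masked_columns):
    """Return only the rows and columns a consumer may see, with masking applied."""
    result = []
    for row in rows:
        if not row_filter(row):  # row-level permission
            continue
        # Column-level permission: keep only the columns granted to the consumer.
        visible = {c: row[c] for c in allowed_columns if c in row}
        for c in masked_columns:  # desensitization of sensitive columns
            if c in visible:
                visible[c] = mask_phone(visible[c])
        result.append(visible)
    return result


# Hypothetical records and grants.
records = [
    {"name": "Li", "phone": "13812345678", "region": "Shanghai", "salary": 100},
    {"name": "Wang", "phone": "13987654321", "region": "Beijing", "salary": 200},
]
shared = apply_permissions(
    records,
    allowed_columns=["name", "phone", "region"],
    row_filter=lambda r: r["region"] == "Shanghai",
    masked_columns=["phone"],
)
print(shared)
```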
The technology stack involved in the development of the data service sharing platform is divided into five layers and supports exchange between different databases for customers:
resource layer: automatic collection of technical metadata and management of service metadata for various data sources, such as mainstream databases (Oracle, SQLServer, MySQL, PostgreSQL, DM and MongoDB), big data stores (HBase and Hive), files and the like;
access layer: access to and interaction with the resource layer over communication protocols such as JDBC, HTTP, RPC and SFTP;
logic layer: adapters for the different data sources of the resource layer, plus reusable security, monitoring, scheduling and logging components;
service providing layer: services are exposed externally in a microservice architecture based on Spring Boot and Spring Cloud;
display layer: front-end pages and display effects are implemented with Vue + iView + ES6 + axios + ECharts.
Sharing and publishing is the capability of exposing data resources externally as web services, file services, JDBC services and other forms. It is considered from four aspects: data service publishing, fault alarms, data quality checking and data service monitoring.
Data service publishing provides two types of publishing, real-time and batch:
real-time service publishing: DB, HBase and File data are published as real-time services and provided in a RESTful manner;
batch service publishing: DB and Hive data are published as batch services and provided as files.
Three types of data service are supported:
data source service: the whole data source is published as a service, based on the data resource catalog;
single table service: a selected table and its selected fields are published as a service, based on the data resource catalog;
result set service: multiple selected tables and fields are assembled into a new, user-defined result set, based on the data resource catalog, and the result set is published as a service.
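The difference between a single table service and a result set service can be sketched as two small functions that turn a catalog selection into a JSON payload. This is a hedged illustration using an in-memory SQLite database; the function names, the sample tables `customer` and `orders`, and the JSON response shape are assumptions. The data source service is not sketched here.

```python
import json
import sqlite3


def single_table_service(conn: sqlite3.Connection, table: str, fields: list) -> str:
    """Publish the selected fields of one table as a JSON payload."""
    # Table and field names are assumed to come from the trusted resource catalog.
    cursor = conn.execute(f"SELECT {', '.join(fields)} FROM {table}")
    return json.dumps([dict(zip(fields, row)) for row in cursor.fetchall()])


def result_set_service(conn: sqlite3.Connection, query: str) -> str:
    """Publish a user-defined result set assembled from several tables."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor.fetchall()])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO customer VALUES (1, 'Li', '13812345678')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

single = single_table_service(conn, "customer", ["id", "name"])
joined = result_set_service(
    conn,
    "SELECT c.name AS name, o.amount AS amount "
    "FROM customer c JOIN orders o ON o.customer_id = c.id",
)
```

In a real-time service the same payload would be returned from a RESTful endpoint; in a batch service it would be written to a file.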
Fault alarms are event-driven: each event is checked against the defined rules, and an alarm is raised when a rule is met. Notification by in-site message, e-mail and SMS is supported.
service engine alarm: the CPU and memory metrics of the service engine are monitored, and an alarm is raised when a metric reaches its threshold;
service state alarm: the service state is monitored in real time, and an alarm is raised promptly when a service stops running;
service quality alarm: access exceptions and service response time are monitored, and an alarm is raised automatically when an access exception occurs or the response time reaches the configured threshold.
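The three alarm types can be expressed as rules that are evaluated against each incoming event. The sketch below assumes a simple dictionary event format; the field names, the threshold values and the `notify` callback are illustrative assumptions and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlarmRule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the alarm should fire


# Threshold values are illustrative only.
RULES = [
    AlarmRule("service engine alarm",
              lambda e: e.get("cpu", 0) >= 90 or e.get("memory", 0) >= 90),
    AlarmRule("service state alarm",
              lambda e: e.get("state") == "stopped"),
    AlarmRule("service quality alarm",
              lambda e: bool(e.get("error", False)) or e.get("response_ms", 0) >= 2000),
]


def check_event(event: dict, notify: Callable[[str, dict], None]) -> list:
    """Check one event against every rule and notify for each rule that is met."""
    fired = [rule.name for rule in RULES if rule.condition(event)]
    for name in fired:
        notify(name, event)
    return fired


sent = []  # stands in for in-site message, e-mail or SMS delivery
busy = check_event(
    {"service": "orders", "cpu": 95, "state": "running", "response_ms": 120},
    lambda name, event: sent.append(name),
)
down = check_event(
    {"service": "orders", "state": "stopped", "response_ms": 2500},
    lambda name, event: sent.append(name),
)
```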
Data quality checking provides the capability of checking the quality of data resources across the whole link of data service sharing and publishing. Checks are carried out before, during and after the process to ensure data quality, and checks based on custom rules are supported.
before: primary and foreign keys, the timestamp field, data types and the like are checked;
during: non-empty constraints and duplicate records are checked;
after: timeliness, consistency and the like are checked.
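The three checking stages can be sketched as three small functions, one per stage. Everything specific here is an assumption chosen for illustration: the schema dictionary, the `updated_at` field name, the use of a row count as the consistency check and the one-hour freshness limit.

```python
from datetime import datetime, timedelta


def check_before(schema: dict) -> list:
    """Stage 1: the table definition needs a primary key and a timestamp field."""
    problems = []
    if not schema.get("primary_key"):
        problems.append("missing primary key")
    if "updated_at" not in schema.get("columns", {}):
        problems.append("missing timestamp field")
    return problems


def check_during(rows: list, required: list, key: str) -> list:
    """Stage 2: required fields are non-empty and key values are not repeated."""
    problems, seen = [], set()
    for row in rows:
        for col in required:
            if row.get(col) in (None, ""):
                problems.append(f"empty {col} in row {row.get(key)}")
        if row.get(key) in seen:
            problems.append(f"duplicate key {row.get(key)}")
        seen.add(row.get(key))
    return problems


def check_after(rows: list, now: datetime, max_age: timedelta, source_count: int) -> list:
    """Stage 3: row counts match (consistency) and data is fresh (timeliness)."""
    problems = []
    if len(rows) != source_count:
        problems.append("row count mismatch")
    newest = max((r["updated_at"] for r in rows), default=None)
    if newest is None or now - newest > max_age:
        problems.append("stale data")
    return problems


schema = {"primary_key": "id",
          "columns": {"id": "INTEGER", "name": "TEXT", "updated_at": "TIMESTAMP"}}
rows = [
    {"id": 1, "name": "a", "updated_at": datetime(2020, 8, 17, 9, 0)},
    {"id": 1, "name": "", "updated_at": datetime(2020, 8, 17, 10, 0)},
]
before = check_before(schema)
during = check_during(rows, required=["name"], key="id")
after = check_after(rows, now=datetime(2020, 8, 17, 12, 0),
                    max_age=timedelta(hours=1), source_count=2)
```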
Data service monitoring is considered mainly from the aspects of asynchronous log persistence, log reading and metric parsing, metric storage, fault handling and the like.
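Asynchronous log writing followed by metric parsing can be sketched with a queue and a background worker. The log line format `service,status,milliseconds`, the in-memory list that stands in for the log store and the metric names are assumptions made for this illustration.

```python
import queue
import threading
from collections import defaultdict

log_queue = queue.Queue()
landed = []  # stands in for the log file or log store


def log_writer() -> None:
    """Background worker: drain the queue so services never block on log I/O."""
    while True:
        line = log_queue.get()
        if line is None:  # sentinel value stops the worker
            break
        landed.append(line)


def parse_metrics(lines: list) -> dict:
    """Read the stored log lines and compute per-service metrics."""
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0})
    for line in lines:
        service, status, ms = line.split(",")
        entry = stats[service]
        entry["calls"] += 1
        entry["errors"] += int(status != "200")
        entry["total_ms"] += int(ms)
    return {name: {**s, "avg_ms": s["total_ms"] / s["calls"]}
            for name, s in stats.items()}


worker = threading.Thread(target=log_writer)
worker.start()
for record in ["orders,200,120", "orders,500,300", "customer,200,80"]:
    log_queue.put(record)  # the caller returns immediately
log_queue.put(None)
worker.join()

metrics = parse_metrics(landed)
print(metrics)
```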
Third, data governance application module
The data governance application module performs the actual data governance on the basis of the environment initialization module and the data sharing design module. A data sharing integrated platform is established beneath the business systems so that their data can be connected and governed, and the data governance sharing platform integrates functions such as resources, job management, data development, data use, service engine management, scheduling plans, statistical monitoring and service permission configuration, in order to achieve data exchange and increase the value of data.
On the management and control platform, on-site implementers carry out the following steps:
1. Configuration of data resources
A connection to the data source is established and metadata is collected automatically. Data exchange relies on metadata and is in essence metadata-based exchange. Metadata is collected automatically for semi-structured and structured data. Metadata is descriptive information about the structure and meaning of data, about the operations performed on the data, and about database systems; an important goal is to provide a comprehensive guide to data resources. Metadata not only defines the data schema, source, extraction and transformation rules and so on in data exchange; the whole data exchange system also runs on metadata, which links the loosely coupled components of the system into an organic whole. Through automatic metadata collection, the core business functions of each department and the information resources corresponding to them are sorted out, and departmental information resource catalogs are compiled. This establishes which information resources exist and where they are located, raises the degree of information resource sharing, and supports an information resource sharing mechanism and management system. The business attributes and technical attributes of the enterprise data standard are formed by combining the current state of data in the enterprise's internal information systems with the enterprise's business and technical requirements, and effective, reasonable metric data specifications are formulated.
2. Configuration of the exchange mode
The new data exchange platform provides data exchange services for data, messages, files and the like. It can quickly establish an interactive data exchange and information sharing platform that spans hardware platforms, databases and operating systems; it provides an open environment and supports a variety of clients, databases, networks and communication protocols; and through visual configuration it implements data interaction with databases, files and web interfaces. Data exchange is combined with the specific business logic, so that data integration and external data exchange requirements are met quickly.
3. Configuration of the exchange method
Structured and semi-structured data exchange mainly comprises timestamp synchronization, full-text comparison synchronization, trigger synchronization, CDC incremental synchronization, full synchronization and the like.
4. Specification of the exchange period
A job is registered with the unified scheduling system before it runs. After registration succeeds, the scheduling management of the scheduling system determines the execution order of jobs according to the configured task plan and allocates resources.
Scheduling includes the following:
trigger mode: in scheduling management, jobs are triggered on a timed basis according to calendar and frequency;
run order: after triggering, jobs can be sequenced and their order adjusted according to the previously configured settings;
task plan: tasks are scheduled according to the configured task execution cycle;
resource allocation: during scheduling, resources are allocated according to the status of the registered job servers to execute the transmission tasks.
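The four scheduling elements can be sketched together in a small tick-based scheduler. This is an illustration under stated assumptions, not the scheduling engine itself: the class names, the integer tick clock, the `order` field and the least-loaded allocation rule are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    period: int        # task plan: run every `period` ticks
    order: int         # run order: lower values run first
    next_run: int = 0  # trigger mode: next tick at which the job is due


@dataclass
class Server:
    name: str
    running: int = 0   # jobs assigned so far (never released in this sketch)


@dataclass
class Scheduler:
    servers: list
    jobs: list = field(default_factory=list)

    def register(self, job: Job) -> None:
        """A job must be registered before it can be scheduled."""
        self.jobs.append(job)

    def tick(self, now: int) -> list:
        """Trigger due jobs, sort them, and give each to the least-loaded server."""
        due = sorted((j for j in self.jobs if j.next_run <= now), key=lambda j: j.order)
        assignments = []
        for job in due:
            server = min(self.servers, key=lambda s: s.running)  # resource allocation
            server.running += 1
            job.next_run = now + job.period
            assignments.append((job.name, server.name))
        return assignments


scheduler = Scheduler(servers=[Server("a"), Server("b")])
scheduler.register(Job("timestamp_sync", period=1, order=2))
scheduler.register(Job("full_sync", period=3, order=1))
plan0 = scheduler.tick(0)
plan1 = scheduler.tick(1)
print(plan0, plan1)
```

At tick 0 both jobs are due and are ordered by their configured run order; at tick 1 only the job with the shorter cycle is due again.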
Through product implementation, on-site implementers need only simple configuration on the management and control platform for the data exchange platform to provide overall monitoring, detailed plan monitoring and event monitoring. Visual, multi-dimensional job run monitoring and comprehensive resource monitoring collect data and statistics on jobs and on the nodes involved in them. During job exchange, the scheduling log, job execution log, history log, volume of data exchanged and the numbers of successful and failed exchanges can be counted, so that system problems are found and resolved promptly, normal operation of the system is ensured, and customers' data requirements are met.
The system for realizing data governance in a big data environment according to the invention addresses the data island problem within governments and enterprises by implementing data governance and sharing. Based on a big data architecture, it provides a unified data service capability and serves as the channel through which enterprise data resources are opened to and shared with the outside, so that the value of the data resources can be realized. By establishing a data sharing integrated platform beneath the business systems, the platform connects and governs the data of each business system, which enables data exchange and sharing among the business systems within governments and enterprises, achieves the goal of data governance and increases the value of data.
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (12)

1. A system for realizing data governance in a big data environment is characterized by comprising:
the environment initialization module is used for providing basic operation conditions which need to be met by the data sharing platform, namely data sharing platform product components, safety and hardware environment;
the data sharing design module is connected with the environment initialization module and comprises a framework layer, a core scene and a service sharing and publishing unit, and the framework layer, the core scene and the service sharing and publishing unit are connected with the environment initialization module and are used for providing services to the outside through the framework layer;
and the data management application module is connected with the data sharing design module and used for establishing a data sharing integrated platform and realizing data exchange and data promotion by integrating the functions of resources, job management, data development, data use, service engine management, scheduling plan, statistical monitoring and service authority configuration.
2. The system for implementing data governance in a big data environment according to claim 1, wherein the data sharing platform product components comprise a management and control platform, a service engine, service state monitoring, a visual development tool, a transmission agent, a scheduling engine, a data publishing engine and a resource acquisition client;
the management and control platform is used for providing functions of resource catalogue, data development, data use, service engine management, scheduling plan, statistical monitoring and service authority configuration and carrying out unified registration management on data resources and service resources;
the service engine is used for analyzing the batch operation model, executing the batch operation and transmitting the file, and providing multi-protocol and multi-data source adaptive support;
the service state monitoring is used for providing log analysis and monitoring capability and providing background support for functions of early warning, warning in the process and statistic analysis after the process;
the visual development tool is used for providing functions of visual batch operation model definition and debugging, visual model performance monitoring, metadata management and data processing model deployment;
the transmission agent is used for supporting data extraction of data sources such as databases, big data and the like, supporting one-to-one and one-to-many transmission of any node and any size file, and supporting a file transmission strategy;
the scheduling engine is used for supporting serial and parallel scheduling of jobs and job flows, providing scheduling of various rules and providing diversified scheduling modes for operation of the jobs and the job flows;
the data publishing engine is used for providing data service publishing and data access capabilities and supporting real-time interface service publishing in the form of a single table and a result set;
and the resource acquisition client is used for acquiring the indexes of the CPU, the memory, the disk and the network of the physical machine where each engine is located in real time.
3. The system for implementing data governance in a big data environment according to claim 1, wherein said environment initialization module ensures security through service access control, data encryption and desensitization, and access based on a security protocol.
4. The system for realizing data governance in a big data environment according to claim 3, wherein the service access control accesses the data service through a Spring Cloud Gateway, and access is controlled sequentially by multiple interceptors for token authorization, IP whitelist, access frequency and access traffic, thereby ensuring the security of service access.
5. The system for realizing data governance in a big data environment according to claim 3, wherein the data encryption and desensitization ensures data access security through data encryption, data desensitization and row-level and column-level permission control.
6. The system for implementing data governance in a big data environment according to claim 1, wherein the architecture layer comprises a functional architecture and a technical architecture, the functional architecture comprises a support engine and a functional module, and the technical architecture comprises a resource layer, an access layer, a logic layer, a service provision layer and a presentation layer.
7. The system for implementing data governance in a big data environment according to claim 1, wherein the shared publishing provides services to the outside through web services, file services and JDBC services, and the services provided include data service publishing, fault alerting, data quality checking and data service monitoring.
8. The system for implementing data governance in a big data environment according to claim 6, wherein said support engine comprises a real-time service engine, a batch service engine, a scheduling engine and a logging engine.
9. The system for implementing data governance in a big data environment according to claim 6, wherein said functional modules include resource catalog, data service publishing, data use and security, and data service monitoring.
10. The system for implementing data governance in a big data environment according to claim 7, wherein said data service publishing comprises real-time service publishing and batch service publishing.
11. The system for implementing data governance in a big data environment according to claim 7, wherein the fault alarm is checked in an event-driven manner according to defined rules, and an alarm is raised when a rule is satisfied, wherein the alarms include a service engine alarm, a service state alarm and a service quality alarm.
12. The system for implementing data governance in a big data environment according to claim 7, wherein the data service monitoring provides data resource quality checking capability, quality checking is performed before, during and after the process to guarantee data quality, and checking according to custom rules is supported.
CN202010825054.7A 2020-08-17 2020-08-17 System for realizing data governance under big data environment Pending CN111917887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010825054.7A CN111917887A (en) 2020-08-17 2020-08-17 System for realizing data governance under big data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010825054.7A CN111917887A (en) 2020-08-17 2020-08-17 System for realizing data governance under big data environment

Publications (1)

Publication Number Publication Date
CN111917887A true CN111917887A (en) 2020-11-10

Family

ID=73278194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010825054.7A Pending CN111917887A (en) 2020-08-17 2020-08-17 System for realizing data governance under big data environment

Country Status (1)

Country Link
CN (1) CN111917887A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105516213A (en) * 2014-09-22 2016-04-20 上海宝景信息技术发展有限公司 Enterprise-oriented efficient integrated management control system for enterprises
US20170308714A1 (en) * 2016-04-26 2017-10-26 Adobe Systems Incorporated Data management for combined data using structured data governance metadata
CN106354833A (en) * 2016-08-31 2017-01-25 广东京信软件科技有限公司 Platform for achieving data management and sharing exchange on basis of B/S framework
CN108647217A (en) * 2017-12-27 2018-10-12 广东智政信息科技有限公司 Big data platform integrated management system based on safety supervision application
CN109981772A (en) * 2019-03-22 2019-07-05 西安电子科技大学 Multi-domain data sharing and exchange platform architecture based on blockchain
CN110809017A (en) * 2019-08-16 2020-02-18 云南电网有限责任公司玉溪供电局 Data analysis application platform system based on cloud platform and micro-service framework
CN111209455A (en) * 2019-12-29 2020-05-29 横琴宝蓝科技有限公司 Visual data exchange management platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
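Primeton Information Technology Co., Ltd.: "Primeton MetaCube7 User Manual", 《HTTPS://MARKETPLACE.HUAWEICLOUD.COM/PRODUCT/00301-563110-0--0》 *
Xiao Jiong'en et al.: "Government Data Governance in the Context of Big Data: A Study of Sharing and Management Mechanisms", 《Science and Technology Management Research》 *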

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291265B (en) * 2020-11-17 2022-02-18 珠海大横琴科技发展有限公司 Data sharing method, device, server and storage medium
CN112507374A (en) * 2020-11-17 2021-03-16 贵州电网有限责任公司 Data responsibility management system
CN112291265A (en) * 2020-11-17 2021-01-29 珠海大横琴科技发展有限公司 Data sharing method, device, server and storage medium
CN112613069A (en) * 2020-12-23 2021-04-06 国家电网有限公司大数据中心 Automatic desensitization method based on negative list data resources
CN112817564A (en) * 2021-01-07 2021-05-18 湖北智泽云创科技有限公司 Mala back-end rapid development framework system and method
CN112925767A (en) * 2021-03-03 2021-06-08 浪潮云信息技术股份公司 Multi-data-source dynamic data synchronization management method and system based on internet supervision
CN113051323B (en) * 2021-03-11 2023-09-01 江苏省生态环境监控中心(江苏省环境信息中心) Water environment big data exchange method
CN113051323A (en) * 2021-03-11 2021-06-29 江苏省生态环境监控中心(江苏省环境信息中心) Water environment big data exchange method
CN113486096A (en) * 2021-06-21 2021-10-08 上海百秋电子商务有限公司 Multi-library timing execution report data preprocessing and query method and system
CN113849503A (en) * 2021-09-10 2021-12-28 杭州未名信科科技有限公司 Open big data processing system, method and medium
CN113849503B (en) * 2021-09-10 2023-10-20 杭州未名信科科技有限公司 Open big data processing system, method and medium
CN113778967A (en) * 2021-09-14 2021-12-10 中国环境科学研究院 Yangtze river basin data acquisition processing and resource sharing system
CN113778967B (en) * 2021-09-14 2024-03-12 中国环境科学研究院 Yangtze river basin data acquisition processing and resource sharing system
CN114626822A (en) * 2022-03-22 2022-06-14 山东省国土测绘院 Full-link data integration method and system
CN114691784B (en) * 2022-06-01 2022-08-23 杭州量之智能科技有限公司 Sharing platform, sharing method, sharing equipment and storage medium for data governance
CN114691784A (en) * 2022-06-01 2022-07-01 杭州量之智能科技有限公司 Sharing platform, sharing method, sharing equipment and storage medium for data governance
CN116126647A (en) * 2023-04-17 2023-05-16 南京飓风引擎信息技术有限公司 Data linkage analysis system suitable for digital enterprises
CN116483909A (en) * 2023-05-17 2023-07-25 杭州端点网络科技有限公司 Big data integration system
CN117149884A (en) * 2023-10-30 2023-12-01 华兴国创(北京)科技有限公司 Data processing transaction method
CN117149884B (en) * 2023-10-30 2024-01-12 华兴国创(北京)科技有限公司 Data processing transaction method

Similar Documents

Publication Publication Date Title
CN111917887A (en) System for realizing data governance under big data environment
US11755628B2 (en) Data relationships storage platform
CN112396404A (en) Data center system
CN109902072A (en) Log processing system
CN112685385A (en) Big data platform for smart city construction
CN109213819A (en) Information resource sharing system
EP2849097A2 (en) A method for operating storage resources in an in-memory warehouse system
CN112199433A (en) Data management system for city-level data middle platform
Diotalevi et al. Collection and harmonization of system logs and prototypal Analytics services with the Elastic (ELK) suite at the INFN-CNAF computing centre
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN113642299A (en) One-key generation method based on power grid statistical form
CN113626447B (en) Civil aviation data management platform and method
CN115617776A (en) Data management system and method
CN116010494A (en) Data exchange system supporting heterogeneous data sources
CN115794929A (en) Data management system and data management method for data mart
CN115169011A (en) Editing system and application system of airplane assembly outline
CN115858513A (en) Data governance method, data governance device, computer equipment and storage medium
Balliu et al. A big data analyzer for large trace logs
Martinviita Time series database in Industrial IoT and its testing tool
CN111538720B (en) Method and system for cleaning basic data of power industry
CN111291029B (en) Data cleaning method and device
CN106993032A (en) The embedded accurate communication cloud service platform applied based on mobile Internet
CN113094385B (en) Data sharing fusion platform and method based on software defined open tool set
CN116561213A (en) Real-time visualized operation platform of system based on big data
CN114547173A (en) Data warehouse construction method, device and equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination