CN112527774A - Data center building method and system and storage medium - Google Patents

Data center building method and system and storage medium

Info

Publication number
CN112527774A
Authority
CN
China
Prior art keywords
data
management
metadata
standard
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011501793.7A
Other languages
Chinese (zh)
Inventor
张培
罗静
庞辛酉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRSC Institute of Smart City Research and Design Co Ltd
Original Assignee
CRSC Institute of Smart City Research and Design Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRSC Institute of Smart City Research and Design Co Ltd filed Critical CRSC Institute of Smart City Research and Design Co Ltd
Priority to CN202011501793.7A priority Critical patent/CN112527774A/en
Publication of CN112527774A publication Critical patent/CN112527774A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • G06F16/212Schema design and management with details for data modelling support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80Database-specific techniques

Abstract

The invention relates to a data middle platform building method, system and storage medium, wherein the method comprises the following steps: building a flow management layer for standardizing and streamlining data management and application processes, so as to control every link of the full data life cycle; building an asset management layer for managing metadata and data to form an enterprise data asset map, so that data sources, transmission and storage modes are known; and building a quality management layer for sorting out and analyzing data quality problems and establishing data standards. Aimed at the acquisition, calculation, storage and management of mass data, the invention provides a systematic data governance method covering the full application life cycle, so that managers of data applications gain insight into the complex dependencies among data, applications and systems, and thereby manage the data effectively.

Description

Data center building method and system and storage medium
Technical Field
The invention relates to a data middle platform building method, system and storage medium based on an asset management system, and belongs to the technical field of information systems and big data.
Background
With the boom of the digital economy, research on and practice of data assets are receiving more and more attention, and many enterprises have started to build their own data middle platforms to better manage and apply their data assets. Without a data middle platform, large amounts of data remain scattered across business systems as isolated islands, which hinders the unified management of business data; valuable data cannot be converted into data assets, and the important role of data assets in business analysis and guidance cannot be exerted. Building an enterprise data middle platform reduces the cost of IT system construction and of interaction between systems, and allows quick response to front-end business requirements.
Although the concept of the data middle platform is popular and many enterprises have embarked on attempts to build one, no mature theory has yet formed, and the building process has not been distilled into an effective method. At present, most construction approaches remain confined to traditional PaaS thinking: they emphasize platform construction while neglecting standards, and are weak at the soft-science level.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a data middle platform building method, system and storage medium capable of acquiring, calculating, storing and managing mass data, so that the data can be managed effectively.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for building a data middle platform, comprising:
building a flow management layer for standardizing and streamlining data management and application processes, so as to control every link of the full data life cycle;
building an asset management layer for managing metadata and data to form an enterprise data asset map, so that data sources, transmission and storage modes are known; and
building a quality management layer for sorting out and analyzing data quality problems and establishing data standards.
Further, the flow management layer comprises a data scheduling flow, a data monitoring flow, a data management flow and a data alarm flow. Specifically: the data scheduling flow provides scheduling methods triggered by calendar, frequency and events; the data monitoring flow provides monitoring of job flows, jobs, events, plans and logs, each monitoring mode presented as a list and/or a graph; the data management flow supports unified management of jobs run by ETL tools, scripts, stored procedures and executable programs; and the data alarm flow provides alerts and an error notification mechanism.
Further, building the asset management layer comprises metadata management, which comprises metadata management maintenance, metadata query and metadata maintenance;
the metadata management maintenance is used for adding, deleting, updating and querying metadata types;
the metadata query exploits the schema-free, horizontally scalable and real-time concurrent storage and computation characteristics of big data to store all metadata instances and metadata instance relations uniformly in one table through a database engine;
the metadata maintenance is used for maintenance management of already-published metadata.
Further, building the asset management layer also comprises data management, which comprises data base management, data model management and data acquisition management;
the data base management is used for managing, querying and maintaining data;
the data model management comprises model management, model relation maintenance and model import/export; model management unifies the enterprise's data views by designing a data model, defines the business departments' requirements on data information, builds the atomic-layer foundation of the data warehouse and initializes the attribution of business data; model relation maintenance checks and maintains the data model after it is designed, so that it conforms to model relational integrity constraints; model import/export imports and exports the data model after it is built;
the data acquisition management comprises adapter management, acquisition management and scheduling management. In adapter management, the data adapter takes a processing engine as its core, whose built-in services cover data collection, data cleaning, data filling and data format translation; in acquisition management, data acquisition comprises log collection and data source synchronization; in scheduling management, the data adapter provides a real-time monitoring tool for the system, so that a system administrator can discover system faults and abnormal operating conditions immediately and handle them in time, while the data processing status can be statistically analyzed and kept track of.
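The built-in services of the adapter's processing engine (collection, cleaning, filling, format translation) can be pictured as stages applied in order to each collected record. The sketch below is illustrative only; the stage functions, field names and `ProcessingEngine` class are assumptions, not the patent's implementation.

```python
# Hypothetical sketch of the adapter's processing engine: each built-in
# service (clean, fill, translate) is a stage applied to collected records.
from typing import Callable, Dict, List

Record = Dict[str, object]

def clean(rec: Record) -> Record:
    # Data cleaning: drop fields whose values are empty strings.
    return {k: v for k, v in rec.items() if v != ""}

def fill(rec: Record) -> Record:
    # Data filling: supply a default for a missing 'source' field.
    rec.setdefault("source", "unknown")
    return rec

def translate(rec: Record) -> Record:
    # Format translation: rename the timestamp field to the warehouse convention.
    if "ts" in rec:
        rec["event_time"] = rec.pop("ts")
    return rec

class ProcessingEngine:
    def __init__(self, stages: List[Callable[[Record], Record]]):
        self.stages = stages

    def run(self, records: List[Record]) -> List[Record]:
        out = []
        for rec in records:
            for stage in self.stages:
                rec = stage(rec)
            out.append(rec)
        return out

engine = ProcessingEngine([clean, fill, translate])
result = engine.run([{"ts": "2020-12-18", "value": 42, "note": ""}])
```

Each stage stays independent, so the adapter can be reconfigured by reordering or swapping stages without touching the engine.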
Further, the data model is designed through the following steps:
designing a conceptual model: defining the system boundary and determining the subject domains and their contents;
designing a logical model: determining the dimensional modeling method and organizing the data;
designing a physical model: organizing the logical models of the data warehouse physically into a database.
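The three design steps above can be sketched end to end: a subject domain (conceptual), a star-schema fact/dimension layout (logical), and its materialization in a database (physical). All table and column names below are invented for illustration.

```python
# Illustrative only: an "orders" subject domain modeled as a star schema
# and physically created in an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Logical model: one dimension table and one fact table of a star schema.
cur.execute("""
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT NOT NULL
)""")
cur.execute("""
CREATE TABLE fact_orders (
    order_id  INTEGER PRIMARY KEY,
    date_key  INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount    REAL NOT NULL
)""")

# Physical model: the DDL above materializes the logical model; loading
# and querying then follow the fact-to-dimension join paths.
cur.execute("INSERT INTO dim_date VALUES (20201218, '2020-12-18')")
cur.execute("INSERT INTO fact_orders VALUES (1, 20201218, 99.5)")
row = cur.execute("""
SELECT d.full_date, f.amount
FROM fact_orders f JOIN dim_date d ON f.date_key = d.date_key
""").fetchone()
```

The surrogate `date_key` is a typical dimensional-modeling choice: facts reference dimensions by key rather than by natural value.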
Further, building the quality management layer comprises metadata checks, including: metadata consistency checks; metadata attribute checks; metadata quality report generation; and metadata normativity checks.
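The three kinds of checks named above can be gathered into one quality report. The sketch below assumes a simple instance layout (name mapped to its type and references) and a lower_snake_case naming rule; both are illustrative assumptions, not the patent's scheme.

```python
# Hedged sketch of metadata checks: consistency (references resolve),
# attribute presence, and normativity (naming convention), collected
# into a quality report.
import re
from typing import Dict, List

def check_metadata(instances: Dict[str, dict]) -> Dict[str, List[str]]:
    report = {"consistency": [], "attributes": [], "normativity": []}
    for name, meta in instances.items():
        # Consistency check: every referenced instance must exist.
        for ref in meta.get("refs", []):
            if ref not in instances:
                report["consistency"].append(f"{name} -> missing {ref}")
        # Attribute check: a 'type' attribute is mandatory here.
        if "type" not in meta:
            report["attributes"].append(f"{name}: no type")
        # Normativity check: names must be lower_snake_case.
        if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
            report["normativity"].append(f"{name}: bad name")
    return report

report = check_metadata({
    "order_fact": {"type": "table", "refs": ["dim_date"]},
    "BadName": {"refs": []},
})
```

Returning all findings at once, instead of failing on the first, matches the idea of generating a quality report rather than a pass/fail gate.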
Further, the quality management layer establishes data standards through the following process:
standard planning: scoping the overall range of data standard construction, defining the data standard system framework and classification, and making an implementation plan for the data standards;
standard compilation: determining the data standard template for each classification and compiling the data standards into a first draft;
standard review and release: revising and perfecting the data standards to form formal data standards;
standard implementation and maintenance: tracking and evaluating how the standards land, establishing a corresponding management flow for standard changes, and performing standard version management.
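The lifecycle above (planning, compilation, review and release, implementation with version management) can be modeled as a small state machine. The phase names and class below are an illustrative assumption, simplifying the process in the text.

```python
# Sketch of the data-standard lifecycle: phases advance in order, and a
# change to a released standard goes through version management.
class DataStandard:
    PHASES = ["planned", "drafted", "published", "implemented"]

    def __init__(self, name: str):
        self.name = name
        self.phase = "planned"
        self.version = 0
        self.history = []

    def advance(self) -> str:
        i = self.PHASES.index(self.phase)
        if i + 1 < len(self.PHASES):
            self.phase = self.PHASES[i + 1]
        return self.phase

    def change(self, note: str) -> int:
        # Standard changes are versioned rather than silently overwritten.
        self.version += 1
        self.history.append((self.version, note))
        return self.version

std = DataStandard("customer_id")
std.advance(); std.advance()          # planned -> drafted -> published
v = std.change("widen field to 32 chars")
```

Keeping the change history on the standard itself gives the "standard version management" step something concrete to track and evaluate against.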
In a second aspect, the present invention further provides a data middle platform building system, comprising:
a flow management layer building unit for building a flow management layer that standardizes and streamlines data management and application processes and controls every link of the full data life cycle;
an asset management layer building unit for building an asset management layer that manages metadata and data to form an enterprise data asset map, so that data sources, transmission and storage modes are known; and
a quality management layer building unit for building a quality management layer that sorts out and analyzes data quality problems and establishes data standards.
In a third aspect, the present invention further provides an electronic device comprising at least a processor and a memory, the memory storing a computer program, wherein the processor, when running the computer program, implements the data middle platform building method according to the first aspect of the present invention.
In a fourth aspect, the present invention further provides a computer storage medium storing computer-readable instructions which, when executed by a processor, implement the data middle platform building method according to the first aspect of the present invention.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. Aimed at the acquisition, calculation, storage and management of mass data, the invention provides a systematic platform construction method covering the full application life cycle, which governs, for every link, data definitions, formats, value ranges, business rules, processing logic, security permissions, processing dependencies among data and the like; through the data middle platform, data users can clearly understand the relationships between pieces of data, and managers of data applications gain insight into the complex dependencies among data, applications and systems, so the data is managed effectively;
2. As mass data surges, enterprises' need for efficient data management grows ever more urgent; the data middle platform built here enables efficient data management, making data management more standardized, application more efficient, and the decisions it supports more accurate;
3. Metadata management is an important component of the data middle platform system built here: it is an important basis for enterprises to turn data into assets and services, is inextricably tied to data security, data quality, data architecture, data models and the like under a big-data management environment, and is also a bridge between business and technology; in addition, the invention incorporates data standard management into the data asset management system, establishes data standards for the subjects and objects of activities, and uses data models to depict the logical relationships between objects.
In conclusion, the invention provides a more systematic methodology, giving the construction and management of the data middle platform a solid basis.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like reference numerals refer to like parts throughout the drawings. In the drawings:
fig. 1 is an architecture diagram of a data middle platform system according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be used.
For convenience of description, spatially relative terms, such as "inner", "outer", "lower", "upper", and the like, may be used herein to describe one element or feature's relationship to another element or feature as illustrated in the figures. Such spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures.
The data middle platform building method, system and storage medium provided by the embodiments of the invention comprise: building a flow management layer for standardizing and streamlining data management and application processes, so as to control every link of the full data life cycle; building an asset management layer for managing metadata and data to form an enterprise data asset map, so that data sources, transmission and storage modes are known; and building a quality management layer for sorting out and analyzing data quality problems and establishing data standards. Aimed at the acquisition, calculation, storage and management of mass data, the invention provides a systematic data platform covering the full application life cycle, which enables data users to clearly understand the relationships between pieces of data and managers of data applications to gain insight into the complex dependencies among data, applications and systems, so that the data is managed effectively.
Example 1
As shown in fig. 1, the present embodiment provides a data middle platform building method based on a data asset governance system. The data middle platform comprises three layers, built as follows: a flow management layer, an asset management layer and a quality management layer. The construction process of each layer is as follows:
s1, building a flow management layer
The flow management layer standardizes and streamlines the processes involved in data management and application and defines the roles and permissions of all actors, so as to control every link of the full data life cycle effectively. It comprises a data scheduling flow, a data monitoring flow, a data management flow and a data alarm flow; the construction specification of each flow can be set according to how the data middle platform is actually used, without limitation.
In some implementations, the flow management layer is configured and built visually on a data flow platform, through which unified management of all jobs, data processing, stored procedures and unified scheduling of scripts are implemented. The specific implementation includes:
Data scheduling flow: scheduling methods such as calendar, frequency and event triggering are provided; each type can offer various scheduling rules as needed, and the methods can be combined with one another to realize all kinds of complex service invocation, configured as required.
Data monitoring flow: monitoring modes such as job flow, job, event, plan and log are provided, each available as a list or a graph; preferably, the execution log allows viewing historical and current data states. Furthermore, manual intervention methods such as immediate execution, interruption, resuming from a breakpoint and resetting can be provided, greatly reducing the operation and maintenance workload.
Data management flow: unified management of jobs such as ETL tools, scripts, stored procedures and executable programs is supported; it can cross platform boundaries to invoke all kinds of jobs uniformly.
Data alarm flow: a centralized alarm function is provided, and job failures can be checked in real time; an error notification mechanism is also provided, through whose configuration the running status of jobs can be obtained in real time.
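The combinable calendar, frequency and event triggers of the scheduling flow can be sketched as follows. The `due` protocol and the any-trigger-fires combination rule are assumptions made for illustration; the text leaves the combination semantics open.

```python
# Minimal sketch of combinable scheduling triggers: calendar, frequency,
# and event, as described in the data scheduling flow above.
import datetime

class CalendarTrigger:
    def __init__(self, weekdays):          # e.g. {0} means Mondays
        self.weekdays = weekdays
    def due(self, now, events):
        return now.weekday() in self.weekdays

class FrequencyTrigger:
    def __init__(self, every_hours, last_run):
        self.every = datetime.timedelta(hours=every_hours)
        self.last_run = last_run
    def due(self, now, events):
        return now - self.last_run >= self.every

class EventTrigger:
    def __init__(self, event_name):
        self.event_name = event_name
    def due(self, now, events):
        return self.event_name in events

def job_due(triggers, now, events):
    # Triggers combine: here the job fires if any one trigger is due.
    return any(t.due(now, events) for t in triggers)

now = datetime.datetime(2020, 12, 18, 12, 0)   # a Friday
triggers = [
    CalendarTrigger({0}),                                   # Mondays only
    FrequencyTrigger(24, now - datetime.timedelta(hours=2)),
    EventTrigger("upstream_done"),
]
fired = job_due(triggers, now, {"upstream_done"})
```

Here neither the calendar nor the frequency trigger is due, but the upstream-completion event fires the job, showing how the three trigger types compose.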
S2, building an asset management layer
The asset management layer manages metadata and data in depth, forming an enterprise data asset map so that the source, transmission and storage mode of each piece of data are known.
The asset management layer mainly comprises metadata management and data management.
Metadata management provides users with high-quality, accurate, easily managed data throughout the entire life cycle of data middle platform construction, operation and maintenance. Throughout platform construction, links such as data source analysis, the ETL process, database structures, data models, business application theme organization and front-end display all need the support of corresponding metadata. Metadata management forms an accurate view of the information and data assets of the whole system. Through the unified metadata view, the data cleaning cycle is shortened and data quality is improved, so that the mass data coming from the various business systems in a data middle platform project can be managed systematically; the relations among business metadata are sorted out; information and data standards are established to perfect the explanation and definition of the data, forming consistent, unified data definitions across the enterprise; and data sources, operating conditions, transformations and the like can be tracked and analyzed.
The basic function of data management is to extract informative data from a large pool of data resources as requested by the user. Such data may be obtained, for example, by retrieval, sorting, merging, conversion and aggregation. Data management addresses two major issues: one is defining the required forms of the various data, and the other is how the system handles these requirements.
In some embodiments of the invention, metadata management includes metadata management maintenance, metadata queries, and metadata maintenance.
The metadata management maintenance is used for adding, deleting, updating and querying metadata types. The system defines metadata types in a real-world object-oriented fashion; each metadata type has several attributes, and inheritance, association and containment relationships exist among metadata types. This metadata type modeling method can define both structured and unstructured metadata, and is highly flexible and universal. Preferably, establishing metadata management maintenance comprises: designing basic types, object attribute types, object types, and the inheritance relationships among object types. When the system loads a new metadata type, it first parses enumeration, structure and label definitions and creates basic type instances; secondly, it parses inheritance relationships and creates abstract metadata type instances; it then parses the object types to create metadata type instances; finally, it parses the association, containment and inheritance relationships among object types, initiates a query request to the metadata query module, creates each metadata type node, adds edges between the metadata type nodes, and establishes the map relationships.
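The object-oriented metadata-type model above can be sketched with its three relationship kinds. The class and the `Asset`/`Table`/`Column` names are illustrative assumptions; the point is that attributes flow down the inheritance chain while containment and association are held as separate links.

```python
# Illustrative metadata-type model with the three relationships named in
# the text: inheritance (parent), containment, and association.
class MetadataType:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent            # inheritance
        self.attributes = {}
        self.contains = []              # containment
        self.associated = []            # association

    def all_attributes(self):
        # Attributes are inherited along the parent chain, with the
        # subtype's own attributes taking precedence.
        attrs = dict(self.parent.all_attributes()) if self.parent else {}
        attrs.update(self.attributes)
        return attrs

asset = MetadataType("Asset")
asset.attributes["owner"] = "string"

table = MetadataType("Table", parent=asset)
table.attributes["row_count"] = "int"

column = MetadataType("Column", parent=asset)
table.contains.append(column)          # a table contains columns

attrs = table.all_attributes()
```

Loading a new type then amounts to creating such objects and wiring their parent, containment and association links, mirroring the load sequence described above.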
The metadata query exploits the schema-free, horizontally scalable and real-time concurrent storage and computation characteristics of big data to store all metadata instances and metadata instance relations uniformly in one table (for example in HBase) through a graph database engine, reducing the data-table definition workload of the traditional approach; while storing metadata instances, the system extracts information from them to create metadata indexes for efficient query (for example with Solr or Elasticsearch).
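The single-table-plus-index idea above can be sketched in miniature: one schema-free row store holds both instances and relations, while a separate index avoids full scans. The dict-based store below merely stands in for an HBase-style table and a Solr/Elasticsearch-style index; it is an assumption for illustration.

```python
# Sketch of the storage scheme described above: all metadata instances
# and relations in one schema-free "table", with a value index built at
# write time for efficient query.
class MetadataStore:
    def __init__(self):
        self.rows = {}          # row_key -> arbitrary column/value dict
        self.index = {}         # attribute value -> set of row keys

    def put(self, row_key, columns):
        self.rows[row_key] = columns
        # Index extraction happens at store time, as in the text.
        for value in columns.values():
            if isinstance(value, str):
                self.index.setdefault(value, set()).add(row_key)

    def query(self, value):
        # Index lookup instead of a full scan over the single table.
        return sorted(self.index.get(value, set()))

store = MetadataStore()
store.put("inst:orders", {"type": "Table", "owner": "sales"})
store.put("rel:orders->dim_date", {"type": "Association"})
hits = store.query("Table")
```

Because rows carry arbitrary columns, new metadata types need no table-definition work, which is the workload reduction the text claims for this design.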
Preferably, the metadata query is established as follows:
1. In the initial stage of building the data warehouse system, the boundary of the system is determined according to preset requirements; the principle is to secure the key points first, refining rather than enlarging them.
2. After the boundary of the system is determined, the metadata of the existing systems is sorted and mapped into the semantic layer, then stored in a database, which can be a dedicated metadata repository or a general relational database.
3. The scope of metadata management is determined. For example, the conversion process of data in the data warehouse and the extraction routes of related data are managed through metadata, so that data warehouse developers and users know the whole history of the data in the warehouse.
4. A tool for managing the metadata is determined, and a suitable tool is adopted to finish the corresponding work. Related tools include Microsoft's Repository, which has a programming interface through which the import and export of the meta-model can be completed; similar is Platinum's OEE; in addition there is Sybase's WCC, which can integrate extraction and transformation tools through MDIS (an older standard predating MDC), can represent data extraction and transformation in one window, and can export the semantic layer in MDIS format to a front-end tool (such as Cognos Impromptu).
The metadata maintenance mainly covers the maintenance management of published metadata: metadata already published on line must be re-processed through the flow if adjustment or optimization is needed, and direct modification of the metadata is not allowed. For safety, all operations on metadata are recorded in the metadata operation log. The system automatically generates a general maintenance interface from the metadata type attributes and their extension rules, the containment relationships among metadata types, and the association relationships among metadata types. When a user enters the module to maintain a metadata instance, the system parses the metadata type attributes and their extension rules and creates an attribute configuration interface, whose configuration operations follow the attribute extension rules; the system parses the metadata type containment relationships and creates a drill-down interface through which the user can view sub-metadata; and the system parses the metadata type association relationships and creates an association interface. The sub-module interacts with the metadata query module to realize addition, modification, deletion and query of metadata instances.
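The two maintenance rules above (no direct modification of published metadata, and a full operation log) can be sketched together. The class, method names and log format are assumptions for illustration; the patent does not prescribe an API.

```python
# Sketch of the maintenance rule described above: published metadata may
# not be edited in place; a change re-enters the flow as a new published
# version, and every operation is appended to the operation log.
class PublishedMetadata:
    def __init__(self, name, body):
        self.name = name
        self.versions = [(1, body, "published")]
        self.log = [("publish", 1)]

    @property
    def current(self):
        return self.versions[-1]

    def modify_in_place(self, body):
        # Direct modification of published metadata is forbidden.
        raise PermissionError("published metadata cannot be edited directly")

    def republish(self, body):
        # Adjustment goes through re-processing: a new version is published.
        ver = self.current[0] + 1
        self.versions.append((ver, body, "published"))
        self.log.append(("republish", ver))
        return ver

meta = PublishedMetadata("customer", {"pk": "id"})
try:
    meta.modify_in_place({"pk": "cust_id"})
except PermissionError:
    blocked = True
new_ver = meta.republish({"pk": "cust_id"})
```

Keeping every version plus an append-only log gives the "for safety" guarantee concrete form: any past state of the metadata can be audited.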
In some embodiments of the invention, data management includes data base management, data model management, and data collection management.
In some implementations, the data base management includes data management maintenance, data query, data maintenance.
The data management maintenance is used for adding, deleting, updating and querying metadata types; its establishment and construction are the same as those of the metadata management maintenance and are not described here again.
Data query exploits the same schema-free, horizontally scalable and real-time concurrent storage and computation characteristics of big data to store all metadata instances and their relations uniformly in one table (for example in HBase) through a graph database engine, reducing the data-table definition workload of the traditional approach; while storing metadata instances, the system extracts information from them to create metadata indexes for efficient query (for example with Solr or Elasticsearch). The specific establishment process is similar to the metadata query process and is not repeated here.
Data maintenance automatically generates a general maintenance interface according to the metadata type attributes and their extended attribute rules, the containment relationships among metadata types, and the association relationships among metadata types. When a user enters the module to maintain a metadata instance, the system parses the metadata type attributes and their extension rules and creates an attribute configuration interface; the user's configuration operations follow the attribute extension rules. The system parses the metadata type containment relationships and creates a drill-down interface through which the user can view sub-metadata, and it parses the metadata type association relationships and creates an association interface. This sub-module interacts with the metadata query module to add, modify, delete, and query metadata instances; the process is similar to that of metadata maintenance and is not repeated here.
In other implementations, data model management includes data model management, model relationship maintenance, and model import/export.
Data model management: a data model uses entities, attributes, and the relationships between them to give unified definition, coding, and naming to enterprise operations and logic rules; it is a common language for communication between business personnel and developers. The data model is used to unify the enterprise's data views, define the business departments' requirements for data information, establish the basis of the atomic layer of the data warehouse, support the development planning of the data warehouse, and initialize the attribution of business data. The data model design steps are:
conceptual model (business model) design: defining the system boundary; determining the main subject areas and their content;
logical model design: applying the dimensional modeling method (fact tables and dimension tables); organizing data in star or snowflake schemas;
physical model design: the process of materializing the logical model of the data warehouse into a database;
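The star organization named in the logical-model step can be shown with a toy example: one fact table whose rows reference dimension tables by surrogate key, plus an aggregation performed through the dimension lookups. All table and field names are illustrative.

```python
# A minimal star-schema sketch: a fact table referencing two dimension
# tables, with a toy aggregation over the joined result.

dim_date = {1: {"date": "2020-12-18", "month": "2020-12"}}
dim_product = {10: {"name": "sensor", "category": "hardware"}}

fact_sales = [
    {"date_key": 1, "product_key": 10, "amount": 200.0},
    {"date_key": 1, "product_key": 10, "amount": 300.0},
]

# Aggregate amount by product category via dimension lookups (the "join").
totals = {}
for row in fact_sales:
    category = dim_product[row["product_key"]]["category"]
    totals[category] = totals.get(category, 0.0) + row["amount"]
```

A snowflake schema would differ only in that the dimension tables themselves are further normalized (e.g. a separate category table referenced from the product dimension).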
Model relationship maintenance: after the data model is designed, it is checked and maintained against the model relationship integrity constraints. Integrity constraints are the constraints and dependency rules defined on the data and its associations in a given data model; they restrict the database states, and the state transitions, that conform to the data model, so as to ensure that data is correct, valid, and consistent. They include entity integrity, referential integrity, and user-defined integrity.
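The three integrity-constraint kinds can be sketched as a single check routine. The function, field, and rule names below are invented for illustration; a real system would enforce these constraints in the database rather than in application code.

```python
def check_integrity(rows, pk, fk_targets, user_rules):
    """Return violation messages for entity, referential, and
    user-defined integrity, mirroring the three constraint kinds."""
    violations = []
    seen = set()
    for i, row in enumerate(rows):
        key = row.get(pk)
        if key is None or key in seen:               # entity integrity
            violations.append(f"row {i}: bad primary key {key!r}")
        seen.add(key)
        for fk, target_keys in fk_targets.items():   # referential integrity
            if row.get(fk) not in target_keys:
                violations.append(f"row {i}: dangling {fk}={row.get(fk)!r}")
        for rule_name, rule in user_rules.items():   # user-defined integrity
            if not rule(row):
                violations.append(f"row {i}: violates {rule_name}")
    return violations

customers = {1, 2}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 50},
    {"order_id": 100, "customer_id": 3, "amount": -5},  # violates all three
]
problems = check_integrity(
    orders, pk="order_id",
    fk_targets={"customer_id": customers},
    user_rules={"non_negative_amount": lambda r: r["amount"] >= 0},
)
```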
Model import and export: after the data model is built, the model can be imported and exported.
In some embodiments of the invention, data collection management includes adapter management, collection management, and schedule management.
In some implementations, the adapter management: the data adapter is a distributed data collection management service that provides flexible and general data collection management functions meeting enterprise-level application requirements, letting users conveniently mine, extract, convert, and manage data anytime and anywhere. The data adapter takes a processing engine as its core; the services built into the processing engine include data collection, data cleansing, data filling, data format translation, and other basic data processing functions.
In other implementations, the collection management: data acquisition is mainly divided into log acquisition and data source data synchronization.
Log collection can be divided, by product type, into browser page log collection and client log collection; data source data synchronization can be divided, by synchronization mode, into direct data source synchronization, generated data file synchronization, and database log synchronization.
Browser page collection mainly gathers the browsing logs (PV/UV, etc.) and interactive operation logs (operation events) of a page. These logs are typically collected by embedding a standard statistical JS snippet in the page. The snippet can be written in manually during the page function development stage, or injected dynamically by the server when the corresponding page is requested at run time. After page logs are collected, certain cleaning and preprocessing must be performed on the server side, such as cleaning false traffic data, recognizing attacks, completing missing data, eliminating invalid data, formatting data, and isolating data.
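The server-side cleaning and preprocessing step can be sketched as follows. The record fields (`page`, `user_id`, `agent`) and the blocked-agent heuristic are illustrative assumptions, not the patent's implementation.

```python
# Hedged sketch of server-side page-log preprocessing: eliminating
# invalid records, cleaning suspect (false) traffic, and formatting data.

def preprocess_page_logs(raw_logs, blocked_agents=("crawler", "bot")):
    cleaned = []
    for rec in raw_logs:
        if not rec.get("page") or not rec.get("user_id"):
            continue                               # eliminate invalid data
        agent = rec.get("agent", "").lower()
        if any(b in agent for b in blocked_agents):
            continue                               # clean false traffic
        cleaned.append({                           # format the data
            "page": rec["page"].strip().lower(),
            "user_id": str(rec["user_id"]),
            "event": rec.get("event", "pv"),
        })
    return cleaned

raw = [
    {"page": "/Home ", "user_id": 7, "agent": "Mozilla"},
    {"page": "/home", "user_id": 8, "agent": "GoogleBot crawler"},
    {"page": "", "user_id": 9},
]
logs = preprocess_page_logs(raw)
```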
Client log collection typically develops a dedicated statistical SDK for data collection by the APP client. Client data collection is highly business-specific and requires more customization, so besides some basic data about the application environment, data is collected from a "per event" perspective, such as click events, login events, and business operation events. The basic data is acquired by the SDK by default; other events are defined by the business side and reported by calling the SDK interface according to the specification. Data from H5 pages is merged into the Native side and then sent uniformly by the SDK. An important principle of log collection is standardization and normalization: only when the collection mode is standardized and normalized can the collection cost be minimized, log collection efficiency improved, and subsequent statistical calculation performed more efficiently.
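The "per event" SDK pattern described above can be sketched as follows. `StatsSDK` and its `track` interface are hypothetical names for illustration, not an actual SDK API: base environment data is collected by default, and business events all flow through one standardized reporting call.

```python
# Sketch of event-based client collection: base data acquired by default,
# business events reported through a single standardized interface.

class StatsSDK:
    def __init__(self, app_version, os_name):
        # Base data the SDK acquires by default.
        self.base = {"app_version": app_version, "os": os_name}
        self.buffer = []

    def track(self, event_name, properties=None):
        """The single reporting interface that business code calls."""
        record = dict(self.base)
        record["event"] = event_name
        record.update(properties or {})
        self.buffer.append(record)
        return record

sdk = StatsSDK(app_version="1.4.0", os_name="android")
sdk.track("login", {"method": "sms"})
sdk.track("click", {"target": "buy_button"})
```

Funneling every event through one call is what makes the "standardization and normalization" principle enforceable: the event schema is fixed in one place rather than per feature.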
Direct data source synchronization means connecting directly to the business database and reading its data through a standardized interface (such as JDBC). This approach is easy to implement, but when the data volume is large it may affect the performance of the data source system.
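A minimal sketch of direct synchronization, using Python's built-in sqlite3 module (a DB-API driver) as a stand-in for a JDBC-style standardized interface. Reading in batches is one common way to limit pressure on the source system; the table and batch size are illustrative.

```python
# Direct data source synchronization sketch: connect to the business
# database through a standardized interface and read rows in batches.
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

def sync_table(conn, table, batch_size=1000):
    """Yield all rows of a table, fetched batch by batch to limit load."""
    cur = conn.execute(f"SELECT * FROM {table}")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

rows = list(sync_table(source, "orders"))
```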
Generated data file synchronization means that the source system generates data files, which are then synchronized into the target database through a file system. This method suits scenarios with relatively scattered data sources; the data files should be verified before and after transmission, and appropriately compressed and encrypted, to improve efficiency and ensure security.
Database log synchronization refers to synchronization based on the log files of the source database. Most databases now support generating data log files and recovering data from them, and these log files can be used for incremental synchronization. This method has little impact on system performance and high synchronization efficiency.
In yet other implementations, scheduling management: the data adapter provides a real-time monitoring tool for monitoring and managing the system, so that a system administrator can discover system faults and abnormal running conditions immediately and handle them in time; meanwhile, data processing conditions can be statistically analyzed and kept track of.
S3, building a quality management layer
The quality management layer is used to sort out and analyze data quality problems. It is constructed by first clarifying the current state of data quality, then selecting a suitable solution for each kind of quality problem and working out a detailed solution, and finally forming a knowledge base of data quality solutions for later reference. Data quality control means standardizing each information acquisition point of an information system to meet the requirements of information utilization; it comprises a series of processes such as establishing modeled operation rules, verifying original information, and feeding back and correcting erroneous information.
The quality management layer building comprises metadata checking, data quality and data standards.
In some embodiments of the present invention, the metadata check performs a comprehensive check of the definition, organization, and precision of the data and the relationships between data, based on the normalization, integrity, and correctness of the data, to determine whether the data meets quality requirements. The metadata check includes:
1. metadata consistency check
According to common data storage types, metadata consistency checking mainly completes object-level record consistency checks and field-level consistency checks on numeric, time, and character values, that is, the conventionally mentioned table-level count, sum of numeric fields, sum of time-type differences, and checksum of character fields.
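The table-level count, numeric-field sum, and character-field checksum can be sketched as one comparison routine between a source and a target copy of the same table. The field names and the choice of MD5 for the character checksum are illustrative assumptions.

```python
# Sketch of object-level and field-level consistency checks:
# row count, numeric-field sum, and a checksum over character fields.
import hashlib

def consistency_report(src_rows, dst_rows, numeric_field, char_field):
    def char_checksum(rows):
        digest = hashlib.md5()
        for value in sorted(r[char_field] for r in rows):
            digest.update(value.encode("utf-8"))
        return digest.hexdigest()
    return {
        "count_match": len(src_rows) == len(dst_rows),
        "sum_match": sum(r[numeric_field] for r in src_rows)
                     == sum(r[numeric_field] for r in dst_rows),
        "checksum_match": char_checksum(src_rows) == char_checksum(dst_rows),
    }

src = [{"amount": 10, "name": "a"}, {"amount": 5, "name": "b"}]
dst = [{"amount": 10, "name": "a"}, {"amount": 5, "name": "B"}]  # value drifted
report = consistency_report(src, dst, "amount", "name")
```

Note how the checksum catches a drift that the count and sum checks miss, which is why the text lists all three levels.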
2. Metadata attribute checking
The attribute item checks cover the number of attribute items and the attribute item definitions.
The attribute value checks are selected according to the nature of the attribute values: illegal character check, non-null check, frequency method check, and fixed length check.
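The four attribute-value checks can be sketched in one routine. The illegal-character pattern, the frequency threshold, and the fixed length are illustrative assumptions that a real system would configure per attribute.

```python
# Sketch of the four attribute-value checks named above: illegal
# character, non-null, frequency method, and fixed length.
from collections import Counter
import re

def check_values(values, illegal=re.compile(r"[^0-9A-Za-z_]"),
                 fixed_length=None, max_share=0.8):
    issues = []
    for i, v in enumerate(values):
        if v is None or v == "":
            issues.append((i, "null"))                 # non-null check
            continue
        if illegal.search(v):
            issues.append((i, "illegal character"))    # illegal character check
        if fixed_length is not None and len(v) != fixed_length:
            issues.append((i, "bad length"))           # fixed length check
    counts = Counter(v for v in values if v)
    for v, n in counts.items():                        # frequency method check
        if n / len(values) > max_share:
            issues.append((v, "suspicious frequency"))
    return issues

issues = check_values(["A1", "A1", "A1", "A1", "9$", None], fixed_length=2)
```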
3. Metadata quality reporting
After consistency check and attribute check are carried out on the metadata, a metadata quality report is generated, and quality problems and solutions existing in the metadata are explained in detail.
4. Metadata normalization checks
After the metadata quality problem is solved, the metadata needs to be subjected to normative check, including metadata standard specification and the like.
In some embodiments of the invention, the data quality includes three basic categories: form quality, content quality, and utility quality. Form quality mainly considers whether the data set matches the business requirements well in structure and form of expression and is easy to understand and obtain. Content quality mainly refers to whether the specific content and values of the data set are consistent with the actual business. Utility quality mainly examines whether the data set has high relevance in business characteristics and in the time dimension. The data quality indices are defined as shown in Table 1:
TABLE 1
(Table 1, defining the data quality indices, is presented as an image in the original publication.)
In some embodiments of the present invention, the data quality further includes data checking. Data checking implements integrity and consistency checks on data and improves data quality; it forms a complete data quality control chain from data acquisition, preprocessing, comparison, analysis, early warning, and notification through to problem repair. The check rules are a relevance check, covering: whether a key-value association exists between the two data tables; whether the data volumes of the two data tables are consistent; whether the table structures of the two data tables are consistent, such as the number of fields and the field types and widths; and whether the contents of the two data tables are consistent, or whether one data table is missing content. Check rules (such as field mappings), scheduling rules (such as scheduling frequency), and report templates (such as check results) are then configured.
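The relevance-check rules above can be sketched as a single table-comparison routine over two lists of row dicts. This is a simplification: real tables would also compare field types and widths, and comparison would run against the databases rather than in memory.

```python
# Sketch of the cross-table check rules: key association, data volume,
# table structure (field names), and missing content.

def compare_tables(a, b, key):
    a_keys = {row[key] for row in a}
    b_keys = {row[key] for row in b}
    structure_a = sorted(a[0].keys()) if a else []
    structure_b = sorted(b[0].keys()) if b else []
    return {
        "key_association": a_keys == b_keys,            # key-value association
        "volume_match": len(a) == len(b),               # data volume
        "structure_match": structure_a == structure_b,  # number/names of fields
        "missing_in_b": sorted(a_keys - b_keys),        # missing content
    }

t1 = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}]
t2 = [{"id": 1, "v": "x"}]
result = compare_tables(t1, t2, key="id")
```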
In some embodiments of the invention, the data quality further comprises problem management. Analysis aims to find the causes or conditions behind quality problems, and data cleansing strategies are formulated for these causes. Common data quality problems include invalid, duplicate, missing, inconsistent, or wrong values, wrong formats, flawed business logic rules, and faulty data extraction programs; inconsistent statistical calibers can also make the viewed data undesirable. For these situations, cleaning programs are written and run automatically by computer according to the data size and the requirements of the mining system. The problem management flow is as follows:
Define business requirements and methods: define the key points, opportunities, and targets of data quality management to guide all work over the whole project period. Analyze the information environment: collect, sort, and analyze the information environment related to data quality, determine the information life cycle, ensure that the relevant data can be evaluated, and design a data acquisition and evaluation scheme. Evaluate data quality: evaluate data quality along the quality dimensions applicable to the problem; the evaluation results are used to determine the root causes of the data quality problems and where improvement is needed. Determine the causes of data quality problems: determine and prioritize the root causes of the data quality problems, with specific suggestions for solving them. Formulate an improvement scheme: determine the final concrete solution. Prevent future data errors: implement the solution that addresses the root cause of the data quality problem. Real-time control: monitor and verify the improvements made, and maintain the results by standardizing, documenting, and continuously monitoring the improvements. Communicate actions and results: document and communicate the quality management effects, the improvements made, and their results.
In some embodiments of the invention, the data standard is the basis for data asset management, and data asset management is the process of accurately defining data assets. Through the construction of metadata and data management, the embodiment of the invention manages, around the data center, the data relationships among source systems, the data platform, data marts, data models, databases, tables, fields, reports (index storage fields), and fields in data applications. According to the big data standard system established by the Big Data Standard Working Group of the National Information Technology Standardization Technical Committee, the standard system framework of data consists of seven categories of standards: basic standards, data standards, technical standards, platform and tool standards, management standards, security and privacy standards, and industry application standards. A data standard is a consistent convention on the expression, format, and definition of data, including unified definitions of the business attributes, technical attributes, and management attributes of data; the purpose of data standards is to make the data used within an organization and exchanged externally consistent and accurate. The construction and implementation flow of the data standard is as follows:
The first stage: standard planning
Starting from the actual situation and drawing on industry experience, collect national standards, existing standards, new system requirement standards, common industry standards, and the like; sort out the overall scope of data standard construction; define the data standard system framework and classification; and make an implementation plan for the data standards.
The second stage: standard compilation
According to the data standard system framework and classification, first determine the data standard template for each classification, and then have the relevant personnel compile the data standards based on the investigation of relevant national standards, industry standards, technical and business requirements, and so on, forming a first draft of the data standards.
The third stage: standard review and publication
After the standards are compiled, they need to be reviewed to ensure their completeness and normalization, and revised and perfected after fully soliciting expert opinions and the opinions of all relevant departments. The completed data standards can be released to the whole enterprise after leadership approval, forming formal data standards.
The fourth stage: standard implementation, maintenance, and enhancement
Not all data standards can be fully implemented, and in practice a historical system may be impossible to modify. Therefore, first determine the data standard implementation strategy and scope and formulate a corresponding implementation scheme, then promote the execution of the scheme, and track and evaluate the implementation status.
The data standards may subsequently need to be continuously updated and refined as the business evolves and as national standards and regulatory requirements change. In the data standard maintenance stage, a corresponding management flow for standard changes needs to be established, along with good standard version management.
In summary, the present invention forms a closed loop of data quality management through the construction of the above management layers.
Example 2
Embodiment 1 provides a data center building method; correspondingly, this embodiment provides a data center building system. The building system provided by this embodiment can implement the data center building method of Embodiment 1, and can be implemented by software, hardware, or a combination of the two. For example, the building system may comprise integrated or separate functional modules or functional units to perform the corresponding steps of the method of Embodiment 1. Since the building system of this embodiment is basically similar to the method embodiment, its description is relatively brief; for the relevant points, reference may be made to the corresponding description in Embodiment 1. The embodiment of the building system described here is only illustrative.
This embodiment provides a data center building system, and the system includes:
the flow management layer building unit is used for building a flow management layer, standardizing and streamlining the data management and application process, and controlling all links of the full life cycle of the data;
the asset management layer building unit is used for building an asset management layer for managing metadata and data to form an enterprise data asset map so as to know data sources, transmission and storage modes;
and the quality management layer building unit is used for building a quality management layer, combing and analyzing data quality problems and building a data standard.
Example 3
This embodiment provides an electronic device corresponding to the data center building method provided in Embodiment 1. The electronic device may be a client device, such as a mobile phone, notebook computer, tablet computer, or desktop computer, for executing the building method of Embodiment 1.
The electronic equipment comprises a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface are connected through the bus to complete mutual communication. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. A computer program capable of running on the processor is stored in the memory, and the processor executes the data center building method provided in embodiment 1 when running the computer program.
In some implementations, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.
In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
Example 4
The data center building method of this embodiment 1 can be embodied as a computer program product, and the computer program product may include a computer readable storage medium on which computer readable program instructions for executing the building method described in this embodiment 1 are loaded.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims. The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application should be defined by the claims.

Claims (10)

1. A data center building method is characterized by comprising the following steps:
a flow management layer is set up and used for standardizing and streamlining the data management and application processes so as to control all links of the full data life cycle;
an asset management layer is set up and used for managing metadata and data to form an enterprise data asset map so as to know data sources, transmission and storage modes;
and constructing a quality management layer for combing and analyzing data quality problems and establishing a data standard.
2. The data center building method according to claim 1, wherein building a process management layer comprises building a data scheduling process, a data monitoring process, a data management process and a data alarming process, and specifically comprises:
the data scheduling process is used for providing a scheduling method triggered by calendar, frequency and events;
the data monitoring process is used for providing monitoring modes of job flow, job, event, plan and log, and each monitoring mode adopts a list and/or a graph;
and (3) data management flow: the system is used for supporting the unified management of the operation of an ETL tool, a script, a storage process and a runnable program;
and (3) data alarm flow: for providing alerts and having an error notification mechanism.
3. The data center building method according to claim 1, characterized in that asset management layer building comprises metadata management, and the metadata management comprises metadata management maintenance, metadata inquiry and metadata maintenance;
metadata management and maintenance, which is used for adding, deleting, updating and inquiring metadata types;
the metadata query is used for uniformly storing all metadata instances and metadata instance relations into one table through a database engine, utilizing the big data storage and calculation characteristics of no fixed columns, horizontal expansion, and real-time concurrency;
and metadata maintenance, which is used for maintaining and managing the published metadata.
4. The data center building method according to claim 1, characterized in that the asset management layer building further comprises data management, wherein the data management comprises data base management, data model management and data acquisition management;
the data base management is used for managing, inquiring and maintaining data;
the data model management comprises data model management, model relation maintenance and model import/export; the data model management unifies data views of enterprises by designing a data model, defines requirements of business departments on data information, constructs an atomic layer basis of a data warehouse and initializes attribution of business data; the model relation maintenance is used for checking and maintaining the data model after the data model is designed, and the data model conforms to the model relation integrity constraint; model import and export are used for importing and exporting the data model after the data model is built;
the data acquisition management comprises adapter management, collection management, and scheduling management; in the adapter management, the data adapter takes a processing engine as its core, and the services built into the processing engine comprise the data processing functions of data collection, data cleansing, data filling, and data format translation; in the collection management, data acquisition comprises log collection and data source data synchronization; in the scheduling management, the data adapter provides a real-time monitoring tool to monitor and manage the system, so that a system administrator can discover system faults and abnormal running conditions immediately and handle them in time, and meanwhile data processing conditions can be statistically analyzed and kept track of.
5. The data center building method according to claim 4, wherein the data model designing step includes:
designing a conceptual model: defining system boundary, determining subject domain and its content;
designing a logic model: determining a dimension modeling method and organizing data;
designing a physical model: materializing the logical model of the data warehouse into a database.
6. The data center building method according to any one of claims 1 to 5, wherein the quality management layer building comprises metadata inspection, and the metadata inspection comprises: checking consistency of metadata; metadata attribute checking; and generating a metadata quality report and a metadata normative check.
7. The data center building method according to any one of claims 1 to 5, wherein the building of the quality management layer further comprises constructing a data standard, and the construction flow of the data standard comprises the following steps:
and (3) standard planning: combing out the whole range of data standard construction, defining a data standard system frame and classification, and making an implementation plan of the data standard;
and (3) standard compilation: determining each classification data standard template, compiling the data standard and forming a data standard primary draft;
and (3) standard review release: revising and perfecting the data standard to form a formal data standard;
standard implementation, maintenance, and enhancement: tracking and evaluating the standard implementation status, establishing a corresponding management flow for standard changes, and performing standard version management.
8. A data center building system, comprising:
the flow management layer building unit is used for building a flow management layer, standardizing and streamlining the data management and application process, and controlling all links of the full life cycle of the data;
the asset management layer building unit is used for building an asset management layer for managing metadata and data to form an enterprise data asset map so as to know data sources, transmission and storage modes;
and the quality management layer building unit is used for building a quality management layer, combing and analyzing data quality problems and building a data standard.
9. An electronic device comprising at least a processor and a memory, the memory having a computer program stored thereon, wherein the processor, when executing the computer program, implements the data center building method according to any one of claims 1 to 7.
10. A computer storage medium having computer readable instructions stored thereon which are executable by a processor to implement the data center building method of any one of claims 1 to 7.
CN202011501793.7A 2020-12-18 2020-12-18 Data center building method and system and storage medium Pending CN112527774A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011501793.7A CN112527774A (en) 2020-12-18 2020-12-18 Data center building method and system and storage medium


Publications (1)

Publication Number Publication Date
CN112527774A true CN112527774A (en) 2021-03-19


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239016A (en) * 2021-06-01 2021-08-10 通号智慧城市研究设计院有限公司 Database design assistance apparatus and method
CN114115883A (en) * 2022-01-26 2022-03-01 广州云徙科技有限公司 Method for quickly constructing front-end application by using middle station service capability
CN114357029A (en) * 2022-01-04 2022-04-15 工银瑞信基金管理有限公司 Method, device, equipment, medium and program product for processing service data
CN114510534A (en) * 2022-01-28 2022-05-17 广东航宇卫星科技有限公司 Data synchronization method, device, equipment and storage medium
CN114531268A (en) * 2021-12-31 2022-05-24 华能信息技术有限公司 Security center system, security center method, computer device, and storage medium
CN114661704A (en) * 2022-03-23 2022-06-24 杭州半云科技有限公司 Data resource full life cycle management method, system, terminal and medium
WO2024002102A1 (en) * 2022-06-27 2024-01-04 中国信息通信研究院 Active administration system for data assets, computing device, and storage medium
CN114531268B (en) * 2021-12-31 2024-04-30 华能信息技术有限公司 Security center console system, security center console method, computer device, and storage medium


Similar Documents

Publication Publication Date Title
US11755628B2 (en) Data relationships storage platform
US20240070487A1 (en) Systems and methods for enriching modeling tools and infrastructure with semantics
CN112527774A (en) Data center building method and system and storage medium
Davoudian et al. Big data systems: A software engineering perspective
US11599539B2 (en) Column lineage and metadata propagation
US11106665B1 (en) Automated SQL source code review
CN114925045A (en) PaaS platform for large data integration and management
US20220129816A1 (en) Methods and arrangements to manage requirements and controls, and data at the intersection thereof
Hansen et al. An empirical study of software architectures’ effect on product quality
CN111737335B (en) Product information integration processing method and device, computer equipment and storage medium
US10592391B1 (en) Automated transaction and datasource configuration source code review
US10585663B1 (en) Automated data store access source code review
KR20080026243A (en) System for evaluating data quality management maturity
KR100796906B1 (en) Method for Quality Control of DataBase
CN116860311A (en) Script analysis method, script analysis device, computer equipment and storage medium
CN116578614A (en) Data management method, system, medium and equipment for pipeline equipment
CN116561114A (en) Metadata-based management method
KR100796905B1 (en) System for Quality Control of DataBase
KR100792322B1 (en) Framework for Quality Control of DataBase
US10275237B1 (en) Automated spring wiring source code review
Hilal et al. Toward a new approach for modeling dependability of data warehouse system
Kalyonova et al. Design Of specialized storage for heterogeneous project data
Fernandes et al. Automated Refactoring of Unbounded Queries in Software Automation Platforms
Oelsner et al. IQM4HD concepts
US10599425B1 (en) Automated data access object source code review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination