CN115794929B - Data management system and data management method for data marts - Google Patents

Data management system and data management method for data marts


Publication number
CN115794929B
Authority
CN
China
Prior art keywords
data
layer
service
dimension
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310061992.8A
Other languages
Chinese (zh)
Other versions
CN115794929A (en)
Inventor
杨雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongqi Scc Beijing Finance Information Service Co ltd
Original Assignee
Zhongqi Scc Beijing Finance Information Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongqi Scc Beijing Finance Information Service Co ltd
Priority to CN202310061992.8A
Publication of CN115794929A
Application granted
Publication of CN115794929B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data management system and a data management method for a data mart. The system comprises: a data management platform provided with a business data model, a target data source, a dimension table model, a wide table model, user information, user authority information and available resources; a data center used for extracting, converting and loading data from a data source; a data preparation layer used for extracting data through the data center, storing the generated data records in a data warehouse, and extracting and cleaning business data based on the business rules in the business data model to obtain a structured detail table; a service data layer used for summarizing the detail table data based on the dimensions and dimension attributes set in the dimension table model to obtain a dimension table and a summary table; and an application data layer that extracts data from the service data layer based on the dimensions and field information set in the wide table model to obtain application data. The application data layer and the data management platform are both provided with a data access interface for the data integration application platform.

Description

Data management system and data management method for data marts
Technical Field
The invention relates to the technical field of big data, in particular to a data management system and a data management method of a data mart.
Background
A data mart, also called a data market, is a small data warehouse that gathers raw data from applications, business activities and operations and provides specialized data services to a specific team of professionals; it is designed to support decision making and system functions. Put differently, a data mart is a smaller, more focused data warehouse: raw data is drawn from business databases in a form that is easy to compute statistics on, and then flows to dedicated professional teams to support their customized use. Such team-level databases may be referred to as data marts. A data mart is a team-level collection of data and rules, and the units into which that data is organized for decision (demand) support are referred to as "topic domains". In a data mart, each data unit carries a fixed timestamp, and the data consists of atomic-level data and user data summarized along the time dimension. The data comprises non-updatable data (operation data) and continuously changing data (business data). The main function of a data mart is to collect business data and operation data across multiple business systems so that business operators can perform multidimensional data analysis in support of business decisions.
In terms of data sources, the data in a data mart comes from the business databases, log files and operation records of each business line of the enterprise, or from a dedicated data warehouse that has already been established. However, the data generated by the business lines comes in various formats, such as structured business data and unstructured log data, and is stored on different paths and media. As a result, the data cannot be managed effectively, analysis is extremely difficult, and development results run inefficiently. Whenever a business line raises a data demand, the project delivery process is often started over, and project initiation, development, testing and release are carried out repeatedly, so data output efficiency is low.
In addition, while a data mart may be considered a smaller, more focused data warehouse, it differs from a conventional data warehouse in several ways: (1) the data warehouse is oriented to the whole enterprise, contains all business data in the enterprise and provides data support for the enterprise, whereas a data mart is oriented to a smaller team; (2) the data granularity differs: the data warehouse is very fine-grained and holds essentially detail data, whereas a data mart can provide wide table data or data summarized along different dimensions; (3) some of the data in a data mart comes from the data warehouse, but some comes from non-business data generated by the business or application (i.e., data that cannot be included in the data warehouse). In this case, if the data query service provided directly by the data warehouse is used, the following problems arise: 1) the query volume is large, the calculation cost is high, and the corresponding hardware investment is high; 2) incidents such as response timeouts and data warehouse downtime easily occur during peak usage; 3) data reports take a long time and cost a lot to deliver, and are also costly to modify and maintain; 4) data reusability is poor; and 5) data management is difficult.
How to acquire data quickly based on a data mart, so as to improve data acquisition efficiency and data reusability while keeping the data easy to manage, is a problem that remains to be solved.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a data management system and a data management method for a data mart, so as to solve one or more of the drawbacks of the prior art.
The technical scheme of the invention is as follows:
a data management system for a data mart, the system comprising: a data source, a data center, a data preparation layer, a detail data layer, a service data layer, an application data layer and a data management platform;
the data management platform is provided with a plurality of business data models, target data source information corresponding to business, a dimension table model, a wide table model, user information, user authority information and available resource information, wherein business rules and field mapping relation information are arranged in the business data models, dimension and dimension attributes are arranged in the dimension table model, and the wide table model is provided with wide table dimensions and field information under each dimension; each business data model is associated with a corresponding target data source, a dimension table model and a wide table model;
the data source comprises business data, log data and monitoring class data generated by the application services during business activity;
the data center is used for extracting, converting and loading data from a data source;
the data preparation layer is used for extracting, converting and loading data from a data source through the data center, generating a data record based on the playback of log data, storing the generated data record in a data warehouse, extracting and cleaning the service data from a corresponding target data source based on service rules and field mapping relation information set in a service data model selected from a data management platform, and storing the extracted and cleaned structured service data in the detail data layer;
the detail data layer is used for storing the extracted and cleaned structured business data in a detail table;
the service data layer is used for summarizing the data in the detail table based on the dimension and the dimension attribute set in the dimension table model corresponding to the service data model, and at least obtaining a dimension table and a summary table;
the application data layer is used for extracting data information in the service data layer at least based on the dimension and the field information set in the wide table model corresponding to the service data model to obtain application data used as a data mart, wherein the application data at least comprises a service wide table;
At least the data preparation layer, the application data layer and the data management platform are all provided with a data access interface for the data integration application platform.
In some embodiments of the invention, the system further comprises: the data integration application platform is used for providing data services for users based on the access interface.
In some embodiments of the present invention, the data management platform is further provided with an application interface configuration, a stream data management module, an executor, and a scheduler; the stream data management module is used for providing stream processing tasks, batch processing task setting and timing management of each task; the executor is used for executing tasks based on task setting and timing management of the stream data management module; the scheduler is used for performing task scheduling.
In some embodiments of the invention, the detail data layer and the service data layer are also provided with a data access interface for the data integration application platform.
In some embodiments of the present invention, the service data layer further includes a static table, and the application data layer further includes user portrait data and public data.
In another aspect of the present invention, there is also provided a data management method implemented by a data management system based on a data mart, the data management system including: the system comprises a data source, a data center, a data preparation layer, a detail data layer, a service data layer, an application data layer and a data management platform; the data management platform is provided with a plurality of business data models, target data source information corresponding to business, a dimension table model, a wide table model, user information, user authority information and available resource information, wherein business rules and business fields are arranged in the business data models, dimension and dimension attributes are arranged in the dimension table model, and field information in wide table dimensions and under each dimension is arranged in the wide table model; each business data model is associated with a corresponding target data source, a dimension table model and a wide table model; the data source comprises business data, log data and monitoring class data generated by the application service in each business behavior; the data preparation layer, the application data layer and the data management platform are provided with a data access interface for the data integration application platform to access, and the method comprises the following steps:
performing, in the data preparation layer, data extraction, conversion and loading from a data source through the data center, generating data records based on the playback of log data, storing the generated data records in a data warehouse, extracting and cleaning service data from the corresponding target data source based on the service rules and field mapping relation information set in a service data model selected from the data management platform, and storing the structured service data obtained after extraction and cleaning in the detail data layer;
storing the extracted and cleaned structured business data as a detail table in the detail data layer;
summarizing the data in the detail table based on the dimension and the dimension attribute set in the dimension table model corresponding to the service data model at the service data layer, and at least obtaining a dimension table and a summary table;
and extracting information from the data in the service data layer based on the dimension and the field information set in the wide table model corresponding to the service data model at least to obtain the application data serving as the data marts corresponding to the selected service data model, wherein the application data at least comprises a service wide table.
In some embodiments of the invention, the method further comprises: when a user query request is received, determining user access rights based on a user identifier carried in the user query request and user rights information in the data management platform, rejecting the user request under the condition that the user does not have the access rights, querying the requested content from the data preparation layer, the detail data layer, the service data layer or the application data layer based on the content of the user request under the condition that the user has the access rights, and returning a query result to the user.
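The permission check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the layer names, the `permissions` mapping, the `AccessDenied` exception and the request shape are all hypothetical stand-ins for the user authority information held by the data management platform.

```python
# Hypothetical layer identifiers for the four queryable layers.
LAYERS = {"preparation", "detail", "service", "application"}


class AccessDenied(Exception):
    """Raised when the user has no right to query the requested layer."""


def handle_query(request, permissions, layer_data):
    """Check the user's rights, then return data from the requested layer.

    request:     {"user_id": str, "layer": str}  (assumed shape)
    permissions: user_id -> set of layers the user may query (assumed shape)
    layer_data:  layer -> rows stored in that layer (stand-in for real storage)
    """
    user_id, layer = request["user_id"], request["layer"]
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    if layer not in permissions.get(user_id, set()):
        # The user lacks access rights, so the request is rejected.
        raise AccessDenied(f"{user_id} may not query the {layer} layer")
    return layer_data[layer]


permissions = {"alice": {"application", "service"}, "bob": {"application"}}
layer_data = {
    "preparation": [], "detail": [],
    "service": [{"user_id": "u1", "total": 40}],
    "application": [{"user_id": "u1", "region": "Beijing", "total": 40}],
}

result = handle_query({"user_id": "alice", "layer": "service"}, permissions, layer_data)
```

Keeping the rights lookup separate from the layer storage mirrors the text, in which the data management platform holds the authority information while the layers hold the data.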
The data management system and the method for the data mart can greatly improve the calculation performance, reduce the data use cost, improve the data reusability and enable the data to be easy to manage.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate the invention and, together with the description, serve to explain it. In the accompanying drawings:
FIG. 1 is a schematic diagram of a data management system for a data mart according to an embodiment of the present invention.
FIG. 2 is a flow chart of a method for managing data of a data mart according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.
It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
In view of the problems in the prior art, the present invention provides a data management system based on a mixed-mode data mart framework, also referred to as a mixed-mode data mart. Here, "mixed mode" means that the system can serve queries directly from the data warehouse, can serve queries from the data marts obtained by step-by-step summarization of the data, and can serve queries against data at different levels. The system provided by the embodiment of the invention mainly comprises the following parts: a data source, a data center, a data preparation layer, a detail data layer, a service data layer, an application data layer, a data management platform and a data integration application platform.
As shown in FIG. 1, the data management platform is provided with business data models, target data source information corresponding to each business, dimension table models, wide table models, user information, user authority information, available resource information and the like. For different businesses, a business data model can be provided with a plurality of business rules and field mapping relations, so that the data center can extract source data based on the business rules and field mapping relations set in the business data model. Dimensions and dimension attributes may be set in the dimension table model. The wide table model may be provided with wide table dimensions and the field information under each dimension. In the embodiment of the invention, each business data model is associated with a corresponding target data source, dimension table model and wide table model, so that for a specific business, the target data sources relevant to that business can be selected from a plurality of heterogeneous data sources for data extraction, conversion and loading (ETL).
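One way to picture the associations among these models is the sketch below. It is only an illustration under stated assumptions: the class names, attribute names and the example values (`orders_db`, `user_classification`, the rule name) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionTableModel:
    # Maps each dimension to its attributes, e.g. "date" -> ["week", "month"].
    dimensions: dict[str, list[str]]


@dataclass
class WideTableModel:
    # Maps each wide-table dimension to the fields exposed under it.
    fields_by_dimension: dict[str, list[str]]


@dataclass
class BusinessDataModel:
    name: str
    target_data_source: str            # hypothetical source identifier
    business_rules: list[str]          # rule names only, for illustration
    field_mapping: dict[str, str]      # source field -> detail-table field
    dimension_model: DimensionTableModel
    wide_model: WideTableModel


@dataclass
class DataManagementPlatform:
    models: dict[str, BusinessDataModel] = field(default_factory=dict)

    def register(self, model: BusinessDataModel) -> None:
        self.models[model.name] = model

    def select(self, name: str) -> BusinessDataModel:
        # Selecting a model also fixes its data source, dimension table
        # model and wide table model, because they are associated with it.
        return self.models[name]


platform = DataManagementPlatform()
platform.register(BusinessDataModel(
    name="user_classification",
    target_data_source="orders_db",
    business_rules=["drop_rows_without_user_id"],
    field_mapping={"uid": "user_id", "amt": "amount"},
    dimension_model=DimensionTableModel({"date": ["week", "month", "quarter"]}),
    wide_model=WideTableModel({"user": ["user_id", "total_amount"]}),
))
selected = platform.select("user_classification")
```

Because the business data model carries references to its data source and table models, choosing the model for a business is enough to determine what is extracted and how it is later summarized.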
In an embodiment of the present invention, the data source may include a plurality of heterogeneous data sources, and each data source may include service data, log data, and monitoring class data generated by the application service in each service behavior.
The data center is used for extracting, converting and loading data from a data source. The data preparation layer extracts, converts and loads data from a data source through the data center. In an embodiment of the present invention, the data center plays back the acquired log data to generate data records and stores the generated data records in a data warehouse. Further, for the data in the data warehouse and the data source, the data center extracts business data from the corresponding target data source and cleans it according to the business rules and field mapping relation information set in the business data model selected from the data management platform, and the extracted and cleaned structured business data is stored in the detail data layer. The detail data layer stores the extracted and cleaned structured business data in the detail table.
The service data layer is used for summarizing the data in the detail table based on the dimensions and dimension attributes set in the dimension table model corresponding to the business data model, obtaining at least a dimension table and a summary table.
The application data layer is used for extracting data information in the service data layer at least based on the dimension and the field information set in the wide table model corresponding to the service data model to obtain application data used as a data mart, wherein the application data at least comprises a service wide table.
In the embodiment of the invention, the data preparation layer, the application data layer and the data management platform are provided with the data access interface for the data integration application platform to access, so that the data integration application platform can access the data preparation layer, the application data layer and the data management platform.
The various portions of the data management system of the present invention based on a mixed-mode data mart will be described in more detail below.
The data source refers to a plurality of heterogeneous data such as business data, log data, monitoring class data and the like which are directly generated by each application service in the business behavior under the micro-service architecture, wherein the log data can comprise business logs, system logs, behavior logs and the like. The data provided by the data source is used as the underlying data for the other logical layers.
In the embodiment of the invention, the data center supports the business departments by providing business data and computing services. The data center taps the data sources and can provide data acquisition, data storage and similar services. Data acquisition collects data from a data source, obtaining the business data, log data, monitoring data and the like of the required business. Beyond providing access to data sources, the data center can convert, write or cache data from various internal sources, can implement data scheduling, data transmission and data verification functions, and can transfer data in quasi-real time. In the embodiment of the invention, the data center service can access various data sources, including business data, log data and monitoring data, and can preliminarily clean the data and store it in structured form using defined data conversion and verification rules. In the embodiment of the invention, the data center acquires data by playing back the obtained logs and recording each record in order, using the timestamp as the data generation mark. In this way, incremental data migration and verification can be supported. During data migration, data is extracted from the log files of the source database rather than through the database's query function. Therefore, neither full nor incremental extraction degrades database performance, large data volumes do not affect the database's insert, delete and update operations, and normal operation of the business is ensured.
The data preparation layer, also called the data preparation area, the source-aligned layer or the ODS (Operational Data Store) layer, extracts data streams from the data sources for specific businesses through the data center without degrading the performance of the business data sources (databases and files); the extracted data is stored in the data warehouse after preliminary cleaning and transmission. The data preparation layer is the layer closest to the data in the data source, and its data is the source from which the data warehouse subsequently processes data. In consideration of data traceability and similar concerns, the data in the data preparation layer undergoes only very light cleaning, and data extracted from multiple data sources may even be stored as-is.
In the embodiment of the invention, in order to ensure data writing performance, middleware such as Kafka can be used as a buffer layer in the data preparation layer to buffer the extracted data, thereby improving the stability of the system.
The detail data layer holds the detail tables in which the data stored in the data warehouse by the data preparation layer is stored in structured form after further extraction, cleaning and transmission. More specifically, the extracted data is stored as structured detail tables after preliminary data cleaning and preliminary conversion based on data cleaning rules defined in advance in the data management platform. The detail data layer is an isolation layer between the business layer and the data warehouse; it keeps the same data granularity as the data preparation layer but adds data cleaning and data normalization operations.
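The log playback idea can be illustrated with a small sketch. It assumes a hypothetical log entry shape (`ts`, `op`, `row`) and is not a parser for any real database log format; it only shows how ordering by timestamp and filtering on the last seen timestamp yields both full and incremental extraction without querying the source database.

```python
from typing import Iterable, Iterator

LogEntry = dict  # hypothetical shape: {"ts": int, "op": str, "row": dict}


def replay_log(entries: Iterable[LogEntry], since_ts: int = 0) -> Iterator[dict]:
    """Replay change-log entries in timestamp order and yield data records.

    Only entries with ts > since_ts are replayed, so passing the last seen
    timestamp gives an incremental extraction.
    """
    for entry in sorted(entries, key=lambda e: e["ts"]):
        if entry["ts"] <= since_ts:
            continue
        # The timestamp is kept on the record as its data generation mark.
        yield {"ts": entry["ts"], "op": entry["op"], **entry["row"]}


log = [
    {"ts": 3, "op": "update", "row": {"id": 1, "amount": 70}},
    {"ts": 1, "op": "insert", "row": {"id": 1, "amount": 50}},
    {"ts": 2, "op": "insert", "row": {"id": 2, "amount": 20}},
]

full = list(replay_log(log))                     # full extraction
incremental = list(replay_log(log, since_ts=2))  # only changes after ts 2
```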
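The role of the buffer layer can be sketched with an in-process bounded queue. This is a stand-in only: it uses Python's standard `queue` module rather than Kafka or any Kafka client API, and the function names and sizes are hypothetical. The point is that extraction and warehouse writes are decoupled, so bursts are absorbed and writes happen in batches.

```python
import queue

# Bounded buffer standing in for the middleware between extraction and storage.
buffer: "queue.Queue[dict]" = queue.Queue(maxsize=1000)


def enqueue(record: dict) -> bool:
    """Try to buffer a record; return False if the buffer is full."""
    try:
        buffer.put_nowait(record)
        return True
    except queue.Full:
        return False


def drain(batch_size: int) -> list[dict]:
    """Take up to batch_size records from the buffer for one bulk write."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
    return batch


for i in range(5):
    enqueue({"id": i})

first_batch = drain(3)
second_batch = drain(3)
```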
The detail data layer obtains structured relational data (service data) after data processing according to a predefined service rule in the data management platform based on the data preparation layer and the data source, wherein the service rule can include, for example: the dimension of the defined business rules such as the dimension of the user, the business theme, the combination of different business stages and the like can be newly added and adjusted on the data management platform according to the requirement, and the business rules can be optionally combined based on the requirement. In the embodiment of the invention, in the detail data layer, data aggregation can be performed based on a preset business rule dimension, for example, the data of the same theme are collected into a table, so that the usability of the data is improved. The invention provides the detail table data based on the random combination of the business rules in practical application, so that the requirement that a user frequently changes the analysis dimension of the business data can be realized. In the embodiment of the invention, the detail data layer synchronizes the data in the data warehouse and/or the data source to the corresponding field in the data table of the detail data based on the field mapping relation predefined in the data management platform (such as predefined in the business data model). By way of example, the data model defined in the data management platform may include, for example: an electronically-determined data model, a weight amount model, a user classification model, etc., and the data model may be defined with, for example, such as: business rules such as data state change rules of each stage of the electronic right-confirming certificate, day-by-day summarizing rules of user data and the like are used for generating a detail table on a detail data layer based on the business rules.
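A rough sketch of how a field mapping and freely combined business rules could produce detail table rows is given below. The rule functions, field names and sample rows are hypothetical; the patent does not specify how rules are represented, so rules are modelled here simply as functions that return a cleaned row or `None` to drop it.

```python
from typing import Callable, Optional

Row = dict
Rule = Callable[[Row], Optional[Row]]  # a rule returns None to drop a row


def require_user_id(row: Row) -> Optional[Row]:
    """Cleaning rule: discard rows that have no user identifier."""
    return row if row.get("user_id") else None


def normalize_amount(row: Row) -> Optional[Row]:
    """Normalization rule: store the amount as a number with two decimals."""
    return {**row, "amount": round(float(row["amount"]), 2)}


def build_detail_table(source_rows, field_mapping, rules):
    detail = []
    for src in source_rows:
        # Apply the field mapping: keep only mapped fields, under new names.
        row = {dst: src.get(name) for name, dst in field_mapping.items()}
        for rule in rules:
            row = rule(row)
            if row is None:
                break  # a rule rejected the row
        if row is not None:
            detail.append(row)
    return detail


field_mapping = {"uid": "user_id", "amt": "amount", "dt": "date"}
source_rows = [
    {"uid": "u1", "amt": "10.50", "dt": "2023-01-01", "debug": "x"},
    {"uid": None, "amt": "3", "dt": "2023-01-01"},
    {"uid": "u2", "amt": "7", "dt": "2023-01-02"},
]
detail_table = build_detail_table(
    source_rows, field_mapping, [require_user_id, normalize_amount]
)
```

Passing the rules as a list is what allows them to be added, adjusted and recombined without changing the extraction code.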
The service data layer is used for generating data summarization with different dimensions based on detail data and by aggregation with a dimension table model defined in the data management platform as intermediate data. The data tables generated in the service data layer include a dimension table and a fact table (e.g., a summary table and a static table). The dimension table is also called a dimension table or a lookup table, and is metadata used for storing dimension information, and more particularly, the dimension table is a table used for describing dimension data, and the dimension data comprises dimension and dimension attribute information. Dimension tables correspond to fact tables, the purpose of which is to add business meaning and context to the fact tables in the data warehouse. The dimension table is an entry point of the fact table, the dimension table realizes a service interface of the data warehouse, the dimension table stores attribute values of dimensions and can be associated with the fact table, the dimension table is equivalent to extracting and standardizing the attributes frequently appearing in the fact table and managing the attributes by using one table, and common dimension tables are as follows: date table (storing attributes such as week, month, quarter, etc. corresponding to date), place table (including attributes such as country, province/state, city, etc.), etc. The summary table is a data table obtained by performing summary analysis on the data in the detail table according to a predetermined dimension, and the summary analysis includes summation, maximum value calculation, minimum value calculation, average value calculation, and the like, and the present invention is not limited to this. A static table is a fact table that embodies the status of each time point data. 
As an example, the service data layer may lightly aggregate the data in the detail table according to dimensions such as user, time, service type, etc., and generate a series of intermediate tables or result tables such as "user service data classification detail table", "user service data date (month/year) summary table", "date summary data table", "user static service data table", etc., as intermediate data, so as to improve reusability of public indexes and reduce the work of repetitive processing. The dimension attribute is obtained through definition and management of the dimension by the dimension table model of the data management platform, and statistics of reports and the like can be directly carried out by utilizing the service data without carrying out statistics by detail data.
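The light aggregation described above can be sketched as a group-by over chosen dimensions. The function name, field names and sample rows are illustrative assumptions; the measures computed (sum, maximum, minimum, average) are the ones the text lists for summary tables.

```python
from collections import defaultdict


def summarize(detail_rows, dimensions, measure):
    """Group detail rows by the given dimensions and aggregate one measure."""
    groups = defaultdict(list)
    for row in detail_rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[measure])

    summary = []
    for key, values in sorted(groups.items()):
        out = dict(zip(dimensions, key))
        out.update(
            total=sum(values),
            max=max(values),
            min=min(values),
            avg=sum(values) / len(values),
        )
        summary.append(out)
    return summary


detail_rows = [
    {"user_id": "u1", "month": "2023-01", "amount": 10},
    {"user_id": "u1", "month": "2023-01", "amount": 30},
    {"user_id": "u2", "month": "2023-01", "amount": 5},
    {"user_id": "u1", "month": "2023-02", "amount": 8},
]

# A "user by month" summary table built once and reused by later reports.
monthly_summary = summarize(detail_rows, ["user_id", "month"], "amount")
```

Changing the `dimensions` argument is all that is needed to produce a different summary table from the same detail data.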
The application data layer is used for further aggregating service data based on business service to generate application data, wherein the application data is data provided for data products and data analysis, and the application data can directly provide data query analysis for the business service. After the data from the service data layer is processed by the service data layer, the service data layer is further aggregated again according to an actual service data model, so that service available wide-table data or view data can be generated and stored in an application layer database, and the data is data of an application end of a data mart framework. The application data can generally provide data query for each business service directly, not only support conventional SQL query, but also provide direct data query for report and chart directly through API interface.
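As a sketch of how a business wide table might be assembled, the example below joins summary rows with a dimension table and keeps only the fields named by a wide table model. The function, the join key and the sample data are hypothetical; a real deployment would more likely express this as SQL against the service data layer.

```python
def build_wide_table(summary_rows, dimension_table, join_key, fields):
    """Join summary rows with dimension attributes and project wide-table fields."""
    dim_by_key = {d[join_key]: d for d in dimension_table}
    wide = []
    for row in summary_rows:
        # Dimension attributes add context; summary values win on name clashes.
        merged = {**dim_by_key.get(row[join_key], {}), **row}
        wide.append({f: merged.get(f) for f in fields})
    return wide


summary_rows = [
    {"user_id": "u1", "month": "2023-01", "total": 40},
    {"user_id": "u2", "month": "2023-01", "total": 5},
]
user_dimension = [
    {"user_id": "u1", "region": "Beijing", "level": "gold"},
    {"user_id": "u2", "region": "Shanghai", "level": "silver"},
]
# Field list as it might be configured in a wide table model (assumed).
wide_fields = ["user_id", "region", "month", "total"]

wide_table = build_wide_table(summary_rows, user_dimension, "user_id", wide_fields)
```

The resulting rows are flat and self-describing, which is what lets reports and charts read them directly without further joins.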
In the embodiment of the invention, non-public user portrait data, business wide table data, public data and the like are also generated in the application data layer. The user portrait data may include, for example, user feature-oriented data including labels of user attributes (including natural attributes, social attributes, and the like), user interest preferences, and the like, which are obtained based on user behavior analysis, so as to embody a user avatar, thereby providing a targeted service for the user. The public data may for example comprise some user-oriented shared resource data. In the embodiment of the invention, the application data can be provided for the data integration application platform through a preset interface for users to directly inquire. Furthermore, in embodiments of the present invention, data in the application data layer may be generated as micro-visualization data.
In the embodiment of the invention, the data management platform serves as the resource management, system configuration, and business model design platform of the data mart. It is a visual data mart management platform whose functions include basic management of users, authorities, and resources, as well as management of data sources, data structures (dimension tables and wide tables), data models, task scheduling, target databases, and the like. It thereby provides platform-level management functions that traditional data mart management lacks. As the management platform in the data mart framework, the data management platform performs the following functions:
(1) Storing user, authority, and resource information for the management of users, user authorities, and resources.
When a user requests data through the data management platform, the platform first confirms, based on the user identification information, whether the user has the authority to make the request; it processes the request further if the user has the authority and refuses the request otherwise.
(2) Defining field mapping relationships, so that when detail data is generated, data in the data warehouse and the data sources can be synchronized to the corresponding fields of a detail table in the detail data layer.
(3) Building data models, dimension table models, target data sources, and the like.
Data models are built for different services, and different service requirements can correspond to different data models, such as an electronic weight data model, a weight limit determining model, and a user classification model, although the invention is not limited to these. Different data models may be associated with different target data sources and have different business rules, so that data can be extracted directly from the target data sources according to business needs, prepared, and written to a detail table in the detail data layer. The dimension table model defines the data dimension information corresponding to different service requirements and is used to generate dimension tables based on those dimensions. For a specific business, the system uses the data model, dimension table model, and corresponding target data source selected through the data management platform for the data mart to collect and aggregate the data in the detail table along the dimensions given by the relevant business rules, and then synchronizes the generated data into the corresponding data tables of the service data layer.
(4) Establishing a wide table model.
The wide table model is a Bigtable/HBase-style model used to generate wide tables; it supports features such as data versioning, data life cycles, auto-incrementing primary key columns, and conditional updates. A wide table may include, for example, primary keys, partition keys, attribute columns, versions, data types, a life cycle, and a maximum number of versions.
For a specific business, the system uses the data model, wide table model, corresponding target data source, and the like selected through the data management platform for the data mart to collect, aggregate, and reorganize the data in the service data layer according to the relevant business rules, and then synchronizes the generated data into the corresponding data table of the application data layer, such as a business wide table.
(5) Defining and managing data processing modes such as timed tasks, stream processing, and batch processing, and supporting automatic, manual, and exception-handling operation and maintenance modes. Timed tasks, stream processing, and batch processing are configured in stream data management, and the tasks are executed by an executor and scheduled by a scheduler.
With the data warehouse and the detail data serving as target databases and a timestamp serving as the data generation mark, incremental data migration and verification can be supported. When the service is abnormal, the timed tasks and the unified stream-batch data synchronization function can be intervened in manually, and data synchronization can be started and stopped by hand. In addition, the data management platform is correspondingly provided with a stream data management module, an executor, and a scheduler: the stream data management module provides stream processing tasks, batch processing task settings, and timing management of each task; the executor executes tasks based on the task settings and timing management of the stream data management module; and the scheduler performs task scheduling.
(6) Providing an application interface.
Through the application interface, the data interfaces of service-side data marts can be defined, so that the service data sources of multiple data marts are managed and multi-source management is achieved. Business data service interfaces can be generated automatically, and custom SQL can be converted into API data query interfaces based on the wide table structure, which greatly reduces development difficulty, allows data to be served directly, and keeps data management simple.
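The conversion of custom SQL into a query interface in function (6) might look roughly like the following sketch. It assumes an in-memory SQLite wide table and a plain dictionary as the endpoint registry; the names register_sql_api, call_api, and API_REGISTRY, the endpoint path, and the table schema are all hypothetical and are not taken from the patent. A real deployment would place an HTTP layer and the authority check in front of call_api.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dictionaries
conn.executescript(
    """
    CREATE TABLE app_user_wide (user_id TEXT, month TEXT, payment_amount REAL);
    INSERT INTO app_user_wide VALUES
        ('u1', '2023-01', 150.0),
        ('u2', '2023-01', 70.0);
    """
)

# Hypothetical registry mapping an endpoint path to its SQL and parameter names.
API_REGISTRY = {}


def register_sql_api(path, sql, params):
    """Expose a read-only SQL statement as a named query endpoint."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements can be exposed")
    API_REGISTRY[path] = {"sql": sql, "params": tuple(params)}


def call_api(path, **kwargs):
    """Run the SQL registered under `path` with the caller's parameters."""
    endpoint = API_REGISTRY[path]
    missing = [name for name in endpoint["params"] if name not in kwargs]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    # Named placeholders keep caller input out of the SQL text itself.
    bound = {name: kwargs[name] for name in endpoint["params"]}
    rows = conn.execute(endpoint["sql"], bound).fetchall()
    return [dict(row) for row in rows]


register_sql_api(
    "/users/payments",
    "SELECT user_id, payment_amount FROM app_user_wide "
    "WHERE month = :month ORDER BY user_id",
    params=["month"],
)
```

Calling call_api("/users/payments", month="2023-01") then returns the matching wide-table rows as a list of dictionaries, ready to be serialized for a report or chart.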
In the embodiment of the invention, the data integration application platform serves as the data operation platform in the data mart framework and provides data services, so it can also be called a data service providing platform. Based on the data APIs and reports that the application data exposes externally, the data integration application platform can manage the authorities of interfaces and reports, supports chart and report analysis, and provides data query services that conform to data security management standards. The data integration application platform has three modes: embedded, API, and SaaS. In one embodiment of the invention, the data integration application platform is part of the data management system of the data mart. In another embodiment of the present invention, the data integration application platform is not part of the data management system of the data mart but is associated with it via a data interface. As shown in fig. 1, the data preparation layer, the application data layer, and the data management platform may be provided with a data access interface for the data integration application platform so that a user may directly obtain data from these different levels. In some embodiments of the present invention, the detail data layer and the service data layer may also be provided with a data access interface for the data integration application platform, so that the user may directly obtain data from these layers.
Based on the data mart framework, the data mart can be connected to at least two data platforms: the data management platform and the data integration application platform. The data warehouse, detail data, service data, and application data provide structured data storage.
Traditional analysis across data sources needs to link multiple data sources and analyze the service data and operation data together; the analysis must be modified continually as service requirements change, and the research and development cost is high. The framework of the invention provides a data preparation function that extracts data from multiple data sources into the data warehouse and the detail data store, reducing the number of data sources that must be linked during data analysis.
Compared with the existing framework, the data management system and data management method for data marts of the invention have the following advantages:
(1) In existing query services on a data warehouse, the data volume is huge, and whatever the business requirement, results are computed directly from the service database in real time, so the computing cost is high and the corresponding hardware investment is also high. In the invention, data is summarized step by step through the layers of data source, data warehouse, detail data layer, service data layer, and application data layer, according to the user dimensions and business rules predefined for the business in the data management platform, until it reaches the application data layer, where it can be queried directly. For example, to query the complete closed-loop payment path of an electronic right certificate, an existing query has to join the various databases (MySQL, Oracle) corresponding to the "open service", "payment service", "warranty service", "repayment service", and "clear service"; in the embodiment of the invention, the payment path result can be queried directly by joining only a wide table in the application data layer with the user table, because directly queryable application data has already been formed through the step-by-step summarization of the data model, the dimension table model, and the wide table model. In addition, in the embodiment of the invention, data is synchronized by synchronizing and replaying service logs, so data synchronization does not affect the service database, which can devote 100% of its computing power to the service, and system performance is greatly improved. In the embodiment of the invention, the data center records each piece of data in order by replaying the various collected log data, thereby completing full and incremental migration of the various data sources with higher stability.
(2) During peak usage, as the usage frequency increases, data queries against an existing centralized data warehouse can respond slowly, and incidents such as response timeouts or data warehouse downtime can even occur. If the service database goes down because of data queries, the normal operation of the service is affected. In the embodiment of the invention, the data query computation is completely decoupled from the user's actual query and can be performed in advance, at idle times or at another suitable time, which relieves the dependence on hardware during peak usage.
(3) With the existing data architecture, because hierarchical temporary data such as intermediate tables and result tables is lacking, every calculation queries the service database directly according to the logic in the requirement. The cost of delivering a single data report is therefore high, and if the requirement changes, the subsequent iterative maintenance cost of modifying and adjusting the report is also high; that is, the cost of using the data is high. In the embodiment of the invention, the data in the detail table, the step-by-step summarized data in the service data layer, and the data in the application data layer can all serve queries, so the cost of using the data is low, and the data can be changed on demand by modifying the business rules, the combination of business rules, the dimension table model, the wide table model, and the like, which reduces development cost and difficulty.
(4) Existing systems have poor data reusability. Because dimensions cannot be defined and managed, dimension attributes cannot be obtained, and attributes that appear frequently and repeatedly in actual service data cannot be extracted and normalized for management in dimension tables; the systems also have no hierarchical temporary tables and can only compute statistics from detail data. In the embodiment of the invention, the definition and management of detail data, dimensions, and wide tables allow data at every level to be stored and to support custom queries, so data reusability is high.
(5) Data management in existing systems is difficult. Existing systems define dimensions only at the implementation layer, so dimension tables, temporary data, and wide tables cannot be managed visually, and each report can only be delivered to the business team as a one-off. In the embodiment of the invention, data can be served directly through the SQL-to-API function, and data management is simple.
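The log playback mentioned in advantage (1), combined with a timestamp used as the data generation mark, can be sketched as follows. The log entry format (ts, op, key, value) and the function name replay_log are assumptions for illustration; the patent does not define a log format.

```python
def replay_log(log_entries, table=None, watermark=0):
    """Replay change-log entries in timestamp order.

    Entries with a timestamp at or before `watermark` are treated as already
    migrated and skipped, so the same function serves both a full migration
    (watermark 0) and an incremental one (watermark from the previous run).
    """
    table = {} if table is None else table
    pending = sorted(
        (entry for entry in log_entries if entry["ts"] > watermark),
        key=lambda entry: entry["ts"],
    )
    for entry in pending:
        if entry["op"] == "delete":
            table.pop(entry["key"], None)
        else:
            # "insert" and "update" are both applied as an upsert.
            table[entry["key"]] = {**entry["value"], "ts": entry["ts"]}
        watermark = entry["ts"]
    return table, watermark


# Hypothetical log collected from a service database, arriving out of order.
LOG = [
    {"ts": 3, "op": "update", "key": "order-1", "value": {"status": "paid"}},
    {"ts": 1, "op": "insert", "key": "order-1", "value": {"status": "open"}},
    {"ts": 2, "op": "insert", "key": "order-2", "value": {"status": "open"}},
]

# Full migration, then an incremental run that sees one new log entry.
warehouse, mark = replay_log(LOG)
NEW_LOG = LOG + [{"ts": 4, "op": "delete", "key": "order-2", "value": None}]
warehouse, mark = replay_log(NEW_LOG, warehouse, mark)
```

Because the replay reads only the log, the service database is never queried during synchronization, which matches the decoupling described above.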
Based on the above system, the present invention also provides a data management method based on a mixed-mode data mart. As shown in fig. 2, the method may include the following steps:
Step S110: at the data preparation layer, data is extracted, converted, and loaded from the data source through the data center, data records are generated by replaying log data, and the generated data records are stored in the data warehouse.
Step S120: based on the service rules and field mapping relation information set in the service data model selected from the data management platform, the data preparation layer extracts service data from the corresponding target data source and cleans it through the data center, and the extracted and cleaned structured service data is stored in the detail data layer.
Step S130: the extracted and cleaned structured business data is stored as a detail table in the detail data layer.
Step S140: at the service data layer, the data in the detail table is summarized based on the dimensions and dimension attributes set in the dimension table model corresponding to the service data model, obtaining at least a dimension table and a summary table.
Step S150: at the application data layer, information is extracted from the data in the service data layer based at least on the dimensions and field information set in the wide table model corresponding to the service data model, obtaining the application data that serves as the data mart corresponding to the selected service data model, the application data including at least a service wide table.
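Steps S120 and S130 can be sketched as a rule-driven extraction followed by field mapping and cleaning. The mapping, the business rule (keep only settled records that carry a user identifier), and all field names below are hypothetical examples; in the described system they would come from the service data model selected on the data management platform.

```python
# Hypothetical field mapping: source field name -> detail-table field name.
FIELD_MAPPING = {
    "uid": "user_id",
    "amt": "amount",
    "biz_type": "service_type",
    "created": "date",
}


def passes_business_rule(record):
    """Illustrative business rule: keep settled records that name a user."""
    return record.get("state") == "settled" and bool(record.get("uid", "").strip())


def to_detail_row(record, mapping):
    """Rename fields according to the mapping, then clean the values."""
    row = {target: record.get(source) for source, target in mapping.items()}
    row["user_id"] = row["user_id"].strip()  # remove stray whitespace
    row["amount"] = float(row["amount"])     # normalize text amounts to numbers
    return row


def build_detail_table(warehouse_records, mapping):
    """Extract, map, and clean warehouse records into detail-table rows."""
    return [
        to_detail_row(record, mapping)
        for record in warehouse_records
        if passes_business_rule(record)
    ]


# Hypothetical records already loaded into the data warehouse by step S110.
WAREHOUSE_RECORDS = [
    {"uid": " u1 ", "amt": "100.50", "biz_type": "payment",
     "created": "2023-01-05", "state": "settled"},
    {"uid": "u2", "amt": "70", "biz_type": "payment",
     "created": "2023-01-11", "state": "pending"},
    {"uid": "", "amt": "5", "biz_type": "payment",
     "created": "2023-01-12", "state": "settled"},
]

DETAIL_TABLE = build_detail_table(WAREHOUSE_RECORDS, FIELD_MAPPING)
```

Only the first record survives the rule; the pending record and the record without a user identifier are dropped before they reach the detail data layer.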
The method further comprises: when a user query request is received, determining the user's access rights based on the user identifier carried in the user query request and the user rights information in the data management platform; rejecting the request if the user does not have access rights; and, if the user has access rights, querying the requested content from the data preparation layer, the detail data layer, the service data layer, or the application data layer based on the content requested by the user, and returning the query result to the user.
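A minimal sketch of this access check and layer routing is given below. The request shape (user_id, layer, where), the per-layer rights table, and the sample rows are assumptions made for illustration; the patent only states that rights are determined from the user identifier and the rights information held by the data management platform.

```python
class AccessDenied(Exception):
    """Raised when the user lacks the right to read the requested layer."""


# Hypothetical rights information as held by the data management platform.
USER_RIGHTS = {
    "alice": {"service", "application"},
    "bob": {"application"},
}

# Hypothetical contents of the four queryable layers.
LAYER_DATA = {
    "preparation": [{"order_id": "o1", "state": "settled"}],
    "detail": [{"user_id": "u1", "amount": 100.5}],
    "service": [{"user_id": "u1", "month": "2023-01", "total_amount": 150.0}],
    "application": [
        {"user_id": "u1", "region": "north", "payment_amount": 150.0},
        {"user_id": "u2", "region": "south", "payment_amount": 70.0},
    ],
}


def handle_query(request):
    """Check the user's rights, then answer the query from the requested layer."""
    user_id = request["user_id"]
    layer = request["layer"]
    if layer not in LAYER_DATA:
        raise ValueError(f"unknown layer: {layer}")
    if layer not in USER_RIGHTS.get(user_id, set()):
        raise AccessDenied(f"{user_id} may not query the {layer} layer")
    conditions = request.get("where", {})
    return [
        row
        for row in LAYER_DATA[layer]
        if all(row.get(field) == value for field, value in conditions.items())
    ]
```

For example, handle_query({"user_id": "bob", "layer": "application", "where": {"region": "south"}}) returns the matching wide-table row, while the same user asking for the service layer is rejected with AccessDenied.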
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. Whether a particular implementation uses hardware or software depends on the specific application of the solution and its design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A data management system for a data mart, the system comprising: the system comprises a data source, a data center, a data preparation layer, a detail data layer, a service data layer, an application data layer and a data management platform;
the data management platform is provided with a plurality of business data models, target data source information corresponding to business, a dimension table model, a wide table model, user information, user authority information and available resource information, wherein business rules and field mapping relations are arranged in the business data models, dimension and dimension attributes are arranged in the dimension table model, and the wide table model is provided with wide table dimensions and field information under each dimension; each business data model is associated with a corresponding target data source, a dimension table model and a wide table model;
the data sources comprise a plurality of heterogeneous data sources, and each heterogeneous data source comprises business data, log data and monitoring class data generated by application service in each business behavior;
the data center is used for extracting, converting and loading data from a data source;
the data preparation layer is used for extracting data from a data source by using a log file through the data center, converting and loading the data, orderly generating a data record based on the playback of the log data, storing the generated data record in a data warehouse, extracting and cleaning the service data from a corresponding target data source by the data center according to service rules and field mapping relation information set in a service data model selected from a data management platform aiming at the data in the data warehouse, and storing the extracted and cleaned structured service data in the detail data layer;
the detail data layer is used for storing the extracted and cleaned structured business data in a detail table;
the service data layer is used for summarizing the data in the detail table based on the dimension and the dimension attribute set in the dimension table model corresponding to the service data model, and at least obtaining a dimension table and a summary table;
the application data layer is used for extracting data information in the service data layer at least based on the dimension and the field information set in the wide table model corresponding to the service data model to obtain application data used as a data mart, wherein the application data at least comprises a service wide table;
the data preparation layer, the detail data layer, the service data layer, the application data layer and the data management platform provide data access interfaces for the data integration application platform, and can query data of different levels through the data access interfaces corresponding to different layers;
when a user query request is received, the system determines the user access right based on the user identification carried in the user query request and the user right information in the data management platform, refuses the user request under the condition that the user does not have the access right, inquires the requested content from the data preparation layer, the detail data layer, the service data layer or the application data layer based on the content of the user request under the condition that the user has the access right, and returns an inquiry result to the user.
2. The system of claim 1, wherein the system further comprises:
the data integration application platform, which is used for providing data services for users based on the access interface.
3. The system according to claim 1 or 2, wherein the data management platform is further provided with an application interface configuration, a stream data management module, an executor and a scheduler;
the stream data management module is used for providing stream processing tasks, batch processing task setting and timing management of each task;
the executor is used for executing tasks based on task setting and timing management of the stream data management module;
the scheduler is used for performing task scheduling.
4. The system according to claim 1 or 2, wherein the service data layer further comprises a static table; the application data layer further includes: user portrait data and public data.
5. A data management method implemented by a data management system based on a data mart, the data management system comprising: the system comprises a data source, a data center, a data preparation layer, a detail data layer, a service data layer, an application data layer and a data management platform; the data management platform is provided with a plurality of business data models, target data source information corresponding to business, a dimension table model, a wide table model, user information, user authority information and available resource information, wherein business rules and business fields are arranged in the business data models, dimension and dimension attributes are arranged in the dimension table model, and field information in wide table dimensions and under each dimension is arranged in the wide table model; each business data model is associated with a corresponding target data source, a dimension table model and a wide table model; the data sources comprise a plurality of heterogeneous data sources, and each heterogeneous data source comprises business data, log data and monitoring class data generated by application service in each business behavior; the data preparation layer, the detail data layer, the service data layer, the application data layer and the data management platform are provided with data access interfaces for the data integration application platform to access, and can query different levels of data through the data access interfaces corresponding to different layers, and the method comprises the following steps:
the data preparation layer performs data extraction from a data source through the data center by using a log file, performs data conversion and loading, orderly generates a data record based on the playback of the log data, stores the generated data record in a data warehouse, performs service data extraction and data cleaning from a corresponding target data source by the data center according to service rules and field mapping relation information set in a service data model selected from the data management platform for the data in the data warehouse, and stores the structured service data obtained after the extraction and cleaning in the detail data layer;
storing the extracted and cleaned structured business data as a detail table in the detail data layer;
summarizing the data in the detail table based on the dimension and the dimension attribute set in the dimension table model corresponding to the service data model at the service data layer, and at least obtaining a dimension table and a summary table;
information extraction is carried out on data in the service data layer on the basis of at least dimensions and field information set in a wide table model corresponding to the service data model in the application data layer, so that application data serving as a data mart corresponding to the selected service data model is obtained, and the application data at least comprises a service wide table;
The method further comprises the steps of: when a user query request is received, determining user access rights based on a user identifier carried in the user query request and user rights information in the data management platform, rejecting the user request under the condition that the user does not have the access rights, querying the requested content from the data preparation layer, the detail data layer, the service data layer or the application data layer based on the content of the user request under the condition that the user has the access rights, and returning a query result to the user.
6. The method of claim 5, wherein the system further comprises the data integration application platform; the method also includes providing, by the data integration application platform, a data service to a user.
7. The method of claim 5, wherein the data management platform is further provided with an application interface configuration, a stream data management module, an executor, and a scheduler; the method further comprises the steps of:
providing data stream processing tasks, batch processing task setting and timing management of each task through the stream data management module;
executing, by the executor, the set task based on the task setting and timing management of the stream data management module; and
performing task scheduling by the scheduler.
8. The method of claim 5, wherein the service data layer further comprises a static table; the application data layer further includes: user portrait data and public data.
CN202310061992.8A 2023-01-13 2023-01-13 Data management system and data management method for data marts Active CN115794929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310061992.8A CN115794929B (en) 2023-01-13 2023-01-13 Data management system and data management method for data marts

Publications (2)

Publication Number Publication Date
CN115794929A CN115794929A (en) 2023-03-14
CN115794929B true CN115794929B (en) 2023-05-23

Family

ID=85429776

Country Status (1)

Country Link
CN (1) CN115794929B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578600A (en) * 2023-05-19 2023-08-11 广州经传多赢投资咨询有限公司 Micro-service data aggregation method, system, equipment and storage medium
CN117009334B (en) * 2023-08-04 2024-03-01 哈尔滨航天恒星数据系统科技有限公司 Intelligent access and processing method for massive agricultural multi-source heterogeneous sensing data, electronic equipment and storage medium
CN117251633B (en) * 2023-10-08 2024-08-27 国任财产保险股份有限公司 Customer data management system
CN117633074A (en) * 2023-11-24 2024-03-01 国任财产保险股份有限公司 Financial data-based processing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671628B2 (en) * 2010-07-09 2020-06-02 State Street Bank And Trust Company Systems and methods for data warehousing
CN113592680A (en) * 2021-07-28 2021-11-02 浙江省公众信息产业有限公司 Service platform based on regional education big data
CN113628069B (en) * 2021-08-11 2023-01-20 广东电网有限责任公司 Planning domain power grid data market construction method and system, computer and storage medium
CN114357088B (en) * 2021-12-14 2024-02-27 中核武汉核电运行技术股份有限公司 Nuclear power industry data warehouse system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant