CN117216171A

CN117216171A - Data warehouse and data processing method based on kimball dimension modeling

Info

Publication number: CN117216171A
Application number: CN202311235481.XA
Authority: CN
Inventors: 丁刘健; 陶治; 焦尧
Original assignee: Hangzhou Zhulong Information Technology Co ltd
Current assignee: Hangzhou Zhulong Information Technology Co ltd
Priority date: 2023-09-22
Filing date: 2023-09-22
Publication date: 2023-12-12

Abstract

The application provides a data warehouse and a data processing method based on Kimball dimensional modeling, and relates to the field of data processing. The data warehouse comprises a processing unit, a grabbing unit and a service unit. The grabbing unit grabs different types of first data from the data source and sends them to the processing unit. The processing unit processes first data of any type to obtain second data. The service unit receives a data acquisition request, sent by a user terminal, that includes a data identifier and a user ID, verifies the user ID, and, if verification passes, forwards the data acquisition request to the processing unit. The processing unit further acquires target data from the second data according to the data identifier and sends the target data to the user terminal through the service unit. The application can process data characterized by huge volume, complex structure and numerous types, and the unified data format ensures data consistency.

Description

Data warehouse and data processing method based on kimball dimension modeling

Technical Field

The application relates to the field of data processing, in particular to a data warehouse and a data processing method based on kimball dimension modeling.

Background

With the rapid development of applications such as the mobile internet and the Internet of Things, global data volume has grown explosively, which signals that the big data era has arrived. Network operators have huge user bases and control over terminals and users' internet-access channels, which gives them a good data foundation for user behavior analysis; deeply analyzing users' traffic behavior characteristics and patterns to discover their potential consumption demands is an effective means of increasing value and improving operations. However, not only is the data volume growing, but the variety of data types and the real-time processing requirements also greatly increase the complexity of big data processing. Big data poses technical challenges to conventional data analysis and processing techniques (e.g., data warehouses): these techniques cannot meet the high scalability and massive-volume requirements of big data, and they usually target only a single type of data, whereas big data is characterized by huge volume, complex structure and numerous types, which poses new challenges for its storage, processing and analysis.

Disclosure of Invention

The embodiment of the application aims to provide a data warehouse and a data processing method based on kimball dimension modeling, which are used for solving the problems in the prior art and improving the data storage quality.

In a first aspect, a kimball dimension modeling-based data warehouse is provided, the data warehouse comprises a processing unit, a grabbing unit and a service unit;

the processing unit is respectively connected with the grabbing unit and the service unit;

the grabbing unit is used for grabbing first data of different types in the data source and sending the first data of different types to the processing unit;

the processing unit is used for processing any type of first data to obtain second data;

the service unit is used for receiving a data acquisition request comprising a data identifier and a user ID sent by a user terminal, verifying the user ID, and if the user ID passes the verification, sending the data acquisition request to the processing unit;

the processing unit is further configured to obtain target data in the second data according to the data identifier, and send the target data to the user terminal through the service unit.

In one possible implementation, the processing unit includes: an ETL module and a DW module;

the ETL module is connected with the grabbing unit;

the DW module is respectively connected with the ETL module and the service unit;

the ETL module is used for performing one or more of extraction, conversion and loading on any type of first data to obtain second data, and sending the different types of first data and the second data to the DW module respectively;

the DW module is used for storing the first data and the second data of different types.

In one possible implementation, the processing unit includes: ETL module, ODS module and DW module;

the ODS module is respectively connected with the grabbing unit and the ETL module;

the ODS module is configured to store the first data sent by the grabbing unit and send the first data to the ETL module;

the ETL module is used for performing one or more of extraction, conversion and loading on any type of first data to obtain second data, and sending the second data to the DW module;

the DW module is used for storing the second data.

In one possible implementation, processing the first data to obtain second data includes:

dividing the first data according to attribute fields set in a star mode to obtain different attribute information;

the second data is determined based on the different attribute information.

In one possible implementation, the service unit is further configured to encrypt the target data sent by the processing unit, obtain encrypted data, and send the encrypted data to the user terminal.

In one possible implementation, encrypting the target data sent by the processing unit to obtain encrypted data includes:

splitting the target data according to the fields corresponding to the target data to obtain a plurality of target sub-data;

performing exclusive OR operation on any target sub-data and a randomly generated original secret key to obtain sub-encrypted data;

and integrating the plurality of sub-encrypted data to obtain the encrypted data.

In one possible implementation, the service unit is further configured to return the data acquisition request to the user terminal if the verification is not passed, and send a prompt message indicating that the user ID verification fails to the user terminal.

In a second aspect, a data processing method of a data warehouse based on kimball dimension modeling is provided, and the method may include:

capturing first data of different types in a data source;

processing the first data of any type to obtain second data;

receiving a data acquisition request comprising a data identifier and a user ID sent by a user terminal, and verifying the user ID;

and if the verification is passed, acquiring target data in the second data according to the data identification, and transmitting the target data to the user terminal.

In a third aspect, an electronic device is provided, the electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory are in communication with each other via the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of the second aspect when executing the program stored in the memory.

In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored which, when executed by a processor, carries out the method steps of the second aspect.

The application provides a data warehouse and a data processing method based on Kimball dimensional modeling, and relates to the field of data processing. The data warehouse comprises a processing unit, a grabbing unit and a service unit. The grabbing unit grabs different types of first data from the data source and sends them to the processing unit; the processing unit processes first data of any type to obtain second data; the service unit receives a data acquisition request, sent by a user terminal, that includes a data identifier and a user ID, verifies the user ID, and, if verification passes, forwards the data acquisition request to the processing unit; the processing unit further acquires target data from the second data according to the data identifier and sends the target data to the user terminal through the service unit. The application can process data characterized by huge volume, complex structure and numerous types, improves data quality, and unifies the data format to ensure data consistency. The service unit's verification of the user ID and its encryption of the data ensure security during data transmission.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a system architecture diagram of a data warehouse based on Kimball dimensional modeling according to an embodiment of the present application;

FIG. 2 is a block diagram of a data warehouse based on Kimball dimensional modeling according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a processing unit with structure A according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a processing unit with structure B according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a star schema according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a snowflake schema according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a constellation schema according to an embodiment of the present application;

FIG. 8 is a schematic flow chart of a data processing method of a data warehouse based on Kimball dimensional modeling according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

For convenience of understanding, the terms involved in the embodiments of the present application are explained below:

Data warehouse (abbreviated DW or DWH): a concept proposed in 1990 by Bill Inmon, known as the father of the data warehouse. Its main function is to use the storage architecture specific to data warehouse theory to systematically analyze and organize the large volume of data that an organization accumulates through the online transaction processing (OLTP) of its information systems. This facilitates analysis methods such as online analytical processing (OLAP) and data mining, and further supports the construction of decision support systems (DSS) and executive information systems (EIS), helping decision makers extract valuable information from large volumes of data quickly and effectively, make decisions, respond quickly to changes in the external environment, and build business intelligence (BI).

Dimensional modeling is the key to the success of data warehouse/business intelligence projects. Whether the data volume grows from GB to TB or PB, successful data presentation rests on simplicity; dimensional modeling is business-driven, always considers how to provide that simplicity, and takes user understandability and query performance as its goals.

With the rapid development of applications such as the mobile internet and the Internet of Things, traditional data analysis and processing techniques cannot meet the high scalability and massive-volume requirements of big data. They usually target only a single type of data, whereas big data is characterized by huge volume, complex structure and numerous types, which poses new challenges for its storage, processing and analysis.

To solve the above problems, the data warehouse based on Kimball dimensional modeling provided by the embodiments of the present application may be applied to the system architecture shown in FIG. 1, which may include a data layer, a data warehouse and an application layer.

The data warehouse is connected to the data layer and to the application layer by wired or wireless links.

The data layer serves as the data source and provides different types of first data to the data warehouse.

The data warehouse acquires the first data from the data layer and processes it to obtain second data, which provides the data foundation for the application layer.

The application layer acquires the data it needs from the data warehouse.

The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.

FIG. 2 is a schematic structural diagram of a data warehouse based on Kimball dimensional modeling according to an embodiment of the present application. As shown in FIG. 2, the data warehouse may include a processing unit, a grabbing unit and a service unit.

The processing unit is respectively connected with the grabbing unit and the service unit.

The grabbing unit is used for grabbing different types of first data from the data source and sending them to the processing unit. Specifically, the grabbing unit is built on .NET 5; it can collect data from multiple data sources, and further data sources can be added freely.

The processing unit is used for processing first data of any type to obtain second data.

Specifically, the processing unit may adopt either of two structures, A and B.

Structure A: as shown in FIG. 3, the processing unit may include an ETL module and a DW module.

The ETL module is connected to the grabbing unit; the DW module is connected to both the ETL module and the service unit.

The ETL module is used for performing one or more of extraction, conversion and loading on first data of any type to obtain second data, and for sending both the different types of first data and the second data to the DW module.

The DW module is used for storing the different types of first data and the second data.

In this structure, the DW module keeps a backup of the original, complete first data, which keeps the data secure and guards against its loss, and it also stores the second data, so that the data retrieved by the application layer is concise and accurate.

Structure B: as shown in FIG. 4, the processing unit includes an ETL module, an ODS module and a DW module.

The ODS module is connected to both the grabbing unit and the ETL module; the DW module is connected to both the ETL module and the service unit.

The ODS module is used for storing the first data sent by the grabbing unit and sending the first data to the ETL module.

The ETL module is used for performing one or more of extraction, conversion and loading on first data of any type to obtain second data, and for sending the second data to the DW module.

The DW module is used for storing the second data.

In this structure, the original, complete first data is backed up by the ODS module, and the second data is stored by the DW module.
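The two structures differ only in where the original first data is backed up. The following Python sketch illustrates the two data flows under stated assumptions: plain in-memory lists stand in for the MySQL-backed ODS and DW modules, and the hypothetical `etl` function only lower-cases field names as a placeholder for real extraction, conversion and loading.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Record = Dict[str, object]  # one row of first data


def etl(record: Record) -> Record:
    """Hypothetical ETL placeholder: lower-case field names to produce second data."""
    return {key.lower(): value for key, value in record.items()}


@dataclass
class DWModule:
    raw: List[Record] = field(default_factory=list)        # first data (structure A only)
    processed: List[Record] = field(default_factory=list)  # second data


@dataclass
class ODSModule:
    raw: List[Record] = field(default_factory=list)        # first data (structure B)


def run_structure_a(records: List[Record], dw: DWModule) -> None:
    # Grabbing unit -> ETL -> DW: the ETL module forwards the first data
    # unchanged and also sends the second data, so the DW stores both.
    for record in records:
        dw.raw.append(dict(record))
        dw.processed.append(etl(record))


def run_structure_b(records: List[Record], ods: ODSModule, dw: DWModule) -> None:
    # Grabbing unit -> ODS -> ETL -> DW: the ODS keeps the first data and
    # the DW stores only the second data.
    for record in records:
        ods.raw.append(dict(record))
        dw.processed.append(etl(record))


batch = [{"Region": "Hangzhou", "Amount": 30.0}]
dw_a = DWModule()
run_structure_a(batch, dw_a)
ods_b, dw_b = ODSModule(), DWModule()
run_structure_b(batch, ods_b, dw_b)
```

In both flows the second data is identical; only the location of the raw backup changes.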

It should be noted that the DW module in the processing unit generates data tables in a preset mode based on Kimball dimensional modeling. When the processing unit adopts structure A, the DW module may store data both as data tables in the preset mode and in ordinary form; when the processing unit adopts structure B, the DW module stores data only as data tables in the preset mode.

The preset mode comprises any one of a star mode, a snowflake mode and a constellation mode.

As shown in FIG. 5, the star schema is centered on a fact table, with every dimension table attached directly to the fact table.

Specifically, the fact table may include primary dimensions and measures. For example, the primary dimensions may be a region key, a time key, a department key and a product key, and the measures may be sales quantity and sales amount.

Based on the primary dimensions, primary dimension tables directly connected to the fact table are established; each primary dimension table includes a primary key and dimension information.

For example, the region dimension table includes a region primary key, and its dimension information may be province, city, etc. The time dimension table includes a time primary key, and its dimension information may be year, month and day. The product dimension table includes a product primary key, and its dimension information may be product name, selling price, quality, etc. The department dimension table includes a department primary key, and its dimension information may be head office, branch office, agency, etc. The specific contents of the fact table and the dimension tables are set according to requirements.
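The star schema described above maps directly onto relational tables. The following Python sketch uses the standard-library `sqlite3` module as a stand-in for the MySQL-backed DW module; all table names, column names and sample rows are hypothetical, chosen to mirror the region, time, product and department examples.

```python
import sqlite3

# Star schema: one fact table whose foreign keys point directly at each dimension table.
DDL = """
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, province TEXT, city TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, level TEXT);
CREATE TABLE fact_sales (
    region_key INTEGER REFERENCES dim_region(region_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    department_key INTEGER REFERENCES dim_department(department_key),
    sales_quantity INTEGER,
    sales_amount REAL
);
"""


def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_region VALUES (?, ?, ?)",
                     [(1, "Zhejiang", "Hangzhou"), (2, "Zhejiang", "Ningbo")])
    conn.execute("INSERT INTO dim_time VALUES (1, 2023, 9, 22)")
    conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 10.0)")
    conn.execute("INSERT INTO dim_department VALUES (1, 'branch office')")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 3, 30.0), (2, 1, 1, 1, 5, 50.0), (1, 1, 1, 1, 2, 20.0)])


def sales_by_city(conn: sqlite3.Connection) -> list:
    # A single join reaches the region attributes because the dimension hangs off the fact table.
    return conn.execute(
        "SELECT r.city, SUM(f.sales_amount) FROM fact_sales f "
        "JOIN dim_region r ON f.region_key = r.region_key "
        "GROUP BY r.city ORDER BY r.city").fetchall()


conn = sqlite3.connect(":memory:")
build_star(conn)
print(sales_by_city(conn))  # [('Hangzhou', 50.0), ('Ningbo', 50.0)]
```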

As shown in FIG. 6, the snowflake schema starts from the dimension tables of a star schema and extends them outward with further secondary dimension tables.

The difference between the snowflake schema and the star schema is that secondary dimension tables are established on the basis of the primary dimension tables.

For example, when the primary dimension table is the region dimension table, it is divided into secondary dimension tables: a province dimension table containing a province primary key and a province name, and a city dimension table containing a city primary key and a city name.
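Continuing the same hypothetical sales example, the following sketch (again using `sqlite3` in place of MySQL, with invented table names and rows) normalizes the region dimension into province and city tables and shows that rolling up to province now requires a chain of joins.

```python
import sqlite3

# Snowflake schema (hypothetical names): the region dimension is normalized
# into secondary city and province dimension tables.
DDL = """
CREATE TABLE dim_province (province_key INTEGER PRIMARY KEY, province_name TEXT);
CREATE TABLE dim_city (city_key INTEGER PRIMARY KEY, city_name TEXT,
                       province_key INTEGER REFERENCES dim_province(province_key));
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY,
                         city_key INTEGER REFERENCES dim_city(city_key));
CREATE TABLE fact_sales (region_key INTEGER REFERENCES dim_region(region_key),
                         sales_amount REAL);
"""


def build_snowflake(conn: sqlite3.Connection) -> None:
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_province VALUES (?, ?)",
                     [(1, "Zhejiang"), (2, "Jiangsu")])
    conn.executemany("INSERT INTO dim_city VALUES (?, ?, ?)",
                     [(1, "Hangzhou", 1), (2, "Ningbo", 1), (3, "Nanjing", 2)])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 30.0), (2, 50.0), (3, 40.0)])


def sales_by_province(conn: sqlite3.Connection) -> list:
    # Each level of the snowflake adds one join on the way from fact to province.
    return conn.execute(
        "SELECT p.province_name, SUM(f.sales_amount) FROM fact_sales f "
        "JOIN dim_region r ON f.region_key = r.region_key "
        "JOIN dim_city c ON r.city_key = c.city_key "
        "JOIN dim_province p ON c.province_key = p.province_key "
        "GROUP BY p.province_name ORDER BY p.province_name").fetchall()


conn = sqlite3.connect(":memory:")
build_snowflake(conn)
print(sales_by_province(conn))  # [('Jiangsu', 40.0), ('Zhejiang', 80.0)]
```

The trade-off is visible in the query: the snowflake form stores each province name once instead of repeating it in every region row, at the cost of extra joins.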

As shown in FIG. 7, the constellation schema is based on multiple fact tables that share dimension information, i.e., some dimension tables may be shared among the fact tables.

The constellation schema differs from the star and snowflake schemas in that it has multiple fact tables and multiple primary dimension tables; some primary dimension tables may be shared by several fact tables, while others may be used by a single fact table only.

For example, primary dimension tables A, C and D are shared by fact table 1 and fact table 2, while primary dimension table B is used only by fact table 2.
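The sharing of dimension tables between fact tables can be checked mechanically. In this sketch, which uses `sqlite3` and the placeholder table names `dim_a` to `dim_d`, `fact_1` and `fact_2` to match the example above, the foreign keys declared on each fact table are read back to find the shared and the exclusive dimension tables.

```python
import sqlite3

# Constellation schema mirroring the example: fact_1 and fact_2 share
# dimension tables A, C and D; dimension table B is used by fact_2 only.
DDL = """
CREATE TABLE dim_a (a_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dim_b (b_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dim_c (c_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dim_d (d_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_1 (a_key INTEGER REFERENCES dim_a(a_key),
                     c_key INTEGER REFERENCES dim_c(c_key),
                     d_key INTEGER REFERENCES dim_d(d_key),
                     measure REAL);
CREATE TABLE fact_2 (a_key INTEGER REFERENCES dim_a(a_key),
                     b_key INTEGER REFERENCES dim_b(b_key),
                     c_key INTEGER REFERENCES dim_c(c_key),
                     d_key INTEGER REFERENCES dim_d(d_key),
                     measure REAL);
"""


def referenced_dimensions(conn: sqlite3.Connection, fact_table: str) -> set:
    # PRAGMA foreign_key_list returns one row per foreign key; index 2 is the
    # referenced (dimension) table. fact_table is interpolated, so it must be trusted.
    rows = conn.execute(f"PRAGMA foreign_key_list({fact_table})").fetchall()
    return {row[2] for row in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
dims_1 = referenced_dimensions(conn, "fact_1")
dims_2 = referenced_dimensions(conn, "fact_2")
print(sorted(dims_1 & dims_2))  # ['dim_a', 'dim_c', 'dim_d']
print(sorted(dims_2 - dims_1))  # ['dim_b']
```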

Because the data are split by dimension, the service unit does not need to reason about the business meaning of the data when retrieving it.

The ODS module and the DW module in both structures of the processing unit are built on MySQL.

In some embodiments, the dimensions of the DW module can be freely extended, for example with custom time dimensions.

In some embodiments, the data in the DW module may be converted into a uniform format according to a preset standard. This addresses the problems of too many data sources, overly scattered data, and the complex aggregation needed after scattered queries, and it establishes a standard semantic set around the data, for example consistent naming conventions and consistent data formats.
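To illustrate what converting data to a uniform format can look like, here is a minimal Python sketch. The text does not specify the preset standard, so the rules below (snake_case field names and ISO 8601 dates, with a hypothetical list of accepted input date formats) are assumptions chosen only to show consistent naming and consistent data formats.

```python
import re
from datetime import datetime

# Hypothetical preset standard (not specified in the text): field names in
# snake_case and dates in ISO 8601 (YYYY-MM-DD).
INPUT_DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y%m%d")


def to_snake_case(name: str) -> str:
    # "Sales Amount" -> "sales_amount", "SaleDate" -> "sale_date"
    name = re.sub(r"[\s\-]+", "_", name.strip())
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    return name.lower()


def to_iso_date(value: str) -> str:
    # Try each accepted source format until one parses.
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def normalize(record: dict, date_fields=("sale_date",)) -> dict:
    # Apply the naming convention first, then the date-format convention.
    out = {to_snake_case(key): value for key, value in record.items()}
    for key in date_fields:
        if key in out:
            out[key] = to_iso_date(str(out[key]))
    return out


print(normalize({"SaleDate": "2023/09/22", "Sales Amount": 30}))
# {'sale_date': '2023-09-22', 'sales_amount': 30}
```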

In some embodiments, performing one or more of extraction, conversion and loading on first data of any type to obtain the second data may specifically include:

dividing the first data according to the attribute fields set in the preset mode to obtain different attribute information;

determining the second data based on the different attribute information and the preset process.

The preset process may be one or more of extraction, conversion and loading.
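One way to picture dividing first data by the attribute fields of the preset mode is the following Python sketch. The field-to-table mapping `SCHEMA` and the surrogate-key scheme are assumptions for illustration: each distinct combination of dimension attributes receives an integer key, and the fact row keeps only those keys plus the measures.

```python
from typing import Dict, Tuple

# Hypothetical attribute-field mapping for a star-schema preset mode: which
# fields of a flat first-data record belong to which dimension table, and
# which fields are measures of the fact table.
SCHEMA = {
    "dimensions": {
        "dim_region": ["province", "city"],
        "dim_product": ["product_name", "product_price"],
    },
    "measures": ["sales_quantity", "sales_amount"],
}

KeyMaps = Dict[str, Dict[Tuple, int]]  # dimension table -> attribute tuple -> surrogate key


def split_record(record: dict, schema: dict, key_maps: KeyMaps) -> dict:
    """Divide one record into dimension attribute information plus a fact row."""
    fact = {}
    for table, fields in schema["dimensions"].items():
        attrs = tuple(record[f] for f in fields)
        keys = key_maps.setdefault(table, {})
        if attrs not in keys:
            keys[attrs] = len(keys) + 1  # new attribute combination: next surrogate key
        fact[f"{table}_key"] = keys[attrs]
    for measure in schema["measures"]:
        fact[measure] = record[measure]
    return fact


key_maps: KeyMaps = {}
row = {"province": "Zhejiang", "city": "Hangzhou", "product_name": "Widget",
       "product_price": 10.0, "sales_quantity": 3, "sales_amount": 30.0}
print(split_record(row, SCHEMA, key_maps))
# {'dim_region_key': 1, 'dim_product_key': 1, 'sales_quantity': 3, 'sales_amount': 30.0}
print(key_maps["dim_region"])  # {('Zhejiang', 'Hangzhou'): 1}
```

The accumulated `key_maps` entries become the rows of the dimension tables, and the returned dictionaries become the rows of the fact table.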

The service unit is used for receiving a data acquisition request, sent by a user through the user terminal, that includes a data identifier and a user ID, and for verifying the user ID. If verification passes, the service unit sends the data acquisition request to the processing unit; if verification fails, it returns the data acquisition request to the user terminal and sends the user terminal a prompt that user ID verification has failed. The service unit may be written with .NET 5 and EF Core. The service unit provides the external means of acquiring data and controls access rights, which avoids the data warehouse security problems caused by user terminals accessing the database directly. As the single access point for data access, it also prevents user terminals from connecting to multiple systems to access data, which would lower data access efficiency.

Further, when the current user ID is verified, its authority can be determined from the correspondence between user IDs and authorities, and whether the target data corresponding to the data identifier may be sent to the user terminal is determined according to that authority. Alternatively, the current user ID may be verified against a configured user ID whitelist.
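The verification described above might look like the following Python sketch. The text offers the authority lookup and the whitelist as alternatives; this sketch applies both, and the names `PERMISSIONS`, `WHITELIST`, `DataRequest`, `handle` and `fetch` are hypothetical. A rejected request is returned together with a prompt, mirroring the behavior of the service unit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set

# Hypothetical access-control data; the text does not define its storage format.
PERMISSIONS: Dict[str, Set[str]] = {
    "u001": {"sales_by_city", "sales_by_province"},
    "u002": {"sales_by_city"},
}
WHITELIST: Optional[Set[str]] = {"u001", "u002"}  # set to None to skip the whitelist check


@dataclass
class DataRequest:
    data_id: str  # data identifier
    user_id: str  # user ID


def verify(request: DataRequest) -> bool:
    # Whitelist check (if configured), then authority check for this data identifier.
    if WHITELIST is not None and request.user_id not in WHITELIST:
        return False
    return request.data_id in PERMISSIONS.get(request.user_id, set())


def handle(request: DataRequest, fetch: Callable[[str], object]) -> dict:
    """Forward a verified request to the processing unit; otherwise return it with a prompt."""
    if not verify(request):
        return {"status": "rejected", "request": request,
                "prompt": f"user ID verification failed: {request.user_id}"}
    return {"status": "ok", "data": fetch(request.data_id)}
```

Here `fetch` stands in for the processing unit, so the service unit never exposes the database itself to the user terminal.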

The DW module in the processing unit is further used for acquiring target data from the second data according to the data identifier and sending the target data to the user terminal through the service unit.

In some embodiments, when the DW module cannot find target data in the second data for the data identifier, it generates a data capture request based on the data identifier and sends it to the grabbing unit. The grabbing unit captures the data to be processed corresponding to the data identifier from the data layer according to the received data capture request and sends it to the ETL module; the ETL module processes the data to be processed according to the preset mode to obtain processed data and sends it to the DW module for storage; and the DW module sends the processed data to the user terminal.
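The fall-back path resembles a cache-miss handler. In the following Python sketch, the class `DWLookup` and the callables standing in for the grabbing unit and the ETL module are hypothetical; it only shows that a miss triggers one capture-and-process round, after which the data is served from storage.

```python
from typing import Callable, Dict


class DWLookup:
    """Sketch of DW-module lookup with a fall-back capture on a miss (hypothetical names)."""

    def __init__(self, capture: Callable[[str], dict], etl: Callable[[dict], dict]):
        self.second_data: Dict[str, dict] = {}  # stored second data keyed by data identifier
        self.capture = capture  # grabbing unit: data capture request -> data to be processed
        self.etl = etl          # ETL module: data to be processed -> processed data

    def get(self, data_id: str) -> dict:
        if data_id in self.second_data:
            return self.second_data[data_id]
        # Miss: send a data capture request for this identifier to the grabbing unit,
        # process the result with the ETL module, store it, and return it.
        processed = self.etl(self.capture(data_id))
        self.second_data[data_id] = processed
        return processed


captured = []


def fake_capture(data_id: str) -> dict:
    captured.append(data_id)  # record each capture request
    return {"ID": data_id, "Amount": 30.0}


def lower_keys(row: dict) -> dict:
    return {k.lower(): v for k, v in row.items()}


dw = DWLookup(fake_capture, lower_keys)
dw.get("sales_2023_09")  # miss: captured from the data layer
dw.get("sales_2023_09")  # hit: served from stored second data
print(captured)  # ['sales_2023_09']
```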

In today's network environment, data security has always been a very important issue: data may be illegally tampered with during storage or reading without users noticing, causing significant impact and loss. The service unit is therefore further used for encrypting the target data sent by the processing unit to obtain encrypted data.

Specifically, the encryption comprises the following steps:

splitting the target data according to its corresponding fields to obtain a plurality of target sub-data;

for any target sub-data, performing an exclusive OR operation on the target sub-data and a randomly generated original key to obtain sub-encrypted data;

integrating the plurality of sub-encrypted data to obtain the encrypted data.

The encrypted data is then sent to the user terminal.

The user terminal applies the inverse exclusive OR operation (an XOR with the same original key) to the encrypted data to obtain the corresponding target data.
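The split, XOR and integrate steps, together with the inverse operation on the user terminal, can be sketched in Python as follows. Assumptions not stated in the text: each field value is serialized as JSON before encryption, the original key is 16 random bytes repeated to cover longer sub-data, and the sub-encrypted data are hex-encoded for transmission. Repeating-key XOR is easy to break when the key is shorter than the data or is reused, so this sketch illustrates the described scheme rather than a recommended cipher; production systems typically use an authenticated cipher such as AES-GCM.

```python
import json
import secrets
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # XOR each byte of the sub-data with the key, repeating the key as needed.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt_target(target: dict, key: bytes) -> dict:
    """Split target data by field, XOR each sub-data with the key, integrate the results."""
    sub_data = {name: json.dumps(value).encode("utf-8") for name, value in target.items()}
    return {name: xor_bytes(raw, key).hex() for name, raw in sub_data.items()}


def decrypt_target(encrypted: dict, key: bytes) -> dict:
    # XOR is its own inverse: applying the same key again restores each sub-data.
    return {name: json.loads(xor_bytes(bytes.fromhex(blob), key).decode("utf-8"))
            for name, blob in encrypted.items()}


key = secrets.token_bytes(16)  # randomly generated original key
target = {"city": "Hangzhou", "sales_amount": 50.0}
encrypted = encrypt_target(target, key)
assert decrypt_target(encrypted, key) == target  # round trip restores the target data
```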

The service unit in this manner ensures the security of the data in the data warehouse.

In summary, the application provides a data warehouse based on Kimball dimensional modeling that comprises a processing unit, a grabbing unit and a service unit. It can process data characterized by huge volume, complex structure and numerous types, improves data quality, and unifies the data format to ensure data consistency. The service unit's verification of the user ID and its encryption of the data ensure security during data transmission.

Fig. 8 is a flow chart of a data processing method of a data warehouse based on kimball dimension modeling according to an embodiment of the present application. As shown in fig. 8, the method is applied to a server connected to a data warehouse, and the method may include:

Step S810, capturing different types of first data in a data source.

Step S820, processing the first data of any type to obtain second data.

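Step S830, receiving a data acquisition request that includes a data identifier and a user ID sent by a user terminal, verifying the user ID, and determining, according to the verification result, where the target data corresponding to the data identifier is sent: if verification passes, the target data is acquired from the second data and sent to the user terminal; otherwise, the data acquisition request is returned to the user terminal.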
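The three steps of FIG. 8 can be strung together in a compact Python sketch. All function and parameter names are hypothetical: each data source is modeled as a zero-argument callable (so new sources can be registered without changing the method), the second data is keyed by an assumed `id` field that serves as the data identifier, and `verify` stands in for the user ID check.

```python
from typing import Callable, Dict, Iterable, Tuple


def run_method(sources: Dict[str, Callable[[], Iterable[dict]]],
               process: Callable[[dict], dict],
               verify: Callable[[str], bool],
               data_id: str, user_id: str,
               key_field: str = "id") -> Tuple[str, object]:
    # S810: capture first data of different types from every registered source.
    first_data = [row for fetch in sources.values() for row in fetch()]
    # S820: process each piece of first data into second data, keyed by data identifier.
    second_data = {row[key_field]: row for row in map(process, first_data)}
    # S830: verify the user ID; the result decides where the target data is sent.
    if not verify(user_id):
        return ("returned_to_terminal", "user ID verification failed")
    return ("sent_to_terminal", second_data.get(data_id))


def lower_keys(row: dict) -> dict:
    return {k.lower(): v for k, v in row.items()}


sources = {
    "mysql_orders": lambda: [{"ID": "a", "Amount": 30.0}],
    "csv_inventory": lambda: [{"ID": "b", "Stock": 12}],
}
print(run_method(sources, lower_keys, lambda uid: uid == "u001", "b", "u001"))
# ('sent_to_terminal', {'id': 'b', 'stock': 12})
```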

The embodiment of the application also provides an electronic device, as shown in fig. 9, which includes a processor 910, a communication interface 920, a memory 930, and a communication bus 940, where the processor 910, the communication interface 920, and the memory 930 implement communication between each other through the communication bus 940.

A memory 930 for storing a computer program;

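the processor 910, when executing the program stored in the memory 930, performs the following steps: capturing different types of first data in a data source; processing first data of any type to obtain second data; and receiving a data acquisition request that includes a data identifier and a user ID sent by a user terminal, verifying the user ID, and, if verification passes, acquiring target data from the second data according to the data identifier and sending the target data to the user terminal.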

The communication bus mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is drawn in the figure, which does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Since the implementation and beneficial effects of each component of the electronic device in the foregoing embodiment can be understood by referring to the steps of the embodiment shown in FIG. 8, the specific working process and beneficial effects of the electronic device provided by the embodiment of the present application are not repeated here.

In yet another embodiment of the present application, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform a data processing method of a data warehouse based on kimball dimension modeling as described in any of the above embodiments.

In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform a data processing method of a kimball dimension modeling based data warehouse as described in any of the above embodiments.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the present embodiments are intended to be construed as including the preferred embodiments and all such alterations and modifications that fall within the scope of the embodiments.

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present application without departing from the spirit or scope of the embodiments of the application. Thus, if such modifications and variations of the embodiments in the present application fall within the scope of the embodiments of the present application and the equivalent techniques thereof, such modifications and variations are also intended to be included in the embodiments of the present application.

Claims

1. A data warehouse based on kimball dimension modeling, characterized in that the data warehouse comprises a processing unit, a grabbing unit and a service unit;

2. The data warehouse of claim 1, wherein the processing unit includes: an ETL module and a DW module;

the ETL module is connected with the grabbing unit;

3. The data warehouse of claim 1, wherein the processing unit includes: ETL module, ODS module and DW module;

the DW module is used for storing the second data.

4. The data warehouse as claimed in claim 1, wherein the processing of the first data to obtain second data includes:

the second data is determined based on the different attribute information.

5. The data warehouse as claimed in claim 1, wherein the service unit is further configured to encrypt the target data sent by the processing unit to obtain encrypted data, and send the encrypted data to the user terminal.

6. The data warehouse as claimed in claim 5, wherein encrypting the target data sent by the processing unit to obtain encrypted data comprises:

7. The data warehouse as claimed in claim 1, wherein the service unit is further configured to return the data acquisition request to the user terminal if the verification fails, and to send a prompt for failure of user ID verification to the user terminal.

8. A data processing method of a data warehouse based on kimball dimension modeling, applied to a server connected with the data warehouse, characterized in that the method comprises:

capturing first data of different types in a data source;

processing the first data of any type to obtain second data;

9. An electronic device, characterized in that the electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are in communication with each other through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of claim 8 when executing a program stored on a memory.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of claim 8.