CN112328708A - Mixed data warehouse technology for real-time aggregation of multiple data sources

Info

Publication number: CN112328708A
Application number: CN202011273030.1A
Authority: CN
Inventors: 江品磊; 赵子昂; 孙海龙; 罗靖东
Original assignee: Shenzhen Prajna Big Data Technology Co ltd
Current assignee: Shenzhen Prajna Big Data Technology Co ltd
Priority date: 2020-11-13
Filing date: 2020-11-13
Publication date: 2021-02-05

Abstract

A mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware. The middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component. The query script is an extended version of a structured database query language script and consists of multiple segments of structured database query language, each of which defines the data to be queried, the data source to be queried and an effective time. The key technical point is that the method loads data on demand: simply by writing a data query script, a user can have data segments captured automatically from different data sources, imported into a new data source, and aggregated by queries in that new data source. Compared with building a conventional data warehouse, the method does not require a dedicated warehouse to be built; only query scripts need to be written, which eliminates the complex ETL process and makes data queries more convenient for enterprises.

Description

Mixed data warehouse technology for real-time aggregation of multiple data sources

Technical Field

The invention belongs to the field of data warehouses, and particularly relates to a mixed data warehouse technology for real-time aggregation of multiple data sources.

Background

A data warehouse is a strategic collection of data that supports decision making at every level of an enterprise and is considered a core component of business intelligence; it is a central repository of information created for analytical reporting and decision support. It gives enterprises that need business intelligence guidance on business process improvement and on monitoring and controlling time, cost, quality and the like.

When data from multiple systems and multiple data sources of an enterprise is to be aggregated and analyzed, the usual practice is to build a data warehouse for the enterprise, use an ETL tool to extract, transform and load data from the different data sources into the newly built warehouse on a schedule, and then have the analysis system connect to that warehouse to perform the aggregation analysis.

However, this conventional approach has the following technical problems: first, a data warehouse must be built, at a cost that small and medium-sized enterprises cannot bear; second, data must be preprocessed with ETL before it can be analyzed, so the development and implementation cycle is long; third, data is synchronized to the data warehouse only at scheduled times, so it cannot be aggregated and analyzed in real time.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a mixed data warehouse technology for real-time aggregation of multiple data sources.

In order to achieve the purpose, the invention adopts the following technical scheme:

a mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component;

the query script is an extended version of a structured database query language script and consists of multiple segments of structured database query language, each of which defines the data to be queried, the data source to be queried and an effective time;

the query client is used for receiving the query script and sending the query script to the query analysis engine;

the query analysis engine is used for parsing the query script, splitting it into separate structured database query language statements, and sending those statements to the source data source loading component and the target data source aggregation component;

the source data source loading component is used for receiving a query script to acquire data and a table structure from a data source and sending the data and the table structure to the target data source aggregation component;

and the target data source aggregation component is used for receiving the structure and the data from the source data source loading component, converting them into a new temporary table, and receiving the query script to return the aggregated data.
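The splitting performed by the query analysis engine can be sketched as follows. The patent does not specify the syntax of the extended script, so the `-- @segment name=... source=... ttl=...` and `-- @aggregate` comment headers, the `Segment` class and the `parse_script` function are all hypothetical names chosen for illustration. The sketch only shows one way a multi-segment script could be separated into per-source statements and a final aggregate statement.

```python
import re
from dataclasses import dataclass

# Hypothetical header syntax; the patent does not define one.
SEGMENT_HEADER = re.compile(
    r"^--\s*@segment\s+name=(\w+)\s+source=(\w+)\s+ttl=(\d+)\s*$"
)
AGGREGATE_HEADER = re.compile(r"^--\s*@aggregate\s*$")


@dataclass
class Segment:
    name: str         # temporary table to create in the new data source
    source: str       # identifier of the data source to query
    ttl_seconds: int  # assumed encoding of the "effective time"
    sql: str          # ordinary SQL to run against the source


def parse_script(script):
    """Split an extended script into source segments and one aggregate query."""
    segments = []
    aggregate_sql = ""
    header = None  # None, the string "aggregate", or (name, source, ttl)
    body = []

    def flush():
        # Store the statement collected under the current header, if any.
        nonlocal aggregate_sql
        sql = "\n".join(body).strip()
        if header == "aggregate":
            aggregate_sql = sql
        elif header is not None:
            name, source, ttl = header
            segments.append(Segment(name, source, int(ttl), sql))

    for line in script.splitlines():
        stripped = line.strip()
        match = SEGMENT_HEADER.match(stripped)
        if match:
            flush()
            header, body = match.groups(), []
        elif AGGREGATE_HEADER.match(stripped):
            flush()
            header, body = "aggregate", []
        elif header is not None:
            body.append(line)
    flush()
    return segments, aggregate_sql
```

In this sketch the segments would go to the source data source loading component and the aggregate statement to the target data source aggregation component.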

Preferably, when querying the required data, the specific process is as follows:

firstly, using a query analysis engine to query at least two groups of service data sources to obtain a required data set segment;

then, using the source data source loading component to import the data set segments into a new data source;

and finally, running the aggregate query over the data set segments held in the new data source, using the target data source aggregation component.
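The three-step process above can be illustrated with a minimal, self-contained sketch. It uses in-memory SQLite databases as stand-ins for two business data sources and for the new data source; the table names, sample rows and the `load_segment` helper are assumptions made for the example, not part of the patent.

```python
import sqlite3


def load_segment(source, target, table, select_sql):
    """Run select_sql on the source and copy the result into a new target table."""
    cursor = source.execute(select_sql)
    columns = [col[0] for col in cursor.description]
    column_list = ", ".join(f'"{c}"' for c in columns)
    target.execute(f'CREATE TEMP TABLE "{table}" ({column_list})')
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', cursor.fetchall()
    )


# Step 1: two stand-in business data sources with sample data.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 5.0), (1, 2.5)]
)

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "north"), (2, "south")]
)

# Step 2: import the queried data set segments into a new data source.
target = sqlite3.connect(":memory:")
load_segment(sales, target, "orders", "SELECT customer_id, amount FROM orders")
load_segment(crm, target, "customers", "SELECT id, region FROM customers")

# Step 3: run the aggregate query in the new data source.
rows = target.execute(
    "SELECT c.region, SUM(o.amount) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.region ORDER BY c.region"
).fetchall()
print(rows)  # [('north', 12.5), ('south', 5.0)]
```

Only the rows selected by each segment are copied, which is the load-on-demand idea: nothing is synchronized ahead of time.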

Preferably, after the data result is queried in the target data source aggregation component, the following steps are also performed:

step one: connecting the query client to a network, backing up the query results, and periodically cleaning the storage space in the memory;

step two: uploading the query results to a network terminal, summarizing them using big data analysis technology, and generating keywords for the data sources with the highest query counts;

step three: recording and counting the keywords that have not appeared in the query process, so as to generate new keywords that supplement the database.
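Steps two and three are described only in outline, so the following is one possible reading rather than the patented procedure. It assumes a query log of `(data source, query terms)` pairs, ranks data sources by query count, and treats query terms missing from the existing keyword set as candidates for new keywords. The function names, the log format and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def top_sources(query_log, n=2):
    """Return the n most frequently queried data sources."""
    counts = Counter(source for source, _terms in query_log)
    return [source for source, _count in counts.most_common(n)]


def new_keywords(query_log, known_keywords, min_count=2):
    """Count query terms absent from the keyword set; return the frequent ones."""
    unseen = Counter(
        term
        for _source, terms in query_log
        for term in terms
        if term not in known_keywords
    )
    return sorted(term for term, count in unseen.items() if count >= min_count)
```

A caller would add the returned terms to the keyword database so that later keyword queries can target them directly.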

Preferably, in step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory space is cleaned periodically, and the cleaning period may be set to one week, one month or one year.
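The periodic cleaning of the local copy can be sketched as below, assuming that backups are ordinary files in a local directory and that the cloud copy is managed elsewhere and left untouched. The `clean_local_backups` function, the directory layout and the 30-day and 365-day approximations for a month and a year are assumptions for the example.

```python
import time
from pathlib import Path

# Cleaning periods in seconds; month and year lengths are approximations.
CLEANING_PERIODS = {
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def clean_local_backups(directory, period, now=None):
    """Delete local backup files older than the period; return their names."""
    now = time.time() if now is None else now
    cutoff = now - CLEANING_PERIODS[period]
    removed = []
    for path in sorted(Path(directory).iterdir()):
        # Only plain files are considered; the cloud copy is not touched here.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Such a function would typically be run by a scheduler at the chosen interval.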

Preferably, the memory comprises a hard disk group consisting of multiple groups of portable hard disks.

Preferably, the structured database query language includes a data definition language, a data manipulation language, a data query language and a data management language; the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data management language manages permissions.
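As an illustration of this four-way division, the sketch below classifies a SQL statement by its leading keyword. The keyword table reflects common SQL usage and is not taken from the patent; "manipulation" corresponds to what this translation calls the data operation language, and "management" to the language that handles permissions (often called data control language). The `classify` function is a hypothetical helper.

```python
# Leading SQL keyword -> sublanguage, following common SQL conventions.
SUBLANGUAGE_BY_KEYWORD = {
    "CREATE": "definition",
    "ALTER": "definition",
    "DROP": "definition",
    "INSERT": "manipulation",
    "UPDATE": "manipulation",
    "DELETE": "manipulation",
    "SELECT": "query",
    "GRANT": "management",
    "REVOKE": "management",
}


def classify(statement):
    """Return the sublanguage of a SQL statement based on its first keyword."""
    words = statement.strip().split(None, 1)
    if not words:
        return "unknown"
    return SUBLANGUAGE_BY_KEYWORD.get(words[0].upper(), "unknown")
```

A query analysis engine could use a check like this to accept only query statements in script segments, although the patent does not say whether it does so.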

Compared with the prior art, the invention provides a mixed data warehouse technology for real-time aggregation of multiple data sources, which has the following beneficial effects:

the invention adopts a load-on-demand principle: simply by writing a data query script, a user can have data segments captured automatically from different data sources and imported into a new data source, and can then run aggregate queries in the new data source;

compared with building a conventional data warehouse, the method does not require a dedicated warehouse to be built; only query scripts need to be written, which eliminates the complex ETL process and makes data queries more convenient for enterprises.

Drawings

FIG. 1 is a flow chart of data query in the present invention.

Detailed Description

The following further describes an embodiment of a hybrid data warehouse technology for real-time aggregation of multiple data sources according to the present invention with reference to FIG. 1. The hybrid data warehouse technology for real-time aggregation of multiple data sources of the present invention is not limited to the description of the following embodiments.

A mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component.

the source data source loading component is used for receiving the query script to acquire data and a table structure from the data source and sending the data and the table structure to the target data source aggregation component;

the target data source aggregation component is used for receiving the structure and the data from the source data source loading component, converting them into a new temporary table, and receiving the query script to return the aggregated data.
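The hand-off of table structure and data between the two components can be sketched with SQLite, which exposes a table's structure through `PRAGMA table_info` and supports `CREATE TEMP TABLE`. The `read_structure` and `create_temp_copy` helpers are hypothetical, and a real deployment would need type mapping between different database products, which this sketch ignores.

```python
import sqlite3


def read_structure(source, table):
    """Return (column name, declared type) pairs for a source table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = source.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2]) for row in rows]


def create_temp_copy(source, target, table):
    """Recreate the table as a temporary table in the target and copy its rows."""
    structure = read_structure(source, table)
    column_defs = ", ".join(f'"{name}" {decl}' for name, decl in structure)
    target.execute(f'CREATE TEMP TABLE "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in structure)
    rows = source.execute(f'SELECT * FROM "{table}"').fetchall()
    target.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return structure
```

Because the table is temporary, it disappears when the target connection is closed, so no permanent warehouse schema has to be maintained.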

As shown in FIG. 1, when querying the required data, the specific process is as follows:

firstly, querying two groups of service data sources by using a query analysis engine to obtain a required data set segment;

In addition, the query analysis engine can also query three or more groups of service data sources; the more data sources there are, the more required data set segments are obtained, which helps ensure the completeness of the data finally queried.

After the data result is queried in the target data source aggregation component, the following steps are also performed:

step three: recording and counting the keywords that have not appeared in the query process, so as to generate new keywords that supplement the database; the purpose of this step is to allow later users to retrieve data in a targeted way when performing keyword queries.

In step one, data is uploaded to both the enterprise cloud space and the memory during backup, only the memory is cleaned periodically, and the cleaning period can be set to one week, one month or one year; the memory comprises a hard disk group composed of multiple groups of portable hard disks.

The structured database query language comprises a data definition language, a data manipulation language, a data query language and a data management language; the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data management language manages permissions.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A mixed data warehouse technology for real-time aggregation of multiple data sources, characterized in that: the technology comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component;

2. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 1, wherein: when the required data is queried, the specific process is as follows:

3. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 2, wherein: after the data result is queried in the target data source aggregation component, the following steps are also performed:

4. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 3, wherein: in step one, data is uploaded to both the enterprise cloud space and the memory during backup, only the memory space is cleaned periodically, and the cleaning period can be set to one week, one month or one year.

5. The hybrid data warehouse technology for real-time aggregation of multiple data sources as claimed in claim 4, wherein: the memory comprises a hard disk group composed of multiple groups of portable hard disks.

6. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 1, wherein: the structured database query language comprises a data definition language, a data manipulation language, a data query language and a data management language; the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data management language manages permissions.