CN112328708A - Mixed data warehouse technology for real-time aggregation of multiple data sources - Google Patents
- Publication number
- CN112328708A (application number CN202011273030.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- query
- source
- aggregation
- script
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/24556—Aggregation; Duplicate elimination
Abstract
A mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component. The query script is an extended version of a structured query language (SQL) script and consists of multiple SQL segments, each of which defines the data to be queried, the data source to be queried and the effective time. The key technical point is the principle of on-demand loading: simply by writing a data query script, a user can have data segments automatically fetched from different data sources and imported into a new data source, and can then perform aggregate queries in the new data source. Compared with conventional data warehouse construction, the method does not require a dedicated warehouse to be built; only query scripts need to be written, which eliminates the complex ETL process and makes data querying more convenient for enterprises.
Description
Technical Field
The invention belongs to the field of data warehouses, and particularly relates to a mixed data warehouse technology for real-time aggregation of multiple data sources.
Background
A data warehouse is a strategic collection that provides all types of data support for decision-making processes at every level of an enterprise, and is regarded as a core component of business intelligence; it is a central repository of information created for analytical reporting and decision support. It provides guidance to enterprises that need business intelligence, including for business process improvement and for monitoring and controlling time, cost, quality and the like.
When aggregate analysis is to be performed across multiple systems and multiple data sources of an enterprise, a data warehouse is usually first built for the enterprise; an ETL tool then periodically extracts, transforms and loads data from the different data sources into the newly built data warehouse, and an analysis system connects to that warehouse to perform aggregate analysis on the data.
However, this conventional approach has the following technical problems: first, a data warehouse must be built, at a cost that small and medium-sized enterprises cannot control; second, before data can be analyzed it must be preprocessed by means of ETL, so the development and implementation period is long; third, data is synchronized to the data warehouse only at scheduled times, so aggregate analysis cannot be performed in real time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a mixed data warehouse technology for real-time aggregation of multiple data sources.
In order to achieve the purpose, the invention adopts the following technical scheme:
a mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component;
the query script is an extended version of a structured query language (SQL) script and consists of multiple SQL segments, each segment defining the data to be queried, the data source to be queried and the effective time;
the query client is used for receiving the query script and sending the query script to the query analysis engine;
the query analysis engine is used for parsing the query script, splitting it into separate SQL statements, and sending those statements to the source data source loading component and the target data source aggregation component;
the source data source loading component is used for receiving a query script, acquiring the data and the table structure from a data source accordingly, and sending the data and the table structure to the target data source aggregation component;
and the target data source aggregation component is used for receiving the table structure and the data from the source data source loading component, converting them into a new temporary table, and receiving the query script so as to return the aggregated data.
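As an illustration of how a multi-segment query script might be split by the query analysis engine, the following Python sketch parses a script in which each segment is introduced by a header comment. The header syntax (`-- @segment source=... ttl=...`), the `Segment` type and the `parse_script` function are all hypothetical; the source text does not specify a concrete script format.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical segment header; the patent text does not define the syntax.
HEADER = re.compile(r"^--\s*@segment\s+source=(\w+)\s+ttl=(\d+)\s*$")


@dataclass
class Segment:
    source: str       # name of the data source the segment is sent to
    ttl_seconds: int  # the segment's "effective time", assumed to be in seconds
    sql: str          # plain SQL text of the segment


def parse_script(script: str) -> List[Segment]:
    """Split an extended SQL script into its segments."""
    segments: List[Segment] = []
    current = None            # (source, ttl) of the segment being read
    body: List[str] = []      # SQL lines collected for that segment
    for line in script.splitlines():
        match = HEADER.match(line.strip())
        if match:
            if current is not None:
                segments.append(Segment(current[0], current[1], "\n".join(body).strip()))
            current = (match.group(1), int(match.group(2)))
            body = []
        elif current is not None:
            body.append(line)
    if current is not None:
        segments.append(Segment(current[0], current[1], "\n".join(body).strip()))
    return segments


SCRIPT = """
-- @segment source=crm ttl=300
SELECT id, region FROM customers;
-- @segment source=erp ttl=60
SELECT customer_id, amount FROM orders;
"""

segments = parse_script(SCRIPT)
```

Each resulting `Segment` carries the three pieces of information the text attributes to a script segment: the data queried, the data source, and the effective time.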
Preferably, when the required data is queried, the specific process is as follows:
firstly, using the query analysis engine to query at least two groups of business data sources to obtain the required data set segments;
then, using the source data source loading component to load the data set segments into a new data source;
and finally, querying the aggregated data set segments in the new data source through the target data source aggregation component.
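The three steps above can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the patented implementation: in-memory SQLite databases stand in for the business data sources and for the new data source, the table and column names are invented for illustration, and only column names (not full column types) are carried over as the "table structure".

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one business data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


def load_segment(source, sql):
    """Loading step: run a segment's SQL and return (column names, rows)."""
    cursor = source.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def materialize(target, table, columns, rows):
    """Aggregation side: turn a loaded segment into a temporary table."""
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    target.execute(f'CREATE TEMP TABLE "{table}" ({column_list})')
    target.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


# Two hypothetical business data sources.
crm = make_source("CREATE TABLE customers (id INTEGER, region TEXT)",
                  "INSERT INTO customers VALUES (?, ?)",
                  [(1, "north"), (2, "south"), (3, "north")])
erp = make_source("CREATE TABLE orders (customer_id INTEGER, amount REAL)",
                  "INSERT INTO orders VALUES (?, ?)",
                  [(1, 10.0), (2, 5.0), (3, 7.5), (1, 2.5)])

# The "new data source" that receives the segments.
target = sqlite3.connect(":memory:")
materialize(target, "seg_customers",
            *load_segment(crm, "SELECT id, region FROM customers"))
materialize(target, "seg_orders",
            *load_segment(erp, "SELECT customer_id, amount FROM orders"))

# Aggregate query across segments that originated in different sources.
result = target.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM seg_customers AS c
    JOIN seg_orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

Because the segments are fetched only when the script runs, the aggregate reflects the sources' current contents, which is the on-demand behaviour the text contrasts with scheduled ETL synchronization.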
Preferably, after the data result has been queried in the target data source aggregation component, the following steps are further performed:
step one: connecting the query client to the network, backing up the query result, and periodically cleaning the storage space in the memory;
step two: uploading the query result to a network terminal, summarizing the query results by using big data analysis technology, and generating keywords for the data sources that rank highest by number of queries;
step three: recording and counting the keywords that did not appear during the query process, so as to generate new keywords for supplementing the database.
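Steps two and three are described only briefly, so the following Python sketch rests on an interpretation that should be treated as an assumption: the query log is taken to be a list of `(data source, keyword)` pairs, "ranking highest" is taken to mean the largest query count, and "keywords that did not appear" is read as keywords used in queries that are not yet in a known keyword set. The function names are hypothetical.

```python
from collections import Counter


def top_sources(query_log, n):
    """Return the n data sources with the highest query counts."""
    counts = Counter(source for source, _keyword in query_log)
    return [source for source, _count in counts.most_common(n)]


def new_keyword_counts(query_log, known_keywords):
    """Count keywords used in queries that are not yet in the known set."""
    return Counter(keyword for _source, keyword in query_log
                   if keyword not in known_keywords)


# Hypothetical query log: (data source, keyword) per query.
log = [("crm", "region"), ("erp", "amount"), ("crm", "churn"),
       ("crm", "region"), ("erp", "churn")]

busiest = top_sources(log, 1)
candidates = new_keyword_counts(log, {"region", "amount"})
```

The counts in `candidates` could then be used to decide which new keywords are worth adding to the database.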
Preferably, in step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory space is periodically cleaned, and the cleaning period may be set to one week, one month, or one year.
Preferably, the memory comprises a hard disk group consisting of a plurality of groups of mobile hard disks.
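The periodic cleaning of local backups can be illustrated with a short Python sketch. It assumes that backups are ordinary files in one directory, that a file's age is judged by its modification time, and that a month and a year are approximated as 30 and 365 days; none of these details come from the source text, and `clean_local_backups` is a hypothetical name. The cloud copy is deliberately left untouched, matching the statement that only the local memory is cleaned.

```python
import time
from pathlib import Path

# Cleaning periods named in the text; the day counts are an approximation.
RETENTION_SECONDS = {
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def clean_local_backups(directory, period, now=None):
    """Delete files in `directory` older than the chosen period.

    Returns the names of the files that were removed.
    """
    current_time = time.time() if now is None else now
    cutoff = current_time - RETENTION_SECONDS[period]
    removed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

A scheduler (for example a cron job) would call `clean_local_backups("/path/to/backups", "week")` at the chosen interval; the path shown is a placeholder.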
Preferably, the structured query language (SQL) includes a data definition language, a data manipulation language, a data query language and a data management language; the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data management language manages permissions.
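To make the four sublanguages concrete, the following Python sketch classifies a statement by its leading keyword. The keyword table is a simplified assumption covering only common statements, and the permission-related group is labelled `DCL` (data control language), the usual name for what the text calls the data management language.

```python
# Simplified mapping from a statement's first keyword to its sublanguage.
SUBLANGUAGE_BY_KEYWORD = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL",      # logical structure
    "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",   # changes to data
    "SELECT": "DQL",                                     # reads of data
    "GRANT": "DCL", "REVOKE": "DCL",                     # permissions
}


def classify(statement):
    """Return the sublanguage of a SQL statement, or "unknown"."""
    words = statement.strip().split(None, 1)
    if not words:
        return "unknown"
    return SUBLANGUAGE_BY_KEYWORD.get(words[0].upper(), "unknown")
```

For example, `classify("select * from t")` returns `"DQL"` and `classify("GRANT SELECT ON t TO bob")` returns `"DCL"`.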
Compared with the prior art, the invention provides a mixed data warehouse technology for real-time aggregation of multiple data sources, which has the following beneficial effects:
the invention adopts the principle of on-demand loading: simply by writing a data query script, a user can have data segments automatically fetched from different data sources and imported into a new data source, and can then perform aggregate queries in the new data source;
compared with conventional data warehouse construction, the method does not require a dedicated warehouse to be built; only query scripts need to be written, which eliminates the complex ETL process and makes data querying more convenient for enterprises.
Drawings
FIG. 1 is a flow chart of data query in the present invention.
Detailed Description
The following further describes an embodiment of a hybrid data warehouse technology for real-time aggregation of multiple data sources according to the present invention with reference to fig. 1. The hybrid data warehouse technology for real-time aggregation of multiple data sources of the present invention is not limited to the description of the following embodiments.
A mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component.
The query script is an extended version of a structured query language (SQL) script and consists of multiple SQL segments, each segment defining the data to be queried, the data source to be queried and the effective time;
the query client is used for receiving the query script and sending the query script to the query analysis engine;
the query analysis engine is used for parsing the query script, splitting it into separate SQL statements, and sending those statements to the source data source loading component and the target data source aggregation component;
the source data source loading component is used for receiving the query script, acquiring the data and the table structure from the data source accordingly, and sending the data and the table structure to the target data source aggregation component;
the target data source aggregation component is used for receiving the table structure and the data from the source data source loading component, converting them into a new temporary table, and receiving the query script so as to return the aggregated data;
the invention adopts the principle of on-demand loading: simply by writing a data query script, a user can have data segments automatically fetched from different data sources and imported into a new data source, and can then perform aggregate queries in the new data source;
compared with conventional data warehouse construction, the method does not require a dedicated warehouse to be built; only query scripts need to be written, which eliminates the complex ETL process and makes data querying more convenient for enterprises.
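The text states that each script segment defines an effective time but does not say how that time is enforced. One possible reading, sketched below in Python, is that a temporary table built from a segment stays usable until its effective time elapses and is then dropped so that the next query reloads it from the source. The `TempTableRegistry` class and its methods are hypothetical, and SQLite is used only as a stand-in for the target data source.

```python
import sqlite3
import time


class TempTableRegistry:
    """Track when each temporary table in the target data source expires."""

    def __init__(self, conn, clock=time.monotonic):
        self.conn = conn
        self.clock = clock        # injectable so expiry can be tested
        self.expires_at = {}      # table name -> expiry timestamp

    def register(self, table, ttl_seconds):
        """Record that `table` is valid for `ttl_seconds` from now."""
        self.expires_at[table] = self.clock() + ttl_seconds

    def is_fresh(self, table):
        """True while the table exists in the registry and has not expired."""
        return table in self.expires_at and self.clock() < self.expires_at[table]

    def drop_expired(self):
        """Drop every expired table and return the names that were dropped."""
        dropped = []
        for table, deadline in list(self.expires_at.items()):
            if self.clock() >= deadline:
                self.conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                del self.expires_at[table]
                dropped.append(table)
        return dropped
```

With this design, a segment whose table is still fresh can be answered without contacting the source again, while a short effective time keeps results close to real time at the cost of more frequent reloads.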
As shown in fig. 1, when the required data is queried, the specific process is as follows:
firstly, using the query analysis engine to query two groups of business data sources to obtain the required data set segments;
in addition, the query analysis engine may query three or more groups of business data sources; the more data sources are queried, the more data set segments are obtained, which helps ensure the completeness of the data finally queried.
Then, using the source data source loading component to load the data set segments into a new data source;
and finally, querying the aggregated data set segments in the new data source through the target data source aggregation component.
After the data result has been queried in the target data source aggregation component, the following steps are further performed:
step one: connecting the query client to the network, backing up the query result, and periodically cleaning the storage space in the memory;
step two: uploading the query result to a network terminal, summarizing the query results by using big data analysis technology, and generating keywords for the data sources that rank highest by number of queries;
step three: recording and counting the keywords that did not appear during the query process, so as to generate new keywords for supplementing the database; the purpose of this step is to allow later users to retrieve data in a targeted way when querying by keyword.
In step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory is periodically cleaned, and the cleaning period may be set to one week, one month, or one year; the memory comprises a hard disk group composed of a plurality of groups of mobile hard disks.
The structured query language (SQL) comprises a data definition language, a data manipulation language, a data query language and a data management language, wherein the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data management language manages permissions.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (6)
1. A mixed data warehouse technology for real-time aggregation of multiple data sources, characterized in that: it comprises a data aggregation query middleware, wherein the middleware comprises a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component;
the query script is an extended version of a structured query language (SQL) script and consists of multiple SQL segments, each segment defining the data to be queried, the data source to be queried and the effective time;
the query client is used for receiving the query script and sending the query script to the query analysis engine;
the query analysis engine is used for parsing the query script, splitting it into separate SQL statements, and sending those statements to the source data source loading component and the target data source aggregation component;
the source data source loading component is used for receiving a query script, acquiring the data and the table structure from a data source accordingly, and sending the data and the table structure to the target data source aggregation component;
and the target data source aggregation component is used for receiving the table structure and the data from the source data source loading component, converting them into a new temporary table, and receiving the query script so as to return the aggregated data.
2. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 1, wherein: when the required data is queried, the specific process is as follows:
firstly, using the query analysis engine to query at least two groups of business data sources to obtain the required data set segments;
then, using the source data source loading component to load the data set segments into a new data source;
and finally, querying the aggregated data set segments in the new data source through the target data source aggregation component.
3. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 2, wherein: after the data result has been queried in the target data source aggregation component, the following steps are further performed:
step one: connecting the query client to the network, backing up the query result, and periodically cleaning the storage space in the memory;
step two: uploading the query result to a network terminal, summarizing the query results by using big data analysis technology, and generating keywords for the data sources that rank highest by number of queries;
step three: recording and counting the keywords that did not appear during the query process, so as to generate new keywords for supplementing the database.
4. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 3, wherein: in step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory space is periodically cleaned, and the cleaning period may be set to one week, one month, or one year.
5. The hybrid data warehouse technology for real-time aggregation of multiple data sources as claimed in claim 4, wherein: the memory comprises a hard disk group which is composed of a plurality of groups of mobile hard disks.
6. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 1, wherein: the structured query language (SQL) comprises a data definition language, a data manipulation language, a data query language and a data management language, wherein the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data management language manages permissions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011273030.1A CN112328708A (en) | 2020-11-13 | 2020-11-13 | Mixed data warehouse technology for real-time aggregation of multiple data sources |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112328708A (en) | 2021-02-05 |
Family
ID=74319126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011273030.1A Pending CN112328708A (en) | 2020-11-13 | 2020-11-13 | Mixed data warehouse technology for real-time aggregation of multiple data sources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112328708A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103425762A (en) * | 2013-08-05 | 2013-12-04 | 南京邮电大学 | Telecom operator mass data processing method based on Hadoop platform |
CN109558403A (en) * | 2018-09-28 | 2019-04-02 | 中国平安人寿保险股份有限公司 | Data aggregation method and device, computer installation and computer readable storage medium |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114168612A (en) * | 2021-09-06 | 2022-03-11 | 川投信息产业集团有限公司 | Asset big data platform query acceleration method |
CN114168612B (en) * | 2021-09-06 | 2022-08-16 | 川投信息产业集团有限公司 | Asset big data platform query acceleration method |
CN114329253A (en) * | 2022-01-05 | 2022-04-12 | 北京安博通科技股份有限公司 | Network operation data query method, device, equipment and storage medium |
CN114329253B (en) * | 2022-01-05 | 2022-08-30 | 北京安博通科技股份有限公司 | Network operation data query method, device, equipment and storage medium |
CN114826645A (en) * | 2022-03-03 | 2022-07-29 | 深圳市迪讯飞科技有限公司 | Method and terminal for real-time aggregation of multi-channel data |
CN114826645B (en) * | 2022-03-03 | 2024-04-16 | 深圳市迪讯飞科技有限公司 | Method and terminal for real-time aggregation of multipath data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||