CN112328708A - Mixed data warehouse technology for real-time aggregation of multiple data sources - Google Patents

Info

Publication number
CN112328708A
CN112328708A CN202011273030.1A CN202011273030A CN112328708A CN 112328708 A CN112328708 A CN 112328708A CN 202011273030 A CN202011273030 A CN 202011273030A CN 112328708 A CN112328708 A CN 112328708A
Authority
CN
China
Prior art keywords
data
query
source
aggregation
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011273030.1A
Other languages
Chinese (zh)
Inventor
江品磊
赵子昂
孙海龙
罗靖东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Prajna Big Data Technology Co ltd
Original Assignee
Shenzhen Prajna Big Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-13
Filing date
2020-11-13
Publication date
2021-02-05
Application filed by Shenzhen Prajna Big Data Technology Co ltd
Priority to CN202011273030.1A
Publication of CN112328708A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, the middleware comprising a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component. The query script is an extended structured query language (SQL) script made up of multiple SQL segments, each of which defines the data to be queried, the data source to query and the effective time of the data. The key technical point is on-demand loading: by writing only a data query script, a user can have data segments captured automatically from different data sources, imported into a new data source and aggregated there. Compared with building a conventional data warehouse, the method requires no dedicated warehouse to be built, only the writing of query scripts; it eliminates the complex ETL process and simplifies data querying for enterprises.

Description

Mixed data warehouse technology for real-time aggregation of multiple data sources
Technical Field
The invention belongs to the field of data warehouses, and particularly relates to a mixed data warehouse technology for real-time aggregation of multiple data sources.
Background
A data warehouse is a strategic collection of data that supports decision making at every level of an enterprise and is regarded as a core component of business intelligence; it is a central repository of information created for analytical reporting and decision support. A data warehouse guides enterprises that need business intelligence in improving business processes and in monitoring and controlling time, cost, quality and the like.
When the multiple data sources of an enterprise's multiple systems are to be analyzed in aggregate, a data warehouse is usually built for the enterprise first; an ETL tool then extracts, transforms and loads data from the different data sources into the new warehouse on a schedule, and an analysis system connects to the new warehouse to perform the aggregation analysis.
This conventional approach has the following technical problems: first, a data warehouse must be built, at a cost that small and medium-sized enterprises find hard to control; second, the data must be preprocessed by ETL before it can be analyzed, so the development and implementation cycle is long; third, the data is synchronized to the warehouse only at scheduled intervals, so it cannot be aggregated and analyzed in real time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a mixed data warehouse technology for real-time aggregation of multiple data sources.
To achieve this purpose, the invention adopts the following technical solution:
a mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, the middleware comprising a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component;
the query script is an extended structured query language (SQL) script made up of multiple SQL segments, each of which defines the data to be queried, the data source to query and the effective time of the data;
the query client receives the query script and sends it to the query analysis engine;
the query analysis engine parses the query script, splits it into separate SQL statements, and sends those statements to the source data source loading component and the target data source aggregation component;
the source data source loading component receives a query statement, acquires the data and the table structure from the data source, and sends them to the target data source aggregation component;
and the target data source aggregation component receives the structure and data from the source data source loading component, converts them into a new temporary table, and, on receiving the query script, returns the aggregated data.
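The patent does not publish a concrete script syntax, so the following is only a minimal sketch of how a multi-segment script might be split by a query analysis engine. The `-- @source`, `-- @ttl` and `-- @into` directives, the `Segment` type and `parse_script` are all hypothetical names introduced for illustration; segments without a `@source` directive are assumed to be the final aggregate query run on the target.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    source: str       # data source to query; "target" means the aggregate step
    ttl_seconds: int  # effective time of the loaded data (hypothetical unit)
    table: str        # temporary table to load into ("" for the aggregate step)
    sql: str          # the plain SQL statement of this segment


# Hypothetical directive syntax: "-- @key: value" on its own line.
DIRECTIVE = re.compile(r"^--\s*@(\w+)\s*:\s*(\S+)\s*$")


def parse_script(script: str) -> list[Segment]:
    """Split a script into segments and read each segment's directives.

    Splitting on ';' is deliberately naive: it would break on semicolons
    inside string literals, which a real parser would have to handle.
    """
    segments = []
    for chunk in script.split(";"):
        meta, sql_lines = {}, []
        for line in chunk.strip().splitlines():
            line = line.strip()
            match = DIRECTIVE.match(line)
            if match:
                meta[match.group(1)] = match.group(2)
            elif line:
                sql_lines.append(line)
        if not sql_lines:
            continue  # skip empty trailing chunks
        segments.append(Segment(
            source=meta.get("source", "target"),
            ttl_seconds=int(meta.get("ttl", "0")),
            table=meta.get("into", ""),
            sql=" ".join(sql_lines),
        ))
    return segments


EXAMPLE_SCRIPT = """
-- @source: crm
-- @ttl: 60
-- @into: t_customers
SELECT id, name FROM customers;

-- @source: erp
-- @ttl: 300
-- @into: t_orders
SELECT customer_id, amount FROM orders;

SELECT c.name, SUM(o.amount)
FROM t_customers c JOIN t_orders o ON o.customer_id = c.id
GROUP BY c.name;
"""

segments = parse_script(EXAMPLE_SCRIPT)
```

In this sketch the first two segments would go to the source data source loading component and the last one to the target data source aggregation component.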
Preferably, the required data is queried as follows:
first, the query analysis engine queries at least two groups of business data sources to obtain the required data set segments;
then, the source data source loading component loads the data set segments into a new data source;
and finally, the aggregated data set segments are queried from the new data source in the target data source aggregation component.
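The three-stage flow above can be sketched with Python's standard `sqlite3` module, using in-memory SQLite databases as stand-ins for the business data sources and for the new data source. The table names, sample rows and the helpers `make_source` and `load_fragment` are assumptions made for illustration; the patent does not prescribe any particular database engine.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one business data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


def load_fragment(source, target, select_sql, temp_table):
    """Query a fragment from `source` and load it into a temp table on `target`."""
    cursor = source.execute(select_sql)
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    column_list = ", ".join(f'"{name}"' for name in columns)
    target.execute(f'CREATE TEMP TABLE "{temp_table}" ({column_list})')
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(
        f'INSERT INTO "{temp_table}" VALUES ({placeholders})', rows
    )
    return len(rows)


# Two hypothetical business data sources.
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
erp = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 70.0)],
)

# The "new data source" that holds the loaded fragments.
target = sqlite3.connect(":memory:")
load_fragment(crm, target, "SELECT id, name FROM customers", "t_customers")
load_fragment(erp, target, "SELECT customer_id, amount FROM orders", "t_orders")

# Aggregate query across fragments that came from different sources.
result = target.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM t_customers c JOIN t_orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```

Only the rows selected by each segment are copied, which is the on-demand aspect: no full warehouse schema is built in advance.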
Preferably, after the data result has been queried in the target data source aggregation component, the following steps are also performed:
step one: connecting the query client to the network, backing up the query result, and periodically cleaning the storage space in the memory;
step two: uploading the query result to a network terminal, summarizing the query results using big data analysis technology, and generating keywords for the data sources that rank highest by number of queries;
step three: recording and counting the keywords that did not appear during querying, so as to generate new keywords that supplement the database.
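Steps two and three are described only at a high level, so the sketch below is one possible reading rather than the patented procedure: it ranks data sources by how often they were queried and counts the searched keywords that are not yet known. The log format (pairs of source and keyword), the sample data and both function names are hypothetical.

```python
from collections import Counter


def top_queried_sources(query_log, n=2):
    """Return the n data sources with the highest query counts."""
    counts = Counter(source for source, _keyword in query_log)
    return [source for source, _count in counts.most_common(n)]


def unmatched_keywords(query_log, known_keywords):
    """Count searched keywords that are not in the known keyword set."""
    return Counter(
        keyword for _source, keyword in query_log
        if keyword not in known_keywords
    )


# Hypothetical query log: (data source, keyword used in the query).
QUERY_LOG = [
    ("erp", "invoice"),
    ("erp", "refund"),
    ("crm", "customer"),
    ("erp", "refund"),
    ("crm", "churn"),
    ("hr", "payroll"),
]
KNOWN_KEYWORDS = {"invoice", "customer", "payroll"}

popular = top_queried_sources(QUERY_LOG)
candidates = unmatched_keywords(QUERY_LOG, KNOWN_KEYWORDS)
```

The keywords in `candidates` are the ones that could be added to the keyword database in step three.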
Preferably, in step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory space is cleaned periodically, and the cleaning cycle can be set to one week, one month or one year.
Preferably, the memory comprises a hard disk group made up of several groups of portable hard disks.
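The configurable cleaning cycle can be illustrated with a small retention check. This is a sketch under assumptions: the backup catalogue is modelled as a mapping from a backup name to the time it was saved, a month is approximated as 30 days and a year as 365 days, and `expired_backups` is a hypothetical helper. It only selects local backups for removal; the cloud copies are left untouched, as the text describes.

```python
from datetime import datetime, timedelta

# The three cleaning cycles named in the text (month and year approximated).
CLEANING_CYCLES = {
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def expired_backups(backups, cycle, now):
    """Return the names of local backups older than the chosen cleaning cycle."""
    cutoff = now - CLEANING_CYCLES[cycle]
    return sorted(name for name, saved_at in backups.items() if saved_at < cutoff)


# Hypothetical local backup catalogue.
LOCAL_BACKUPS = {
    "result_a": datetime(2021, 2, 1),
    "result_b": datetime(2021, 1, 20),
    "result_c": datetime(2020, 1, 1),
}
NOW = datetime(2021, 2, 5)

weekly = expired_backups(LOCAL_BACKUPS, "week", NOW)
monthly = expired_backups(LOCAL_BACKUPS, "month", NOW)
```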
Preferably, the structured query language includes a data definition language (DDL), a data manipulation language (DML), a data query language (DQL) and a data control language (DCL); the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data control language manages permissions.
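As an illustration of the four sublanguages, which in common SQL usage are abbreviated DDL, DML, DQL and DCL, a statement can be classified by its leading keyword. The keyword table below covers only a handful of common statements and `classify_statement` is a hypothetical helper, not part of the patent.

```python
# Leading keyword -> sublanguage, for a few common SQL statements only.
SUBLANGUAGE_BY_KEYWORD = {
    "CREATE": "DDL",   # definition: logical structure of the database
    "ALTER": "DDL",
    "DROP": "DDL",
    "INSERT": "DML",   # manipulation: changes to specific data
    "UPDATE": "DML",
    "DELETE": "DML",
    "SELECT": "DQL",   # query: reads of specific data
    "GRANT": "DCL",    # control: permission management
    "REVOKE": "DCL",
}


def classify_statement(statement):
    """Return the sublanguage of a SQL statement, or "UNKNOWN"."""
    words = statement.strip().split(None, 1)
    if not words:
        raise ValueError("empty statement")
    return SUBLANGUAGE_BY_KEYWORD.get(words[0].upper(), "UNKNOWN")
```

For example, `classify_statement("select * from orders")` returns `"DQL"`, while a `GRANT` statement is reported as `"DCL"`.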
Compared with the prior art, the mixed data warehouse technology for real-time aggregation of multiple data sources provided by the invention has the following beneficial effects:
the invention adopts the principle of on-demand loading: by writing only a data query script, a user can have data segments captured automatically from different data sources, imported into a new data source and aggregated there;
compared with building a conventional data warehouse, the method requires no dedicated warehouse to be built, only the writing of query scripts; it eliminates the complex ETL process and simplifies data querying for enterprises.
Drawings
FIG. 1 is a flow chart of data query in the present invention.
Detailed Description
The following further describes an embodiment of a hybrid data warehouse technology for real-time aggregation of multiple data sources according to the present invention with reference to fig. 1. The hybrid data warehouse technology for real-time aggregation of multiple data sources of the present invention is not limited to the description of the following embodiments.
A mixed data warehouse technology for real-time aggregation of multiple data sources comprises a data aggregation query middleware, the middleware comprising a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component.
The query script is an extended structured query language (SQL) script made up of multiple SQL segments, each of which defines the data to be queried, the data source to query and the effective time of the data;
the query client receives the query script and sends it to the query analysis engine;
the query analysis engine parses the query script, splits it into separate SQL statements, and sends those statements to the source data source loading component and the target data source aggregation component;
the source data source loading component receives a query statement, acquires the data and the table structure from the data source, and sends them to the target data source aggregation component;
the target data source aggregation component receives the structure and data from the source data source loading component, converts them into a new temporary table, and, on receiving the query script, returns the aggregated data;
the invention adopts the principle of on-demand loading: by writing only a data query script, a user can have data segments captured automatically from different data sources, imported into a new data source and aggregated there;
compared with building a conventional data warehouse, the method requires no dedicated warehouse to be built, only the writing of query scripts; it eliminates the complex ETL process and simplifies data querying for enterprises.
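The source data source loading component is described as passing on a table structure as well as data. One way to sketch the structure part, assuming SQLite on both sides, is to read column names and declared types with `PRAGMA table_info` and recreate them as a temporary table on the target. The helpers `read_structure` and `create_temp_table` and the table names are hypothetical.

```python
import sqlite3


def read_structure(source, table):
    """Return [(column name, declared type), ...] for a table in `source`."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = source.execute(f"PRAGMA table_info('{table}')").fetchall()
    return [(name, column_type) for _cid, name, column_type, *_rest in rows]


def create_temp_table(target, temp_name, structure):
    """Create a temporary table on `target` that mirrors the given structure."""
    columns = ", ".join(f'"{name}" {column_type}' for name, column_type in structure)
    target.execute(f'CREATE TEMP TABLE "{temp_name}" ({columns})')


# Hypothetical source table whose structure is transferred.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")

target = sqlite3.connect(":memory:")
structure = read_structure(source, "orders")
create_temp_table(target, "t_orders", structure)
```

Carrying the declared types over means the temporary table applies the same column type affinity as the source, which keeps aggregate results such as sums consistent with the source data.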
As shown in fig. 1, the required data is queried as follows:
first, the query analysis engine queries two groups of business data sources to obtain the required data set segments;
in addition, the query analysis engine can also query three or more groups of business data sources; the more data sources are queried, the more required data set segments are obtained, which helps ensure the completeness of the final query result;
then, the source data source loading component loads the data set segments into a new data source;
and finally, the aggregated data set segments are queried from the new data source in the target data source aggregation component.
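Each script segment also carries an effective time, but the text does not say how it is used. One plausible reading, shown here purely as an assumption, is that a loaded segment may be reused until its effective time has elapsed and is fetched again from the source afterwards. `FragmentCache` and its methods are hypothetical names; the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class FragmentCache:
    """Tracks when each temporary table was loaded and whether it is stale."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._loaded_at = {}

    def mark_loaded(self, table):
        """Record that `table` has just been (re)loaded from its source."""
        self._loaded_at[table] = self._clock()

    def needs_reload(self, table, effective_seconds):
        """True if `table` was never loaded or its effective time has elapsed."""
        loaded_at = self._loaded_at.get(table)
        if loaded_at is None:
            return True
        return self._clock() - loaded_at >= effective_seconds
```

A caller would check `needs_reload` before each aggregate query, reload the stale segments from their sources, and call `mark_loaded` afterwards.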
After the data result has been queried in the target data source aggregation component, the following steps are also performed:
step one: connecting the query client to the network, backing up the query result, and periodically cleaning the storage space in the memory;
step two: uploading the query result to a network terminal, summarizing the query results using big data analysis technology, and generating keywords for the data sources that rank highest by number of queries;
step three: recording and counting the keywords that did not appear during querying, so as to generate new keywords that supplement the database; the purpose of this step is to let later users retrieve data in a targeted way when querying by keyword.
In step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory is cleaned periodically, and the cleaning cycle can be set to one week, one month or one year; the memory comprises a hard disk group made up of several groups of portable hard disks.
The structured query language includes a data definition language (DDL), a data manipulation language (DML), a data query language (DQL) and a data control language (DCL); the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data control language manages permissions.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (6)

1. A mixed data warehouse technology for real-time aggregation of multiple data sources, characterized in that: it comprises a data aggregation query middleware, the middleware comprising a query script, a query client, a query analysis engine, a source data source loading component and a target data source aggregation component;
the query script is an extended structured query language (SQL) script made up of multiple SQL segments, each of which defines the data to be queried, the data source to query and the effective time of the data;
the query client receives the query script and sends it to the query analysis engine;
the query analysis engine parses the query script, splits it into separate SQL statements, and sends those statements to the source data source loading component and the target data source aggregation component;
the source data source loading component receives a query statement, acquires the data and the table structure from the data source, and sends them to the target data source aggregation component;
and the target data source aggregation component receives the structure and data from the source data source loading component, converts them into a new temporary table, and, on receiving the query script, returns the aggregated data.
2. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 1, wherein the required data is queried as follows:
first, the query analysis engine queries at least two groups of business data sources to obtain the required data set segments;
then, the source data source loading component loads the data set segments into a new data source;
and finally, the aggregated data set segments are queried from the new data source in the target data source aggregation component.
3. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 2, wherein after the data result has been queried in the target data source aggregation component, the following steps are also performed:
step one: connecting the query client to the network, backing up the query result, and periodically cleaning the storage space in the memory;
step two: uploading the query result to a network terminal, summarizing the query results using big data analysis technology, and generating keywords for the data sources that rank highest by number of queries;
step three: recording and counting the keywords that did not appear during querying, so as to generate new keywords that supplement the database.
4. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 3, wherein in step one, the data is uploaded to both the enterprise cloud space and the memory during backup, only the memory space is cleaned periodically, and the cleaning cycle can be set to one week, one month or one year.
5. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 4, wherein the memory comprises a hard disk group made up of several groups of portable hard disks.
6. The hybrid data warehouse technology for real-time aggregation of multiple data sources of claim 1, wherein the structured query language includes a data definition language (DDL), a data manipulation language (DML), a data query language (DQL) and a data control language (DCL); the data definition language operates on the logical structure of the database, the data manipulation language and the data query language operate on specific data, and the data control language manages permissions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011273030.1A CN112328708A (en) 2020-11-13 2020-11-13 Mixed data warehouse technology for real-time aggregation of multiple data sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011273030.1A CN112328708A (en) 2020-11-13 2020-11-13 Mixed data warehouse technology for real-time aggregation of multiple data sources

Publications (1)

Publication Number Publication Date
CN112328708A (en) 2021-02-05

Family

ID=74319126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011273030.1A Pending CN112328708A (en) 2020-11-13 2020-11-13 Mixed data warehouse technology for real-time aggregation of multiple data sources

Country Status (1)

Country Link
CN (1) CN112328708A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425762A (en) * 2013-08-05 2013-12-04 南京邮电大学 Telecom operator mass data processing method based on Hadoop platform
CN109558403A (en) * 2018-09-28 2019-04-02 中国平安人寿保险股份有限公司 Data aggregation method and device, computer installation and computer readable storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168612A (en) * 2021-09-06 2022-03-11 川投信息产业集团有限公司 Asset big data platform query acceleration method
CN114168612B (en) * 2021-09-06 2022-08-16 川投信息产业集团有限公司 Asset big data platform query acceleration method
CN114329253A (en) * 2022-01-05 2022-04-12 北京安博通科技股份有限公司 Network operation data query method, device, equipment and storage medium
CN114329253B (en) * 2022-01-05 2022-08-30 北京安博通科技股份有限公司 Network operation data query method, device, equipment and storage medium
CN114826645A (en) * 2022-03-03 2022-07-29 深圳市迪讯飞科技有限公司 Method and terminal for real-time aggregation of multi-channel data
CN114826645B (en) * 2022-03-03 2024-04-16 深圳市迪讯飞科技有限公司 Method and terminal for real-time aggregation of multipath data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination