CN114676208A - Data warehouse - Google Patents

Publication number
CN114676208A
Authority
CN
China
Prior art keywords
data
layer
warehouse
source
data warehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210364549.3A
Other languages
Chinese (zh)
Inventor
王振宇
周建清
杨克杰
金和
郑祥智
林超俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Newford Research Institute Of Advanced Technology
Original Assignee
Newford Research Institute Of Advanced Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Newford Research Institute Of Advanced Technology
Priority to CN202210364549.3A
Publication of CN114676208A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data warehouse comprising a source data layer, a data warehouse layer and a data application layer. Source data layer: a temporary storage layer that directly adopts the data structures and data of the peripheral systems and serves as a staging area for interface data. Data warehouse layer: a detail layer containing the data obtained by cleaning the data of the source data layer. Data application layer: the data source read directly by front-end applications, containing data calculated according to report and thematic analysis requirements. Beneficial effects: an analysis-oriented integrated data environment is constructed, and decision support is provided for enterprises.

Description

Data warehouse
Technical Field
The invention relates to a data warehouse.
Background
With the improvement of computer storage capacity and the development of complex algorithms, the volume of network data has grown exponentially in recent years. Applications that require massive amounts of data, such as scientific data processing and business intelligence analysis, have become increasingly common, and traditional technical architectures can no longer meet the demands of big data processing.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a data warehouse that constructs an analysis-oriented integrated data environment and provides decision support for enterprises.
To achieve this purpose, the invention adopts the following technical solution: a data warehouse comprising a source data layer, a data warehouse layer and a data application layer;
source data layer: a temporary storage layer that directly adopts the data structures and data of the peripheral systems and serves as a staging area for interface data;
data warehouse layer: a detail layer containing the data obtained by cleaning the data of the source data layer;
data application layer: the data source read directly by front-end applications, containing data calculated according to report and thematic analysis requirements.
Further, the source data layer is an ODS layer, which ingests the original data as is.
Further, the data warehouse layer is a DW layer, which builds data models by subject from the data obtained from the ODS layer.
Further, the data application layer is a WEB layer, which provides data for data products and data analysis.
Further, the DW layer is divided into a DWD layer, a DWM layer and a DWS layer.
The beneficial effects of the invention are: a very large amount of data can be stored for analysis, and multiple data access techniques are supported; open system technology makes the cost of analyzing large amounts of data reasonable, and the hardware solutions are mature.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
A data warehouse comprises a source data layer, a data warehouse layer and a data application layer;
source data layer: a temporary storage layer that directly adopts the data structures and data of the peripheral systems and serves as a staging area for interface data;
data warehouse layer: a detail layer containing the data obtained by cleaning the data of the source data layer;
data application layer: the data source read directly by front-end applications, containing data calculated according to report and thematic analysis requirements.
In the data warehouse, the source data layer is an ODS layer, which ingests the original data as is.
In the data warehouse, the data warehouse layer is a DW layer, which builds data models by subject from the data obtained from the ODS layer.
In the data warehouse, the data application layer is a WEB layer, which provides data for data products and data analysis.
In the data warehouse, the DW layer is divided into a DWD layer, a DWM layer and a DWS layer.
According to the flow of data into and out of the warehouse, the data warehouse architecture can be divided into a source data layer, a data warehouse layer and a data application layer.
The data warehouse is an intermediate platform for integrated data management.
Source data layer: the data in this layer is left unchanged; it directly adopts the data structures and data of the peripheral systems and is not exposed externally. It is a temporary storage layer, a staging area for interface data that prepares for the next step of data processing.
Data warehouse layer: also called the detail layer. DW layer data should be consistent, accurate and clean, that is, source system data after cleaning (with impurities removed).
Data application layer: the data source read directly by front-end applications, containing data calculated according to report and thematic analysis requirements.
The process of acquiring data from the various data sources, and of transforming and moving data within the data warehouse, can be regarded as the ETL (Extract, Transform, Load) process. ETL is the pipeline of the data warehouse and can also be regarded as its bloodstream, sustaining the metabolism of data in the warehouse. Most of the effort in the daily management and maintenance of a data warehouse goes into keeping ETL running normally and stably.
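The ETL flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the sample records, the `dw_orders` table and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical interface records from a peripheral system (all values arrive as text).
SOURCE_ROWS = [
    ("1001", " alice ", "25.50"),
    ("1002", "BOB", "40.00"),
    ("1003", "carol", ""),  # amount missing in the source
]


def extract():
    """Extract: copy the raw interface records without changing them."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: skip rows with no amount, cast types, normalise names."""
    clean = []
    for order_id, customer, amount in rows:
        if not amount:
            continue
        clean.append((int(order_id), customer.strip().lower(), float(amount)))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform and load in order and return the loaded rows."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT order_id, customer, amount FROM dw_orders ORDER BY order_id"
    ).fetchall()
```

Keeping the three steps as separate functions mirrors the point made in the text: most maintenance effort goes into keeping each stage of this pipeline running reliably.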
The data warehouse further comprises dependent data marts:
the data of a dependent data mart comes from the data warehouse; data in the warehouse is integrated, restructured and summarized before being transferred to the dependent data mart.
The main benefits of establishing dependent data marts are:
Performance: when the query performance of the data warehouse becomes a problem, several dependent data marts can be built and queries moved from the data warehouse to the data marts.
Security: each department can have full control over its own data.
Data consistency: because every data mart draws its data from the same data warehouse, data inconsistency is effectively eliminated.
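A dependent data mart of the kind described above could be derived as in the sketch below. The table names (`dw_sales`, `mart_<dept>`), the sample rows and the helper functions are assumptions made for illustration; SQLite stands in for both the warehouse and the mart.

```python
import sqlite3


def make_warehouse():
    """Create a small in-memory warehouse table with hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dw_sales (dept TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO dw_sales VALUES (?, ?, ?)",
        [
            ("retail", "pen", 10.0),
            ("retail", "ink", 5.0),
            ("retail", "pen", 2.5),
            ("wholesale", "pen", 100.0),
        ],
    )
    return conn


def build_mart(conn, dept):
    """Summarise one department's warehouse data into its own mart table."""
    if not dept.isidentifier():
        raise ValueError("department name must be a simple identifier")
    table = f"mart_{dept}"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (product TEXT, total REAL)")
    # The mart is filled only from the warehouse, so all marts share one source.
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT product, SUM(amount) FROM dw_sales WHERE dept = ? GROUP BY product",
        (dept,),
    )
    return table
```

Because every mart is rebuilt from `dw_sales`, two departments can never disagree about the underlying figures, which is the consistency benefit the text names.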
The key points of data warehouse construction are as follows:
The data warehouse is subject-oriented. Data in an operational database is organized around transaction processing tasks, whereas data in the data warehouse is organized by subject area. A subject is an important aspect that users care about when using the data warehouse to make decisions, and a subject is typically associated with multiple operational information systems.
The data warehouse is integrated. Its data comes from scattered operational data; the required data is extracted from the original data, processed and integrated, and can enter the data warehouse only after it has been unified and consolidated.
The data in the data warehouse is obtained by systematically processing, summarizing and organizing the originally scattered database data after extraction and cleaning. Inconsistencies in the source data must be eliminated to ensure that the information in the data warehouse is consistent, global information about the whole enterprise.
Data in the data warehouse is mainly used for enterprise decision analysis, and the operations involved are mainly queries. Once data enters the data warehouse it is generally retained for a long time; that is, the data warehouse typically serves a large number of query operations but few modification or deletion operations, and usually only needs periodic loading and refreshing.
Data in the data warehouse usually contains historical information. The system records the enterprise's information from some past point in time (such as the point at which the data warehouse was first used) through each stage up to the present; with this information, quantitative analysis and prediction can be made about the development history and future trends of the enterprise.
The data warehouse is not updatable. It mainly provides data for decision analysis, and the operations involved are mainly data queries.
The data warehouse is time-variant. Traditional relational database systems are better suited to processing formatted data and can better meet the requirements of business processing. In the data warehouse, stable data is preserved in a read-only format and does not change over time.
Summarized: operational data is mapped into a format usable for decision making.
Large capacity: time-series data sets are typically very large.
Non-normalized: DW data may be, and often is, redundant.
Metadata: data describing the data is saved.
Data sources: data comes from internal and external non-integrated operational systems.
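The non-updatable, history-keeping behaviour described above can be sketched as an append-only table with a load date on every row. The table name `dw_customer_history`, its columns and the helper functions are hypothetical; the point is only that each periodic load adds rows and nothing is modified in place.

```python
import sqlite3


def make_history_table():
    """Create an in-memory table holding dated snapshots of customer data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dw_customer_history "
        "(load_date TEXT, customer_id INTEGER, city TEXT)"
    )
    return conn


def load_snapshot(conn, load_date, rows):
    """Append one periodic load; earlier rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO dw_customer_history VALUES (?, ?, ?)",
        [(load_date, customer_id, city) for customer_id, city in rows],
    )


def city_as_of(conn, customer_id, as_of):
    """Return the city recorded by the latest load on or before `as_of` (ISO date)."""
    row = conn.execute(
        "SELECT city FROM dw_customer_history "
        "WHERE customer_id = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None
```

ISO-formatted dates compare correctly as strings, which is why the sketch can store `load_date` as text. Keeping every load makes it possible to answer questions about any past point in time, which supports the trend analysis the text mentions.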
The principles of data warehouse layering are as follows. To facilitate data analysis, the complexity of the underlying business must be shielded, and the data must be exposed to the analysis layer in a simple, complete and integrated form.
The impact on the model of changes in the underlying business and in upper-layer requirements is minimized: the basic data layer absorbs the impact of business system changes, and a top-down construction method reduces the impact of requirement changes on the model. Data within a subject, or within each complete system, is highly cohesive, while data between subjects, or between complete systems, is loosely coupled. A basic warehouse data layer is constructed so that the integration of underlying business data is separated from upper-layer application development; for large-scale warehouse development this makes the basic warehouse hierarchy clearer and the externally exposed data more uniform.
Source data layer: the ODS layer.
The ODS layer is the layer closest to the data in the data sources. Because the data may later need to be traced back to its origin, extensive data cleaning is not recommended in the ODS layer; the original data is ingested as is, and processes such as data denoising, deduplication and abnormal value handling can be deferred to the subsequent DWD layer.
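The split between an untouched ODS layer and cleaning in the DWD layer might look like the sketch below. The sample readings, the field names and the median-based outlier rule (`max_ratio`) are illustrative assumptions; the patent does not specify how denoising or abnormal value handling is performed.

```python
from statistics import median

# Hypothetical ODS rows, kept exactly as received from the source system.
ods_readings = [
    {"id": 1, "sensor": "a", "value": 10.0},
    {"id": 2, "sensor": "a", "value": 11.0},
    {"id": 2, "sensor": "a", "value": 11.0},   # duplicate record
    {"id": 3, "sensor": "a", "value": None},   # noise: missing value
    {"id": 4, "sensor": "a", "value": 9.0},
    {"id": 5, "sensor": "a", "value": 500.0},  # abnormal value
]


def build_dwd(ods_rows, max_ratio=5.0):
    """Derive cleaned DWD rows from ODS rows without modifying the ODS rows."""
    seen = set()
    deduped = []
    for row in ods_rows:
        # Denoise (drop rows with no value) and deduplicate by id.
        if row["value"] is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        deduped.append(dict(row))  # copy, so the ODS layer stays traceable
    if not deduped:
        return []
    # Assumed outlier rule: drop values far larger in magnitude than the median.
    mid = median(r["value"] for r in deduped)
    return [r for r in deduped if abs(r["value"]) <= max_ratio * abs(mid)]
```

Since `build_dwd` only reads from `ods_readings`, any cleaned row can still be traced back to the original record, which is the reason the text gives for not cleaning in the ODS layer.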
Data warehouse layer: the DW layer.
The data warehouse layer is the core layer to design when building a data warehouse. Here, the data obtained from the ODS layer is used to build various data models by subject.
The DW layer is further subdivided into a DWD (Data Warehouse Detail) layer, a DWM (Data Warehouse Middle) layer and a DWS (Data Warehouse Service) layer.
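The text names the three DW sub-layers without defining their contents. The sketch below assumes one common reading: DWD holds cleaned detail rows, DWM holds a light aggregation of them, and DWS holds a per-subject summary. The order data, field names and functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical DWD detail rows: one row per cleaned order.
dwd_orders = [
    {"day": "2022-04-01", "user": "u1", "amount": 10.0},
    {"day": "2022-04-01", "user": "u1", "amount": 5.0},
    {"day": "2022-04-01", "user": "u2", "amount": 7.0},
    {"day": "2022-04-02", "user": "u1", "amount": 3.0},
]


def build_dwm(dwd_rows):
    """DWM (assumed role): light aggregation, one row per (day, user)."""
    totals = defaultdict(float)
    for row in dwd_rows:
        totals[(row["day"], row["user"])] += row["amount"]
    return [
        {"day": day, "user": user, "amount": amount}
        for (day, user), amount in sorted(totals.items())
    ]


def build_dws(dwm_rows):
    """DWS (assumed role): per-day summary built from the DWM rows."""
    summary = {}
    for row in dwm_rows:
        entry = summary.setdefault(
            row["day"], {"day": row["day"], "users": 0, "amount": 0.0}
        )
        entry["users"] += 1
        entry["amount"] += row["amount"]
    return [summary[day] for day in sorted(summary)]
```

Building DWS from DWM rather than directly from DWD lets several summaries reuse the same intermediate aggregation.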
Data application layer: the WEB layer.
This layer mainly holds data provided for data products and data analysis. The data is generally stored in a MySQL database for use by online systems, and may also be stored in Hive or Druid for data analysis and data mining. The WEB layer is associated with the DW layer. Report data, for example, is generally placed here.
Uses of the data warehouse:
In the current environment of information technology and data intelligence, many economical and efficient computing resources are available in the fields of software and hardware, Internet and intranet solutions, and databases. The data warehouse can store very large amounts of data for analysis and allows multiple data access techniques to be used.
Open system technology makes the cost of analyzing large amounts of data reasonable, and the hardware solutions are mature.
The technologies mainly used in data warehouse applications are as follows:
Parallelism
Computing hardware environments, operating system environments, database management systems and all related areas of database operation, query tools and techniques, and applications can all benefit from the latest advances in parallel processing.
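As a small illustration of the parallel pattern mentioned above, the sketch below splits a column of values into chunks, aggregates the chunks concurrently and merges the partial results. It uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function names and chunking scheme are assumptions, and a thread pool is used here only to show the split-and-merge structure, not to claim a real speed-up (a production warehouse would rely on a parallel database engine).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Aggregate one chunk independently of the others."""
    return sum(chunk)


def parallel_total(values, workers=4):
    """Split `values` into up to `workers` chunks, sum each, then merge."""
    size = max(1, -(-len(values) // workers))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

The same structure applies to any aggregate that can be merged from partial results, such as counts, minima and maxima.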
Partitioning
Partitioning makes large tables and indexes easier to support, and also improves data management and query performance.
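The benefit of partitioning can be sketched with a toy table that stores rows in one bucket per month. The class `PartitionedTable`, the `day` field and the month-based partition key are hypothetical choices for illustration; real database systems implement partitioning internally.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table that keeps rows in one list per month (the partition key)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row):
        """Route the row to its partition using the 'YYYY-MM' prefix of row['day']."""
        self.partitions[row["day"][:7]].append(row)

    def query_month(self, month):
        """Read only the matching partition instead of scanning every row."""
        return list(self.partitions.get(month, []))

    def drop_partition(self, month):
        """Remove a whole month at once; returns the number of rows removed."""
        return len(self.partitions.pop(month, []))
```

A query for one month touches only that month's rows, and old data can be retired by dropping a partition rather than deleting rows one by one, which is where the management and query gains come from.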
Data compression
Data compression reduces the cost of the disk systems needed to store the large volumes of data typical of data warehouse environments, and newer compression techniques have also eliminated the negative impact of compressed data on query performance.
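The storage saving from compression is easy to see on repetitive column data, which is common in warehouse tables. The sketch below uses Python's standard `zlib` and `json` modules; the helper names and the JSON encoding of the column are assumptions, and real warehouses use their own storage-level compression formats.

```python
import json
import zlib


def compress_column(values):
    """Serialise a column of values to JSON and compress it with zlib."""
    raw = json.dumps(values).encode("utf-8")
    return zlib.compress(raw, 6)  # 6 is zlib's usual default compression level


def decompress_column(blob):
    """Reverse compress_column and return the original list of values."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Compressing a column that repeats a handful of values, such as a department name, yields a blob far smaller than the raw text while the round trip returns the data unchanged.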
The above describes only the preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (5)

1. A data warehouse, characterized in that it comprises a source data layer, a data warehouse layer and a data application layer;
source data layer: a temporary storage layer that directly adopts the data structures and data of the peripheral systems and serves as a staging area for interface data;
data warehouse layer: a detail layer containing the data obtained by cleaning the data of the source data layer;
data application layer: the data source read directly by front-end applications, containing data calculated according to report and thematic analysis requirements.
2. The data warehouse according to claim 1, wherein: the source data layer is an ODS layer, which ingests the original data as is.
3. The data warehouse according to claim 2, wherein: the data warehouse layer is a DW layer, which builds data models by subject from the data obtained from the ODS layer.
4. The data warehouse according to claim 3, wherein: the data application layer is a WEB layer, which provides data for data products and data analysis.
5. The data warehouse according to claim 3, wherein: the DW layer is divided into a DWD layer, a DWM layer and a DWS layer.
CN202210364549.3A 2022-04-08 2022-04-08 Data warehouse Pending CN114676208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210364549.3A CN114676208A (en) 2022-04-08 2022-04-08 Data warehouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210364549.3A CN114676208A (en) 2022-04-08 2022-04-08 Data warehouse

Publications (1)

Publication Number Publication Date
CN114676208A (en) 2022-06-28

Family

ID=82077935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210364549.3A Pending CN114676208A (en) 2022-04-08 2022-04-08 Data warehouse

Country Status (1)

Country Link
CN (1) CN114676208A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116756263A (en) * 2023-08-18 2023-09-15 中国标准化研究院 Method for processing town big data based on geographic information data
CN116756263B (en) * 2023-08-18 2023-11-14 中国标准化研究院 Method for processing town big data based on geographic information data

Similar Documents

Publication Publication Date Title
US10936588B2 (en) Self-described query execution in a massively parallel SQL execution engine
US9542424B2 (en) Lifecycle-based horizontal partitioning
Plattner A common database approach for OLTP and OLAP using an in-memory column database
CN104933112B (en) Distributed interconnection Transaction Information storage processing method
EP2270691B1 (en) Computer-implemented method for operating a database and corresponding computer system
CN109597850A (en) Tobacco integrated information data mart modeling stores platform and data processing method
CN101566981A (en) Method for establishing dynamic virtual data base in analyzing and processing system
CN113392227A (en) Metadata knowledge map engine system facing rail transit field
Bear et al. The vertica database: Sql rdbms for managing big data
CN112148718A (en) Big data support management system for city-level data middling station
CN112131203A (en) Method and system for building data warehouse
El Alami et al. Supply of a key value database redis in-memory by data from a relational database
CN114676208A (en) Data warehouse
Morzy et al. Modeling a Multiversion Data Warehouse: A Formal Approach.
Baranowski et al. A prototype for the evolution of ATLAS EventIndex based on Apache Kudu storage
Li et al. A Comparative Study of Row and Column Storage for Time Series Data
Xiao Data Processing Model of Bank Credit Evaluation System.
Peng Analysis of administrative management and decision-making based on data warehouse
Sheng et al. Fast Access and Retrieval of Big Data Based on Unique Identification.
Rácz et al. Two-phase data warehouse optimized for data mining
Hou Analysis and research on the difference between data warehouse and database
CN112380221A (en) Operation method of hadoop acquisition system
Rana et al. An examination of data warehouses
Liu et al. A research on unified storage management and access technology applied in power network dispatch and control big data
Jarke et al. Data warehouse practice: An overview

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination