CN112527886A - Data warehouse system based on urban brain

Info

Publication number: CN112527886A
Application number: CN202110173925.6A
Authority: CN
Inventors: 梁鹏飞; 李晓东; 崔师龙; 王崟乐
Original assignee: Zhongguancun Smart City Co Ltd
Current assignee: Zhongguancun Smart City Co Ltd
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2021-03-19

Abstract

An embodiment of the invention provides a data warehouse system based on a city brain, comprising: a Hadoop-based distributed file system, data ETL, a five-layer data warehouse, online analytical processing (OLAP), a Hadoop-based distributed computing engine, and metadata. A distributed data warehouse system is built on the Hadoop-based distributed file system and computing engine, so that multi-source heterogeneous data is stored, processed, and analyzed in a unified manner. The data warehouse is layered sensibly to increase data reuse: as far as possible, the data in each layer is derived from the layer below it, which avoids repeating work every time a new requirement appears.

Description

Data warehouse system based on urban brain

Technical Field

The invention relates to the technical field of data warehouses, in particular to a data warehouse construction method and system based on a city brain.

Background

The urban brain builds on the data resources generated by a city and uses new-generation information technologies such as artificial intelligence, big data, blockchain, 5G, and the Internet of Things to construct an artificial intelligence hub for urban traffic management, public safety, emergency management, urban sanitation, tourism, environmental protection, fine-grained urban management, and the like. It promotes the construction and interconnection of the city's various digital management platforms, uses real-time, full-volume urban data to correct operational weaknesses promptly and optimize urban public resources, and achieves high-quality breakthroughs in urban governance modes, service modes, and the development of the digital industry.

Data warehouse technology has developed rapidly in China. Industries such as telecommunications, banking, finance, insurance, manufacturing, and retail have established their own data warehouses, the most representative being the operation analysis systems built by telecommunication operators. Data warehouse projects nevertheless carry considerable risk. First, a data warehouse is analysis-oriented, so a firm grasp of the business requirements is a prerequisite for a successful project: if the business requirements are not met, even a technically perfect data warehouse is meaningless. Second, on the technical side, data must be obtained from the business systems efficiently and accurately, the huge volume of data in the warehouse must be managed effectively, and personnel at different levels of the enterprise must be given flexible and effective access to the data. In addition, a data warehouse involves many departments and many systems, so obtaining the support of senior leadership, coordinating the resources of all parties, and managing the project effectively are also key to its success or failure. At present there is no uniform specification for data warehouses, and each company selects a suitable layering and modeling approach according to its own requirements.

With the explosion of internet data, people are increasingly aware of the importance of data, and scientific data processing and business-intelligence data analysis are applied ever more widely. For the data analysis needs of building a city brain that integrates an entire city, traditional databases can no longer meet the requirements of storing and processing big data. At present there is no clear specification for building a data warehouse on the Hadoop ecosystem, so a data warehouse construction method and system for a city brain is explored here.

Disclosure of Invention

Therefore, to meet business requirements, an embodiment of the invention provides a data warehouse system for a city brain. Applied to a city brain scenario, it improves the training efficiency of AI models by introducing quantum computing technology and exploiting the speed advantage of quantum computing, and it meets the growing demands of city operation data and new business scenarios. The specific technical solution is as follows:

To achieve the above object, an embodiment of the invention provides a data warehouse system based on a city brain, comprising: a Hadoop-based distributed file system, data ETL, a five-layer data warehouse, online analytical processing (OLAP), a Hadoop-based distributed computing engine, and metadata. The distributed file system is used for storing the data source in file form; the five-layer data warehouse is used for aggregating and storing the data source; the online analytical processing is used for responding to most analysis requests within a preset time period; the Hadoop distributed computing engine is used for performing computation on the data source after data ETL; and the metadata is data that describes data, used to identify resources, evaluate resources, and track changes to resources during use, so that large volumes of networked data can be managed simply and efficiently and information resources can be effectively discovered, searched, organized in an integrated manner, and managed.

Further, the five-layer data warehouse comprises: an original data layer, a detail granularity fact layer, a data service common granularity layer, a data subject accumulation layer, and a data application layer; wherein:

the original data layer is used for acquiring original data from a data source and storing the original data;

the detail granularity fact layer is used for constructing the finest-granularity detail-layer fact tables, taking business processes as the modeling driver and based on the characteristics of each specific business process;

the data service common granularity layer is used for building summary indicator fact tables of common granularity, taking the analyzed subject object as the modeling driver and based on the indicator requirements of upper-layer applications and products, and for physically modeling them as wide tables; it constructs statistical indicators with standardized names and consistent definitions, provides common indicators to the upper layers, and establishes aggregated wide tables and detail fact tables;

the data subject accumulation layer is used for summarizing the indicator fact tables on a daily basis and performing wide-table processing;

the data application layer is used for storing personalized statistical index data of the data products.

Further, the raw data includes: log data from geographic information systems, government affairs systems, and Internet of Things devices, and structured data.

Furthermore, the statistical indicator data is obtained from the data subject accumulation layer and the data service common granularity layer; when certain complex statistical indicators cannot be obtained from those two layers, the data is obtained from the original data layer.

Further, the five-layer data warehouse is implemented as a layered data warehouse on open-source Hive.

Further, the detail granularity fact layer further comprises a relational model modeling layer and a dimension model modeling layer, wherein the relational model modeling layer is used for building the relational model of the database, and the dimension model modeling layer is used for building the dimension model of the data. When the relational model of the database is designed and constructed, the requirements of third normal form (3NF) are followed; when the dimension model is constructed, the data tables are organized around a fact table.

Further, the data table comprises a dimension table and a fact table; the dimension table is used for storing description information of the fact; the fact table comprises a transactional fact table, a periodic snapshot fact table and an accumulative snapshot fact table.

Further, the dimension model adopts a star model; the dimension model modeling layer comprises:

the business selection module is used for selecting, within the business system, the business line related to a specific business process;

the granularity declaration module is used for declaring the level of detail or aggregation of the data stored in the data warehouse;

the dimension determining module is used for describing business facts;

and the fact determining module is used for determining the measures of the business process.

Further, data in the data application layer is also labeled and classified using Spark machine learning algorithms.

Further, the Hadoop-based distributed storage framework comprises Kafka as a storage medium organized into different topics, and messages are processed from one topic to another by Spark Streaming.

An embodiment of the invention provides a data warehouse system based on a city brain, comprising: a Hadoop-based distributed file system, data ETL, a five-layer data warehouse, online analytical processing (OLAP), a Hadoop-based distributed computing engine, and metadata. The distributed file system is used for storing the data source in file form; the five-layer data warehouse is used for aggregating and storing the data source; the online analytical processing is used for responding to most analysis requests within a preset time period; the Hadoop distributed computing engine is used for performing computation on the data source after data ETL; and the metadata is data that describes data, used to identify resources, evaluate resources, and track changes to resources during use, so that large volumes of networked data can be managed simply and efficiently and information resources can be effectively discovered, searched, organized in an integrated manner, and managed. A distributed data warehouse system is built on the Hadoop-based distributed file system and computing engine, so that multi-source heterogeneous data is stored, processed, and analyzed in a unified manner. The data warehouse is layered sensibly to increase data reuse: as far as possible, the data in each layer is derived from the layer below it, which avoids repeating work every time a new requirement appears.

Furthermore, in the real-time system, Kafka is used as the storage medium for messages, with different topics for the different layers, and messages are processed from one topic to another by Spark Streaming. Compared with the MapReduce (MR) computing engine of a traditional data warehouse, this reduces frequent file read and write I/O and greatly improves computing efficiency.

Drawings

Fig. 1 is a schematic diagram of a data warehouse system based on a city brain according to embodiment 1 of the present invention;

Fig. 2 is a structural diagram of the relationships between fact tables and dimension tables in part of the Internet of Things related business at the DWD layer of the city brain data warehouse;

Fig. 3 is a schematic structural diagram of the real-time alarm data warehouse system of a data warehouse system based on a city brain according to an embodiment of the present invention.

Detailed Description

In order to clearly and thoroughly show the technical solution of the present invention, the following description is made with reference to the accompanying drawings, but the scope of the present invention is not limited thereto.

Referring to fig. 1, a data warehouse system based on a city brain according to embodiment 1 of the present invention includes:

a Hadoop-based distributed storage framework, a distributed file system, data ETL, a five-layer data warehouse, online analytical processing (OLAP), a Hadoop-based distributed computing engine, and metadata. The distributed storage framework is used for storing massive external data sources; the distributed file system is used for storing the data source in file form; the five-layer data warehouse is used for aggregating and storing the data source; the online analytical processing is used for responding to most analysis requests within a preset time period; the Hadoop distributed computing engine is used for performing computation on the data source after data ETL; and the metadata is data that describes data, used to identify resources, evaluate resources, and track changes to resources during use, so that large volumes of networked data can be managed simply and efficiently and information resources can be effectively discovered, searched, organized in an integrated manner, and managed.

Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without knowing the underlying details of distribution and makes full use of the power of a cluster for high-speed computation and storage. One of its components is a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data, while MapReduce provides computation over massive data.

The data warehouse performs real-time log processing and analysis on the data source using a Flume + Logstash + Kafka + Spark Streaming architecture. Referring to the figure, behavior data reaches the log server through Nginx, and device app data reaches the business server through Nginx; the servers store the data as log files, and the data is then written to HDFS through Kafka's publish-subscribe mechanism.

The aforementioned Nginx (engine x) is a high-performance HTTP and reverse proxy web server that can also act as an IMAP/POP3/SMTP mail proxy.

Metadata is data about data. It is defined here as data that, within a program, is not itself the object being processed but whose value, when changed, changes the behavior of the program. Its function is to control program behavior in an interpreted manner at run time.

In the embodiment of the invention, the system also comprises management layers such as distributed coordination, monitoring, timing scheduling, metadata management, authority management, quality management and the like.

A data warehouse is a subject-oriented, integrated, relatively stable collection of data that reflects historical changes and is used to support management decisions. Data warehouse architectures typically contain four levels: data source, data storage and management, data service, and data application.

Data source: the data sources of the data warehouse include external data, existing business systems, document data, and the like.

Data integration: extraction, cleaning, transformation, and loading are completed here; an ETL (Extract-Transform-Load) tool loads the data from the data sources into the data warehouse at a fixed period.

Data storage and management: this level covers the storage and management of data, including the data warehouse, data marts, data warehouse monitoring, operation and maintenance tools, metadata management, and the like.

Data service: this level provides data services to front ends and applications. Data can be obtained directly from the data warehouse for use by front-end applications, or data services can be provided to front-end applications through an OnLine Analytical Processing (OLAP) server.

Data application: this level faces users directly and includes data query tools, free-form report tools, data analysis tools, data mining tools, and various application systems.

To improve analysis efficiency and the reusability of table data in the data warehouse, the warehouse is divided into layers, so that final statistical requirements depend on intermediate analysis results as far as possible instead of being computed from the original tables again and again. The data warehouse is divided into five layers: the data introduction layer, that is, the original data layer (ODS); the detail granularity fact layer (DWD); the data service common granularity layer (DWS); the data subject accumulation layer for period summaries (DWT); and the data application layer (ADS).

The original data layer (ODS) is used for collecting raw data from the data sources and storing it; the raw data includes log data from GIS, government affairs systems, IoT alarm devices, and the like, as well as structured data stored in an RDBMS. This layer serves two purposes: first, it keeps a copy of the source data in HDFS as a record; second, subsequent ETL processing is based on this layer, and the data is imported into the DWD layer after ETL cleaning.

The detail granularity fact layer (DWD) is the detail data layer: taking business processes as the modeling driver, it constructs the finest-granularity detail-layer fact tables based on the characteristics of each specific business process. Depending on how the enterprise uses its data, certain important dimension attribute fields may be stored redundantly in the detail fact tables, that is, given wide-table treatment.

The data service common granularity layer (DWS) is used for building summary indicator fact tables of common granularity, taking the analyzed subject object as the modeling driver and based on the indicator requirements of upper-layer applications and products, and for physically modeling them as wide tables. It constructs statistical indicators with standardized names and consistent definitions, provides common indicators to the upper layers, and establishes aggregated wide tables and detail fact tables.

The data subject accumulation layer (DWT) is used for summarizing the indicator fact tables on a daily basis and performing wide-table processing. Like the DWS layer, it applies wide-table processing to summary indicator fact tables; only the aggregation granularity differs: the DWS layer mostly holds statistics summarized per day, while the DWT layer mostly holds results accumulated over a period, such as 30 days.
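The difference in aggregation granularity between the DWS and DWT layers can be illustrated with a minimal Python sketch. It is not the patent's implementation (which uses Hive tables); the event rows, the device type names, and the function names `build_dws_daily` and `build_dwt_cumulative` are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical DWD detail rows: (event_date, device_type), one per alarm event.
dwd_alarm_events = [
    (date(2021, 1, 1), "smoke_sensor"),
    (date(2021, 1, 1), "smoke_sensor"),
    (date(2021, 1, 15), "smoke_sensor"),
    (date(2021, 1, 31), "manhole_sensor"),
    (date(2021, 1, 31), "smoke_sensor"),
]


def build_dws_daily(events):
    """DWS-style summary: one count per (day, device_type)."""
    return Counter(events)


def build_dwt_cumulative(dws_daily, as_of, window_days=30):
    """DWT-style summary: per-device totals over the last `window_days` days.

    The totals are computed from the DWS daily table, not from the detail rows,
    so the intermediate result is reused.
    """
    start = as_of - timedelta(days=window_days - 1)
    totals = Counter()
    for (day, device_type), count in dws_daily.items():
        if start <= day <= as_of:
            totals[device_type] += count
    return totals


dws = build_dws_daily(dwd_alarm_events)
dwt = build_dwt_cumulative(dws, as_of=date(2021, 1, 31))
print(dict(dwt))
```

With a 30-day window ending on 2021-01-31, the events of 2021-01-01 fall outside the window, so only the later events contribute to the cumulative totals.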

The data application layer (ADS) is used for storing the personalized statistical indicator data of data products. This data is generally computed from the DWT or DWS layer; when certain complex statistical indicators cannot be computed from the DWT and DWS layers, they are obtained from tables in the DWD layer.
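The layering rule, that each table should be derived from lower layers only, can be expressed as a simple lineage check. The following Python sketch is illustrative only; the helper name `check_lineage` is an assumption and does not come from the patent.

```python
# Layer order from the text, lowest (raw data) to highest (application).
LAYERS = ["ODS", "DWD", "DWS", "DWT", "ADS"]
RANK = {name: index for index, name in enumerate(LAYERS)}


def check_lineage(table_layer, source_layers):
    """Return True if every source sits in a strictly lower layer than the table."""
    return all(RANK[source] < RANK[table_layer] for source in source_layers)


# An ADS table built from DWT and DWS respects the layering.
print(check_lineage("ADS", ["DWT", "DWS"]))
# A DWD table that reads from DWS would invert the layering.
print(check_lineage("DWD", ["DWS"]))
```

A check of this kind could be run over table definitions to catch jobs that bypass the intended dependency direction.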

In the embodiment of the present invention, the data flow of the data warehouse after receiving an external data source is as follows. Part of the data of the ODS layer comes from the business database and is imported into HDFS through Sqoop. Log data is collected through Flume, buffered through Kafka to smooth out load peaks, then landed in HDFS through Kafka and loaded into the ODS layer in Hive. The data of the ODS layer is the most primitive data and is not processed.

Data ETL: when the data of the ODS layer is cleaned and loaded into the DWD layer, dirty data in the collected or interface-supplied external data that does not meet requirements is processed, including but not limited to the following:

a) data with deviation of format content;

b) data with logical errors;

c) missing fields;

d) data that does not meet business requirements;

e) contradictory data;

f) data desensitization.
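A few of the cleaning rules above can be sketched as a single record-level function. This is a minimal Python illustration under stated assumptions: the field names, the timestamp format, the allowed alarm levels, and the masking scheme are all hypothetical, and only rules a), b)/d), c), and f) are covered.

```python
import re
from typing import Optional

# Hypothetical schema for an ODS alarm record.
REQUIRED_FIELDS = ("device_id", "alarm_time", "level", "phone")
TIME_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")
VALID_LEVELS = {1, 2, 3}


def mask_phone(phone: str) -> str:
    """Desensitize a phone number, keeping the first 3 and last 4 characters."""
    if len(phone) <= 7:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


def clean_record(record: dict) -> Optional[dict]:
    """Return a cleaned copy of an ODS record, or None if it must be dropped."""
    # c) missing fields
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return None
    # a) format deviation
    if not TIME_FORMAT.match(record["alarm_time"]):
        return None
    # b) / d) logically invalid values or values outside the business scope
    if record["level"] not in VALID_LEVELS:
        return None
    cleaned = dict(record)
    # f) desensitization of sensitive fields
    cleaned["phone"] = mask_phone(record["phone"])
    return cleaned


good = {"device_id": "d1", "alarm_time": "2021-02-09 08:00:00",
        "level": 2, "phone": "13812345678"}
print(clean_record(good))
```

Records that fail a rule are dropped here for simplicity; a real pipeline might instead route them to a quarantine table for inspection.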

In an optional implementation of the embodiment of the invention, the detail granularity fact layer further includes a relational model modeling layer and a dimension model modeling layer; the relational model modeling layer is configured to construct the relational model of the database, and the dimension model modeling layer is configured to construct the dimension model of the data.

Relational modeling: relational databases are designed to comply with third normal form (3NF) in order to reduce data redundancy. Associating tables through primary and foreign keys not only reduces redundant fields but also increases flexibility between tables; however, it causes an efficiency problem, because querying the data requires frequent join operations across multiple tables, which lowers query efficiency.

Dimension modeling: unlike normalized modeling, dimension modeling is mainly applied in OLAP systems. Tables are usually organized around a fact table, the model is oriented mainly to the business of the city brain, and it tolerates some data redundancy in exchange for more efficient data retrieval. In view of the large data volumes in a big data environment, a star constellation model is adopted.

In the DWD layer, data tables are divided into dimension tables and fact tables. A dimension table generally holds descriptive information about facts. Each dimension table corresponds to an object or concept in the real world, such as date, region, or device type. Dimension tables are wide (they have many columns), have fewer rows than fact tables, have relatively fixed content, and are mostly code tables.

Each row of data in a fact table represents a business event. "Facts" are the measurable values of business events. Each row of a fact table includes additive numeric measures and foreign keys associated with the dimension tables, typically two or more; through these foreign keys the fact table expresses many-to-many relationships between dimension tables. Fact tables have the following characteristics:

fact tables are very large and relatively narrow: they have few columns, and their data changes often, with many rows added every day.
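The relationship between a narrow fact table and its dimension tables can be sketched in a few lines of Python. The table contents and key names below are hypothetical and only loosely follow the device type and enterprise dimensions mentioned in this description; in the system described here these would be Hive tables joined in SQL.

```python
# Dimension tables: few rows, descriptive attributes, keyed by an id.
dim_device_type = {
    1: {"name": "manhole cover displacement sensor"},
    2: {"name": "fire platform sensor"},
}
dim_enterprise = {
    10: {"name": "Enterprise A"},
    11: {"name": "Enterprise B"},
}

# Fact table: many narrow rows, each holding foreign keys and an additive measure.
fact_alarm = [
    {"device_type_id": 1, "enterprise_id": 10, "alarm_count": 1},
    {"device_type_id": 2, "enterprise_id": 10, "alarm_count": 1},
    {"device_type_id": 2, "enterprise_id": 11, "alarm_count": 1},
]


def alarms_by(dimension_key, dimension_table):
    """Sum the additive measure, grouped by the name found in one dimension table."""
    totals = {}
    for row in fact_alarm:
        label = dimension_table[row[dimension_key]]["name"]
        totals[label] = totals.get(label, 0) + row["alarm_count"]
    return totals


print(alarms_by("device_type_id", dim_device_type))
print(alarms_by("enterprise_id", dim_enterprise))
```

Because the measure is additive, the same fact rows can be rolled up along any dimension without changing the fact table itself.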

The fact table is divided into: transactional fact tables, periodic snapshot fact tables, cumulative snapshot fact tables.

Transactional fact table: each transaction or event, such as a device alarm prompt or an alarm record, is stored as one row of the fact table. Once the transaction is committed and the row has been inserted, the data is not changed; the table is updated incrementally.

Fig. 2 is a structural diagram showing the relationships between fact tables and dimension tables in part of the Internet of Things related business at the DWD layer of the city brain data warehouse. The fact tables include an alarm information table, a manhole cover displacement sensor table, a fire platform sensor acquisition table, and a toxic and harmful sensor acquisition table. The corresponding dimension tables include a bearer table, a data type table, an enterprise table, a component table, and a device type table.

Periodic snapshot fact table: a periodic snapshot fact table does not retain all data, only data at fixed time intervals, such as daily or weekly population flows or monthly alarm counts.

Cumulative snapshot fact table: a cumulative snapshot fact table is used to track changes in business facts. For example, the data warehouse may need to accumulate or store the time points at which a case passes through each stage, from alarm to recording, processing, and resolution, in order to track the progress of the case. The rows of the fact table are updated continually as the business process progresses.
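The three kinds of fact table differ mainly in how rows are written. The Python sketch below contrasts them using in-memory structures; the stage names, case identifiers, and function names are assumptions for illustration, not part of the patent.

```python
from collections import Counter

STAGES = ("alarm", "recorded", "processing", "resolved")  # hypothetical stage names

transaction_facts = []   # transactional fact table: append-only, one row per event
accumulating_facts = {}  # cumulative snapshot: one row per case, updated in place


def record_event(case_id, stage, timestamp):
    """Append the event as a transaction row and update the case's snapshot row."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    transaction_facts.append({"case_id": case_id, "stage": stage, "ts": timestamp})
    row = accumulating_facts.setdefault(case_id, {f"{s}_time": None for s in STAGES})
    row[f"{stage}_time"] = timestamp


def periodic_snapshot(day):
    """Periodic snapshot: number of events per stage on one day ('YYYY-MM-DD')."""
    return Counter(e["stage"] for e in transaction_facts if e["ts"].startswith(day))


record_event("case-1", "alarm", "2021-02-09 08:00:00")
record_event("case-1", "recorded", "2021-02-09 08:05:00")
record_event("case-2", "alarm", "2021-02-09 09:30:00")
record_event("case-1", "resolved", "2021-02-10 10:00:00")

print(accumulating_facts["case-1"])
print(periodic_snapshot("2021-02-09"))
```

The transaction list only ever grows, the per-case row is overwritten stage by stage, and the periodic snapshot is a derived count for a fixed interval.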

The most important part of data warehouse modeling is the construction of the DWD layer. The DWD layer requires a dimension model, generally a star model; because there are multiple fact tables, the resulting structure generally takes the form of a constellation model.

In an optional implementation manner of the embodiment of the present invention, the dimensional model modeling layer includes:

the dimension determining module is used for describing business facts;

At the DWD layer, the finest-granularity detail-layer fact tables are constructed with the business process as the modeling driver, based on each specific business process. The fact tables may be given wide-table treatment where appropriate.

The above is the dimension modeling process of the data warehouse; the subsequent DWS, DWT, and ADS layers do not involve dimension modeling.

DWS layer: the current-day behavior of each subject object is aggregated by dimension, serving the subject wide tables of the DWT layer as well as some business detail data, in order to handle special requirements.

DWS layer: full-volume wide tables of the subject objects are constructed, taking the analyzed subject object as the modeling driver and based on the indicator requirements of upper-layer applications and products.

The task scheduling system then runs the tasks for the previous day's data at 1 a.m.; when a scheduled task fails, the system sends an email to the developers so that the failure can be handled promptly.
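The scheduling behavior described above, a run at 1 a.m. that processes the previous day's data and notifies developers on failure, can be sketched as follows. This is an illustrative Python sketch only: the task names are hypothetical, and the `notify` callback stands in for the email step, which the patent does not specify in detail.

```python
from datetime import date, datetime, timedelta


def partition_for_run(run_time: datetime) -> date:
    """A run at 1 a.m. processes the data of the previous calendar day."""
    return run_time.date() - timedelta(days=1)


def run_daily_tasks(run_time, tasks, notify):
    """Run each task for the previous day's partition; call `notify` on failure."""
    partition = partition_for_run(run_time)
    failed = []
    for name, task in tasks:
        try:
            task(partition)
        except Exception as exc:
            failed.append(name)
            notify(f"Task {name} failed for {partition}: {exc}")
    return failed


def load_dwd(partition):
    """Hypothetical task that succeeds."""


def build_dws(partition):
    """Hypothetical task that fails, to show the notification path."""
    raise RuntimeError("source table missing")


messages = []  # stands in for the outgoing email queue
failed = run_daily_tasks(
    datetime(2021, 2, 10, 1, 0),
    [("dwd_load", load_dwd), ("dws_daily", build_dws)],
    messages.append,
)
print(failed, messages)
```

A failure in one task is recorded and reported without stopping the remaining tasks; whether downstream tasks should be skipped after an upstream failure is a design choice left open here.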

After the data analysis is completed, the results can be displayed in real time with a visualization tool. Hive-based OLAP multidimensional analysis components include Kylin, Druid, Presto, Elasticsearch, and the like, which allow the decision and display system to read results in real time; analysis with the machine learning algorithms in Spark MLlib is also supported.

In addition, secondary development can be carried out on top of an OLAP tool to build a client interface that provides services. The client interface connects to Kylin through a REST API; Kylin precomputes multidimensional cubes over the fact tables of the Hive star model and stores the results in HBase for efficient retrieval.
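The idea of precomputing a multidimensional cube can be shown with a small Python sketch that aggregates a measure for every combination of dimensions. This is a conceptual illustration, not Kylin's actual algorithm or API; the dimension names and rows are hypothetical, and a plain dictionary stands in for the HBase store.

```python
from itertools import combinations

DIMENSIONS = ("district", "device_type")  # hypothetical dimensions

fact_rows = [
    {"district": "north", "device_type": "smoke", "alarm_count": 2},
    {"district": "north", "device_type": "flood", "alarm_count": 1},
    {"district": "south", "device_type": "smoke", "alarm_count": 4},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions` (every cuboid)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = {}
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] = cuboid.get(key, 0) + row[measure]
            cube[dims] = cuboid
    return cube


cube = build_cube(fact_rows, DIMENSIONS, "alarm_count")

# A query then becomes a lookup instead of a scan over the fact table.
print(cube[("district",)][("north",)])
print(cube[()][()])
```

The number of cuboids grows as 2 to the power of the number of dimensions, which is why cube engines typically prune or group dimension combinations.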

Meanwhile, data in the application layer in Hive can be labeled and classified using Spark machine learning algorithms.

The real-time data warehouse system shown in Fig. 3 is constructed using the same data warehouse modeling approach.

In contrast to the offline data warehouse, where ODS, DWD, DWS, DWT, and ADS are databases in Hive, the real-time system uses Kafka as the storage medium for messages: the layers correspond to different Kafka topics, and messages are processed from one topic to another by Spark Streaming.

An embodiment of the invention provides a data warehouse system based on a city brain, comprising: a Hadoop-based distributed storage framework, a distributed file system, data ETL, a five-layer data warehouse, online analytical processing (OLAP), a Hadoop-based distributed computing engine, and metadata. The distributed storage framework is used for storing massive external data sources; the distributed file system is used for storing the data source in file form; the five-layer data warehouse is used for aggregating and storing the data source; the online analytical processing is used for responding to most analysis requests within a preset time period; the Hadoop distributed computing engine is used for performing computation on the data source after data ETL; and the metadata is data that describes data, used to identify resources, evaluate resources, and track changes to resources during use, so that large volumes of networked data can be managed simply and efficiently and information resources can be effectively discovered, searched, organized in an integrated manner, and managed. A distributed data warehouse system is built on the Hadoop-based distributed storage framework and computing engine, so that multi-source heterogeneous data is stored, processed, and analyzed in a unified manner. The data warehouse is layered sensibly to increase data reuse: as far as possible, the data in each layer is derived from the layer below it, which avoids repeating work every time a new requirement appears.

Example: an offline and real-time alarm data warehouse system for the urban brain is constructed.

Offline system:

1. Acquisition system: alarm data from the various devices distributed across the city, networked alarms, telephone alarms, and the like is collected and stored on HDFS.

2. Data loading: the data on HDFS is imported into the raw data layer of the Hive data warehouse, where a copy of the raw data is kept for use by the subsequent ETL process.

3. Data cleaning: according to the metadata design of the system, the data in the raw data layer is cleaned according to the cleaning rules and loaded into the data detail layer.

4. Common granularity summary layer: according to the business system requirements and metadata, different dimensionality analyses are carried out through ETL tools such as a button, a HIVE sql and the like, and a day-based summary table is constructed. The data source of the layer is a data detail layer.

5. Periodic summary layer: according to business requirements, an ETL tool is used to compute summary statistics by week, month, year, or the last N days; the source tables come from the common granularity summary layer and the data detail layer.

6. Data application layer: the tables to be visualized are computed with Hive SQL, preferring the periodic summary layer as the source; when the periodic summary layer cannot provide a required field, the common granularity summary layer, the data detail layer, and so on are considered in turn.

7. Task scheduling: the tasks are scheduled to run in the early morning of each day.

8. Visual display: Superset or Kylin is connected to Hive for visual display of business reports and multidimensional analysis of data tables; an Elasticsearch (ES) search library can also be built to make the various alarm events easy to search.

9. Machine learning: prediction models are trained with machine learning algorithms on the data in the data warehouse, to better serve the alarm system of the city brain.
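The source selection rule in step 6, preferring the periodic summary layer and falling back to lower layers when a field is missing, can be sketched as a lookup over a field catalog. The catalog contents and the function name `choose_source` below are hypothetical and serve only to illustrate the fallback order.

```python
# Hypothetical catalog: the fields available in each layer's table, listed in
# the order of preference described in step 6.
LAYER_FIELDS = [
    ("periodic_summary", {"district", "alarm_count_30d"}),
    ("common_granularity_summary", {"district", "day", "alarm_count"}),
    ("data_detail", {"district", "day", "device_id", "alarm_time", "alarm_type"}),
]


def choose_source(required_fields):
    """Return the first (most aggregated) layer that offers every required field."""
    required = set(required_fields)
    for layer, fields in LAYER_FIELDS:
        if required <= fields:
            return layer
    return None  # no layer can serve the request


print(choose_source(["district", "alarm_count_30d"]))
print(choose_source(["district", "day", "alarm_count"]))
print(choose_source(["device_id", "alarm_type"]))
```

Choosing the most aggregated layer that satisfies the request keeps application-layer queries cheap and reuses the intermediate results already computed.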

A real-time system:

referring to fig. 3, 1, data of an acquisition system accessed by a real-time system is stored in an ODS layer in topic specified by Kafka.

2. In the ETL process of the data, the spark streaming accesses Kafka data of the ODS layer for processing, the cleaning rule and some code tables can be obtained from the ES, and the processed information is imported into the topic of the DW layer for storage.

3. The DW layer 2 can be divided into a DWD layer, a DWs layer, a DWT layer and other topics according to service requirements, and statistical aggregation of data is performed by SparkStreaming.

4. Data is displayed in real time by connecting Kafka to Druid, or the real-time data is imported into Redis for visual display by the web server, so that when an indicator becomes abnormal, the alarm information can be obtained as quickly as possible.

Therefore, when an alarm collector in a certain area receives an abnormal signal, such as a fire or flood alarm, the real-time platform of the city brain can obtain the information accurately and immediately raise an alarm so that the event can be handled effectively.
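The real-time flow described above can be sketched without a running Kafka cluster. The minimal example below replaces Kafka topics with in-memory lists and Spark Streaming with ordinary functions, so it only illustrates the order of the stages: clean the ODS messages with a code table, write them to a DW topic, aggregate per area, and report the alarms whose count reaches a threshold. The topic names, message fields, code table contents, and threshold are all assumptions made for illustration.

```python
from collections import Counter

# In-memory stand-ins for Kafka topics (hypothetical topic names).
topics = {"ods_alarm": [], "dwd_alarm": []}

# Hypothetical code table; in the described system it would be read from ES.
CODE_TABLE = {"01": "fire", "02": "flood"}


def clean(message):
    """Map the raw code to an alarm type; return None if cleaning fails."""
    alarm_type = CODE_TABLE.get(message.get("code"))
    if alarm_type is None or not message.get("area"):
        return None
    return {"area": message["area"], "type": alarm_type}


def run_etl():
    """Move every ODS message that passes cleaning into the DWD topic."""
    for message in topics["ods_alarm"]:
        cleaned = clean(message)
        if cleaned is not None:
            topics["dwd_alarm"].append(cleaned)
    topics["ods_alarm"].clear()


def aggregate():
    """Count DWD messages per (area, alarm type)."""
    return Counter((m["area"], m["type"]) for m in topics["dwd_alarm"])


def alerts(counts, threshold=1):
    """Return the (area, type) pairs whose count reaches the threshold."""
    return sorted(key for key, n in counts.items() if n >= threshold)
```

A caller would append raw messages such as `{"area": "east", "code": "01"}` to `topics["ods_alarm"]`, call `run_etl()`, and pass the result of `aggregate()` to `alerts()`. In a real deployment each stage would be a streaming job reading from one Kafka topic and writing to the next.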

Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims

1. A city brain-based data warehouse system, comprising: a Hadoop-based distributed file system, data ETL, a five-layer data warehouse, online analytical processing, a Hadoop-based distributed computing engine, and metadata; the distributed file system is used for storing the data source in file form; the five-layer data warehouse is used for computing statistics on and storing the data source; the online analytical processing is used for responding to most analysis requirements within a preset time period; the Hadoop-based distributed computing engine is used for computing on the data source after it has passed through the data ETL; the metadata is data describing data, and is used for identifying resources, evaluating resources, and tracking changes of resources during use, for simple and efficient management of large amounts of networked data, and for effective discovery, search, and integrated organization of information resources and effective management of the resources in use.

2. The city brain-based data warehouse system of claim 1, wherein the five-layer data warehouse comprises: an original data layer, a detail granularity fact layer, a data service common granularity layer, a data subject accumulation layer, and a data application layer; wherein,

3. The city brain-based data warehouse system of claim 2, wherein the raw data comprises: data from a geographic information system and a government affairs system, and log data and structured data of Internet of Things devices.

4. The city brain-based data warehouse system of claim 3, wherein the statistical indicator data is obtained from the data subject accumulation layer and the data service common granularity layer, and is obtained from the original data layer when some statistical indicators are not available from the data subject accumulation layer and the data service common granularity layer.

5. The city brain-based data warehouse system of claim 1, wherein the five-layer data warehouse is a layered data warehouse built on open-source Hive.

6. The city brain-based data warehouse system of claim 2, wherein the detail granularity fact layer further comprises a relational model modeling layer for building relational models of databases and a dimensional model modeling layer for building dimensional models of data; when the relational model of the database is designed and constructed, the requirements of third normal form are followed; when the dimensional model is constructed, the data tables are organized around a fact table.

7. The city brain-based data warehouse system of claim 6, wherein the data tables comprise dimension tables and fact tables; the dimension tables are used for storing descriptive information about the facts; the fact tables comprise a transactional fact table, a periodic snapshot fact table, and an accumulating snapshot fact table.

8. The city brain-based data warehouse system of claim 6 or 7, wherein the dimensional model employs a star model; the dimension model modeling layer comprises:

the dimension determining module is used for describing business facts;

9. The city brain-based data warehouse system of claim 2, wherein the data application layer further comprises tagging data by classification using Spark's machine learning algorithms.

10. The city brain-based data warehouse system of claim 1, wherein the Hadoop-based distributed file system comprises Kafka storage media corresponding to different topics, and wherein messages between different topics are processed by Spark Streaming.