CN112231301A - Yellow river water sand change data warehouse - Google Patents


Info

Publication number
CN112231301A
Authority
CN
China
Prior art keywords
data
source
warehouse
structured
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011134223.9A
Other languages
Chinese (zh)
Inventor
夏润亮
李涛
王敏
金锦
朱敏
刘启兴
李斌
俞彦
杨无双
冯兴凯
李冰
吴丹
郝臻
薛阳茹
焦莉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yellow River Institute of Hydraulic Research
Original Assignee
Yellow River Institute of Hydraulic Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yellow River Institute of Hydraulic Research
Priority to CN202011134223.9A
Publication of CN112231301A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention relates to a Yellow River water and sand change data warehouse comprising a data source, a convergence layer, a storage layer and an application layer connected in sequence from bottom to top. The data source is located at the bottom layer, consists of structured, semi-structured and unstructured data, and supplies source data to the convergence layer. The convergence layer extracts the required data from the data source, processes it and passes the processed data to the storage layer. The storage layer stores the data processed by the convergence layer. The application layer is the tool through which users access the data and perform data analysis. The data warehouse provided by the application can describe completely and uniformly the data related to each analysis object and the relationships among those data, and it supports multi-angle, multi-view and rotatable data analysis. Large amounts of data are processed quickly and flexibly according to the analysis requirements, and query results are presented to decision makers in an intuitive, easily understood form.

Description

Yellow river water sand change data warehouse
Technical Field
The invention belongs to the technical field of software architecture, and particularly relates to a yellow river water sand change data warehouse.
Background
The decision making of any major problem needs to have a large amount of relevant data as support, and then helps a user to quickly obtain enough decision making information from the relevant data, and for various data involved in the basin water resource management, a scientific and effective decision making can be made only after various data and the interrelation among various data are analyzed.
In the related art, with the development of communication technology, the pipeline resources have higher requirements on the rapidity, the accuracy and the like of data acquisition. There is no better way to store and manage data.
Disclosure of Invention
In view of this, the present invention provides a yellow river water and sand change data warehouse to solve the problem that there is no better way to store and manage data in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme: a yellow river water sand change data warehouse, comprising: a data source, a convergence layer, a storage layer and an application layer which are connected in sequence from bottom to top, wherein,
the data source is positioned at the bottommost layer, consists of structured data, semi-structured data and unstructured data and is used for realizing the aggregation of source data to the convergence layer;
the convergence layer is used for extracting required data from the data source, processing the data and transmitting the processed data to the storage layer;
the storage layer is used for storing the data processed by the convergence layer;
the application layer is a tool for accessing data by a user and is used for carrying out data analysis on the data.
Further, the data source is also used for performing data management on the data, and displaying data source information according to different data hierarchy sequences, wherein the data source information comprises a data name, a data description, a data hierarchy, a resource address, a data type, a unit where the data is located, whether the data is accessed or not and a latest synchronization date; the data management includes synchronization and updating;
and the convergence layer extracts the data required by the data source in a data batch extraction or quasi-real-time data extraction mode.
Further, the processing the data includes:
extracting, transforming, cleaning and loading the data, and processing stream data.
Further, a multidimensional data model is constructed from fact tables, dimensions, measures and levels. The data warehouse is organized around a theme, and all data are handled as data sets centered on that theme: a fact table sits at the center and is associated with several dimension tables. The fact table contains several dimensions and measures; a dimension represents a specific perspective from which a decision-making user analyzes the data, and a measure is the actual meaning and measurement index of the data. Each dimension table describes one or more dimensions and their values, and each dimension is divided into different levels;
the multidimensional data model is used for defining an ETL process and mapping so as to extract, convert, clean, load and process stream data.
Further, the ETL process comprises:
creating dimensions, creating data cubes, creating a mapping, creating an ETL flow.
Further, the extracting the required data from the data source includes:
establishing three triggers of insertion, modification and deletion; when the data in the source table changes, the corresponding trigger writes the changed data into a temporary table, the extraction thread extracts the data from the temporary table, and the data extracted from the temporary table is marked or deleted;
adding a timestamp field on a source table, modifying the value of the timestamp field when updating and modifying table data in a system, and determining extracted data by comparing the system time with the value of the timestamp field when extracting data;
establishing an MD5 temporary table for the table to be extracted by adopting a data extraction tool, wherein the temporary table records a main key of a source table and an MD5 check code calculated according to data of all fields, and when data extraction is carried out, comparing the MD5 check code of the source table with the MD5 temporary table so as to determine whether the data in the source table is added, modified or deleted and update the MD5 check code;
the changed data is determined by log comparison; file data is generally extracted in full, and the timestamp of the file or a computed MD5 check code of the file can be saved at each extraction and compared at the next extraction, the extraction being skipped if the two are the same.
Further, the storage layer includes:
a data warehouse of structured data, a data warehouse of semi-structured data, and a data warehouse of unstructured data;
wherein the structured data is saved to a data repository of structured data;
the semi-structured data is processed into structured data and stored in a data warehouse of the structured data or the semi-structured data is kept and stored in the data warehouse of the semi-structured data;
the unstructured data is either processed into structured data and saved to the data warehouse of structured data, or kept as unstructured data and saved to the data warehouse of unstructured data.
Further, the structured data comprises relational databases and structured reports; the semi-structured data comprises files; the unstructured data comprises web pages and flat text data.
Further, the data in the data source comprises:
river system, hydrological station, rainfall and historical rainfall data, land utilization, vegetation coverage, downstream flood risk map, and data of basic geographic information.
Further, when the data in the data source is the water conservancy service data, the data source includes:
the basic class data warehouse is used for storing historical data;
and the real-time class data warehouse is used for storing real-time data.
By adopting the technical scheme, the invention can achieve the following beneficial effects:
The characteristics of a data warehouse determine its advantages, two of which are significant:
(1) Theme-oriented data organization and analysis. Organizing data by theme gives a complete and consistent description of the analysis objects at a higher level, and describes completely and uniformly the data related to each analysis object and the relationships among those data. Data from different sources are effectively integrated to serve a given theme. The warehouse thereby adapts to the characteristics of business activities and the dynamic nature of enterprise data, and fundamentally separates data from applications.
(2) Decision support. The fundamental goal of organizing a data warehouse is to support decision-making. A data warehouse can be operated on in a variety of ways, among which the more sophisticated data analysis should be multi-angle, multi-view and rotatable. Large amounts of data are processed quickly and flexibly according to the analysis requirements, and query results are presented to decision makers in an intuitive, easily understood form.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a yellow river water sand change data warehouse according to the present invention;
FIG. 2 is a schematic diagram of a data processing flow of the yellow river water sand change data warehouse according to the present invention;
FIG. 3 is a schematic diagram of a data structure of a yellow river water sand change data warehouse according to the present invention;
FIG. 4 is a schematic view of a process flow of the water conservancy business data warehouse for real-time or quasi-real-time decision support according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
A specific yellow river water sand change data warehouse provided in the embodiment of the present application is described below with reference to the accompanying drawings.
As shown in fig. 1, the yellow river water sand change data warehouse provided in the embodiment of the present application includes: a data source, a convergence layer, a storage layer and an application layer which are connected in sequence from bottom to top, wherein,
the data source is positioned at the bottommost layer, consists of structured data, semi-structured data and unstructured data and is used for realizing the aggregation of source data to the convergence layer;
the convergence layer is used for extracting required data from the data source, processing the data and transmitting the processed data to the storage layer;
the storage layer is used for storing the data processed by the convergence layer;
the application layer is a tool for accessing data by a user and is used for carrying out data analysis on the data.
The Yellow River water and sand change database provided by the application analyzes year-by-year and month-by-month rainfall and hydrological sediment data for the main sand-producing areas of the Yellow River, event-based rainfall and flood sediment data for typical tributaries, land use and forest and grass coverage data for typical years, terrace data for typical years, soil and water conservation engineering data, and socio-economic and related experimental observation data. Given the massive, heterogeneous nature of these data, it provides water and sand change metadata sets and data models for different requirements, extracts, cleans, transforms and restructures the various data, and completes the construction of the data warehouse using technologies such as data marts, storage partitioning and indexing.
Data source: the data source is the basis of the data warehouse and the origin of its data. It sits at the bottom layer of the data warehouse architecture, consists of structured, semi-structured and unstructured data, and, under a metadata-driven mechanism, delivers source data to the convergence layer through ETL and other technologies.
Convergence layer: extracts the required data from the data source through technologies such as data extraction, transformation, loading, stream data processing and web crawlers, cleans it, and passes it to the storage layer for storage.
Storage layer: manages data by theme and stores it in the spatial data warehouse. The required data are extracted from the data source, cleaned, and finally loaded into the data warehouse according to a predefined multidimensional data model, which completes the conversion of data from the data source to the target data warehouse. Unstructured data are handled by a Hadoop system: HDFS at the bottom provides reliable distributed file storage, HBase stores semi-structured data, and MapReduce provides a high-performance programming model and computing power for the system. The structured data warehouse sits at the top and provides SQL (Structured Query Language) support for the whole ecosystem; data held in the Hadoop system are converted from unstructured form and loaded into the main structured data warehouse.
Application layer: the tool through which users access data in the spatial data warehouse. It analyzes the data using technologies such as spatial data mining, spatial analysis, report analysis and visualization, with the ultimate aim of providing data mining services to users.
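As an illustration only, the bottom-to-top flow through the four layers can be sketched in a few lines of Python. This is a minimal sketch, not the patented implementation: the function names, the `StorageLayer` class and the sample rainfall records are all hypothetical, and an in-memory list stands in for the real storage systems.

```python
from dataclasses import dataclass, field


@dataclass
class StorageLayer:
    """Hypothetical in-memory stand-in for the storage layer."""
    records: list = field(default_factory=list)

    def save(self, rows):
        self.records.extend(rows)


def data_source():
    # Bottom layer: raw records as they might arrive from a source system.
    return [
        {"station": "S1", "month": "2016-07", "rainfall_mm": "85.2"},
        {"station": "S1", "month": "2016-08", "rainfall_mm": ""},  # missing value
        {"station": "S2", "month": "2016-07", "rainfall_mm": "40.0"},
    ]


def convergence_layer(raw_rows):
    # Extract the rows, drop incomplete ones (cleaning), convert types (transformation).
    cleaned = []
    for row in raw_rows:
        if not row["rainfall_mm"]:
            continue
        cleaned.append({**row, "rainfall_mm": float(row["rainfall_mm"])})
    return cleaned


def application_layer(storage, station):
    # Top layer: a simple analysis query over the stored data.
    return sum(r["rainfall_mm"] for r in storage.records if r["station"] == station)


def run_pipeline():
    storage = StorageLayer()
    storage.save(convergence_layer(data_source()))
    return storage
```

Calling `run_pipeline()` and then `application_layer(storage, "S1")` returns the total rainfall stored for station `S1`; the record with the missing value never reaches the storage layer.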
Specifically, the work focuses on the main sand-producing area of the Yellow River. The system acquires monthly rainfall data from more than 600 rainfall stations in the study area, together with rainfall data for typical tributaries at hourly time steps. It acquires population, soil, river system, elevation, and water and sand test data for the study area. It acquires measured monthly runoff, sediment discharge, sediment concentration and suspended sediment particle size data from about 150 hydrological stations on the main stream and tributaries above Tongguan, and event-based rainfall, flood and sediment data for the tributaries since 1966 or 1977. Land use and vegetation coverage data are extracted and supplemented with 2016 land use and vegetation coverage data for the whole area; data on the forest and grass shrub layer and the forest and grass litter layer are obtained by inversion using remote sensing and ground experiments. Terrace information for the main sand-producing area is collected, focusing on the part of the Yellow River basin in Gansu Province; 2017 terrace remote sensing information for typical tributaries is obtained and combined with statistical data to update and correct the 2017 terrace information for the main sand-producing area, and to predict the future development trend and quality of terraces in the area.
Preferably, the data source is further configured to perform data management on the data, and display data source information according to different data hierarchy orders, where the data source information includes a data name, a data description, a data hierarchy, a resource address, a data type, a unit in which the data is located, whether the data is accessed, and a latest synchronization date; the data management includes synchronization and updating;
and the convergence layer extracts the data required by the data source in a data batch extraction or quasi-real-time data extraction mode.
Specifically, the data source manages data source information and displays it ordered by data hierarchy, including the data name, data description, data hierarchy, resource address, data type, unit in which the data is located, whether the data is accessed, and latest synchronization date. The important information of each data source is shown in a concise layout so that a user can quickly and comprehensively understand its basic situation. Quick retrieval of the desired data source information is also supported, either by data hierarchy or by keywords in the data source description.
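The data source information record and its retrieval by hierarchy or keyword could look roughly like the following. This is a hedged sketch: the class name `DataSourceInfo`, the `search` helper, the field names and all catalog entries (including the resource addresses and unit names) are invented for illustration and are not taken from the application.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSourceInfo:
    name: str
    description: str
    hierarchy: str          # data hierarchy (level) used for ordering
    resource_address: str
    data_type: str
    unit: str               # unit (organisation) in which the data is located
    accessed: bool          # whether the data has been accessed
    last_sync: date         # latest synchronization date


def search(catalog, hierarchy=None, keyword=None):
    """Filter by hierarchy and/or a keyword in the name or description,
    and return the hits ordered by hierarchy, then name."""
    hits = [
        s for s in catalog
        if (hierarchy is None or s.hierarchy == hierarchy)
        and (keyword is None
             or keyword.lower() in (s.name + " " + s.description).lower())
    ]
    return sorted(hits, key=lambda s: (s.hierarchy, s.name))


# Hypothetical catalog entries.
CATALOG = [
    DataSourceInfo("Monthly rainfall", "Monthly rainfall totals per station",
                   "level-2", "db://example/rainfall", "relational table",
                   "Hypothetical Unit A", True, date(2020, 10, 1)),
    DataSourceInfo("River system", "River network of the basin by stream level",
                   "level-1", "file://example/rivers", "vector file",
                   "Hypothetical Unit B", True, date(2020, 9, 15)),
    DataSourceInfo("Hourly rainfall", "Hourly rainfall for typical tributaries",
                   "level-2", "db://example/rainfall_hourly", "relational table",
                   "Hypothetical Unit A", False, date(2020, 8, 1)),
]
```

For example, `search(CATALOG, keyword="rainfall")` returns the two rainfall entries, and `search(CATALOG, hierarchy="level-1")` returns only the river system entry.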
In addition, both modification and synchronization operations may be performed on the data source. Items that can be modified include data name of data source, data description, data hierarchy, data type, data address, file address, access, username and password of database connection. When the data table structure changes, data synchronization operation can be carried out on data from non-HTTP sources, and the latest table structure is obtained through the synchronization database, so that the data accuracy and the monitoring real-time performance are guaranteed.
The process of data warehouse design can be viewed as a process from the real-world environment to an abstract model, and from the abstract model to a concrete implementation, which relies on several different data models. The design is realized step by step through the successive conversion of the conceptual model, the logical model and the physical model. As shown, going from reality to abstraction relies on the conceptual model, which is then converted into a logical model. Finally, the logical model is converted into the physical model of the data warehouse; once the physical model is complete, it provides a reliable design scheme for the concrete implementation of the data warehouse.
In the design of a data warehouse, all data are organized around a theme as data sets that give a complete and consistent description of the analysis objects. Each data set is centered on a fact table associated with several dimension tables. The fact table contains several dimensions and measures: a dimension represents a specific perspective from which a decision-making user analyzes the data, and a measure is the actual meaning and measurement index of the data. Each dimension table describes one or more dimensions and their values, and each dimension is divided into different levels.
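To make the roles of fact table, dimension, measure and level concrete, the following minimal Python sketch rolls one measure up along the levels of two dimensions. The station codes, tributary names and sediment values are made up for illustration, and plain dictionaries stand in for real warehouse tables.

```python
from collections import defaultdict

# Dimension table: each station (lowest level) belongs to a tributary (higher level).
STATION_DIM = {
    "ST01": {"tributary": "Tributary A"},
    "ST02": {"tributary": "Tributary A"},
    "ST03": {"tributary": "Tributary B"},
}

# Fact table: dimension keys (station, month) plus one measure (sediment in tonnes).
FACTS = [
    {"station": "ST01", "month": "2016-07", "sediment_t": 120.0},
    {"station": "ST02", "month": "2016-07", "sediment_t": 80.0},
    {"station": "ST03", "month": "2016-07", "sediment_t": 50.0},
    {"station": "ST01", "month": "2016-08", "sediment_t": 30.0},
]


def roll_up(facts, level):
    """Sum the measure at the requested level: 'station', 'tributary' or 'year'."""
    totals = defaultdict(float)
    for fact in facts:
        if level == "station":
            key = fact["station"]
        elif level == "tributary":
            key = STATION_DIM[fact["station"]]["tributary"]
        elif level == "year":
            key = fact["month"][:4]  # the year level of the time dimension
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += fact["sediment_t"]
    return dict(totals)
```

Here `roll_up(FACTS, "tributary")` views the same facts from the spatial perspective at the tributary level, while `roll_up(FACTS, "year")` views them from the time perspective at the year level.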
Preferably, the processing the data includes:
extracting, transforming, cleaning and loading the data, and processing stream data.
Preferably, the multidimensional data model is constructed from fact tables, dimensions, measures and levels. The data warehouse is organized around a theme, and all data are handled as data sets centered on that theme: a fact table sits at the center and is associated with several dimension tables. The fact table contains several dimensions and measures; a dimension represents a specific perspective from which a decision-making user analyzes the data, and a measure is the actual meaning and measurement index of the data. Each dimension table describes one or more dimensions and their values, and each dimension is divided into different levels;
the multidimensional data model is used for defining an ETL process and mapping so as to extract, convert, clean, load and process stream data.
Preferably, the ETL process includes:
creating dimensions, creating data cubes, creating a mapping, creating an ETL flow.
Specifically, the spatial data ETL in the present application uses the extraction and conversion functions between supported data formats provided by the ArcToolbox tools. Because the data in the spatial source database are standardized, ArcToolbox is used to implement the ETL process for spatial data, and the extraction and conversion workflow is built with the ModelBuilder model generator. For example, the contour lines of a certain reach of the Yellow River are selected, a DEM is generated, a slope layer is derived, and the result is reclassified. As shown in the figure, the Select tool first extracts the extent of each river reach by the reach name attribute; the Clip tool then clips the contour data to the extent of each reach; the TIN generation tool then builds the triangulated irregular network surface, which is converted to raster format to obtain the DEM data. Slope is generated from the DEM data by the Slope tool and reclassified by the Reclassify tool.
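The last two steps of this workflow (deriving slope from a DEM and reclassifying it) can be illustrated without any GIS software. The sketch below is plain Python and does not call ArcToolbox; it uses simple central differences on interior cells, which is an assumption made for brevity and is not necessarily the algorithm used by the Slope tool. The class breaks are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope (in degrees) of interior cells by central differences.
    Border cells have no complete neighbourhood and are left as None."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


def reclassify(value, breaks):
    """Return the 1-based class of value, given ascending upper class bounds."""
    for index, upper in enumerate(breaks, start=1):
        if value <= upper:
            return index
    return len(breaks) + 1
```

For a DEM that rises 10 m per 10 m cell in one direction, the interior slope is 45 degrees, which falls into the steepest class for hypothetical breaks of 5, 15 and 25 degrees.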
For text files, the text data are converted into relational form by parsing the file and mapping its data to the target fields of the relational data. During the ETL process, text data that do not meet requirements are filtered out by setting filter conditions, and real-time incremental updating is achieved by parsing the update timestamps in the text data and comparing them with the write time of the relational data.
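A rough sketch of this text-file path follows: parse the file, map its columns to relational target fields, filter out unusable rows, and keep only rows updated after the last write. The column names, the `FIELD_MAP` mapping, the filter rules and the sample data are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from text-file columns to relational target fields.
FIELD_MAP = {"stn": "station_id", "q": "discharge_m3s", "updated": "updated_at"}


def parse_text(text, last_load_time):
    """Map columns, filter invalid rows, keep rows newer than last_load_time."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {target: record[source] for source, target in FIELD_MAP.items()}
        try:
            row["discharge_m3s"] = float(row["discharge_m3s"])
        except ValueError:
            continue  # filter condition: discharge is not numeric
        if row["discharge_m3s"] < 0:
            continue  # filter condition: negative discharge is implausible
        row["updated_at"] = datetime.fromisoformat(row["updated_at"])
        if row["updated_at"] > last_load_time:  # incremental update
            rows.append(row)
    return rows


SAMPLE = """stn,q,updated
ST01,1520.5,2020-07-01T08:00:00
ST02,n/a,2020-07-01T08:00:00
ST03,-1,2020-07-02T08:00:00
ST04,880.0,2020-06-01T08:00:00
ST05,640.0,2020-07-03T08:00:00
"""
```

With a last load time of 15 June 2020, only `ST01` and `ST05` are loaded: `ST02` and `ST03` fail the filter conditions and `ST04` has not changed since the last load.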
The ETL of the database is implemented as follows. In the data warehouse, existing operational database sources are imported into the target database through an ETL process; the data source and the data target are identified, and the ETL tool software provides connection methods for the different source databases. The steps are:
(1) creating a dimension;
A dimension table, also called a lookup table or reference table, is created; it contains relatively static data in the data store and typically holds information used for queries. The dimension table is one of the two kinds of object commonly used in a star schema. A dimension contains levels, hierarchies and attributes. Dimension attributes describe dimension values and are usually descriptive or textual. Dimensions typically collect detailed data at a low level and then aggregate it to higher levels; such a simple aggregation path is called a hierarchy and is used by analysis services.
(2) Creating a data cube;
A cube fact table is created to store business measures; it typically contains the fact measures and the foreign keys that connect to the dimension tables. The fact table builds on the dimension tables and records the detailed data. When the fact table is created, its primary key is determined and its measures are defined. Measures are usually numeric or additive, and the primary key of the fact table is a composite key made up of all the foreign keys, each of which joins to the primary key of the related dimension table.
(3) Creating a mapping;
the establishment of the mapping is a relatively complicated step in the establishment process of the data warehouse, and the main work of the mapping completion is to extract data from the data source module and load the converted data into the target data warehouse.
(4) Creating an ETL flow;
Based on the above, an ETL flow is defined that describes the association between the mappings and external activities. The flow is designed and executed so that the source database is finally loaded into the target data warehouse.
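Steps (1) to (4) above can be sketched with Python's built-in `sqlite3` module standing in for the ETL tool and the target warehouse. This is an assumption-laden illustration: the table names, column names and source rows are hypothetical, and a real deployment would use the ETL tool software mentioned above rather than hand-written SQL.

```python
import sqlite3

# Hypothetical source rows: (station, tributary, year, month, runoff as text).
SOURCE_ROWS = [
    ("Station A", "Tributary A", 2016, 7, "1200.5"),
    ("Station A", "Tributary A", 2016, 8, "900.0"),
    ("Station B", "Tributary B", 2016, 7, "300.0"),
]


def build_warehouse(source_rows):
    dw = sqlite3.connect(":memory:")
    # (1) Create the dimension tables.
    dw.execute("CREATE TABLE dim_station (station_id INTEGER PRIMARY KEY, "
               "name TEXT UNIQUE, tributary TEXT)")
    dw.execute("CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, "
               "year INTEGER, month INTEGER, UNIQUE (year, month))")
    # (2) Create the fact table: the foreign keys form the composite primary key.
    dw.execute("""CREATE TABLE fact_runoff (
        station_id INTEGER REFERENCES dim_station(station_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        runoff_m3 REAL,
        PRIMARY KEY (station_id, time_id))""")
    # (3) Mapping and (4) flow: convert each source row and load it into the target.
    for name, tributary, year, month, runoff in source_rows:
        dw.execute("INSERT OR IGNORE INTO dim_station (name, tributary) VALUES (?, ?)",
                   (name, tributary))
        dw.execute("INSERT OR IGNORE INTO dim_time (year, month) VALUES (?, ?)",
                   (year, month))
        station_id = dw.execute("SELECT station_id FROM dim_station WHERE name = ?",
                                (name,)).fetchone()[0]
        time_id = dw.execute("SELECT time_id FROM dim_time WHERE year = ? AND month = ?",
                             (year, month)).fetchone()[0]
        dw.execute("INSERT OR REPLACE INTO fact_runoff VALUES (?, ?, ?)",
                   (station_id, time_id, float(runoff)))
    dw.commit()
    return dw


def totals_by_tributary(dw):
    """Aggregate the measure along the station dimension's tributary level."""
    return dw.execute("""SELECT s.tributary, SUM(f.runoff_m3)
                         FROM fact_runoff f JOIN dim_station s USING (station_id)
                         GROUP BY s.tributary ORDER BY s.tributary""").fetchall()
```

Calling `totals_by_tributary(build_warehouse(SOURCE_ROWS))` joins the fact table to the station dimension through the foreign key and sums the runoff measure per tributary.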
Preferably, the extracting the required data from the data source includes:
establishing three triggers of insertion, modification and deletion; when the data in the source table changes, the corresponding trigger writes the changed data into a temporary table, the extraction thread extracts the data from the temporary table, and the data extracted from the temporary table is marked or deleted;
adding a timestamp field on a source table, modifying the value of the timestamp field when updating and modifying table data in a system, and determining extracted data by comparing the system time with the value of the timestamp field when extracting data;
establishing an MD5 temporary table for the table to be extracted by adopting a data extraction tool, wherein the temporary table records a main key of a source table and an MD5 check code calculated according to data of all fields, and when data extraction is carried out, comparing the MD5 check code of the source table with the MD5 temporary table so as to determine whether the data in the source table is added, modified or deleted and update the MD5 check code;
the changed data is determined by log comparison; file data is generally extracted in full, and the timestamp of the file or a computed MD5 check code of the file can be saved at each extraction and compared at the next extraction, the extraction being skipped if the two are the same.
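The trigger-based option can be sketched with SQLite through Python's `sqlite3` module. This is only an illustration under stated assumptions: the table and trigger names are hypothetical, SQLite stands in for whatever source database is actually used, and the extraction step here deletes the extracted changes rather than marking them.

```python
import sqlite3


def make_source():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE station_flow (id INTEGER PRIMARY KEY, flow REAL);
        -- Temporary (change) table that the three triggers write to.
        CREATE TABLE station_flow_changes (
            change_id INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, id INTEGER, flow REAL);
        CREATE TRIGGER trg_ins AFTER INSERT ON station_flow BEGIN
            INSERT INTO station_flow_changes (op, id, flow)
            VALUES ('I', NEW.id, NEW.flow);
        END;
        CREATE TRIGGER trg_upd AFTER UPDATE ON station_flow BEGIN
            INSERT INTO station_flow_changes (op, id, flow)
            VALUES ('U', NEW.id, NEW.flow);
        END;
        CREATE TRIGGER trg_del AFTER DELETE ON station_flow BEGIN
            INSERT INTO station_flow_changes (op, id, flow)
            VALUES ('D', OLD.id, OLD.flow);
        END;
    """)
    return db


def extract_changes(db):
    """Read pending changes in order, then delete them from the change table."""
    rows = db.execute("SELECT change_id, op, id, flow FROM station_flow_changes "
                      "ORDER BY change_id").fetchall()
    if rows:
        db.execute("DELETE FROM station_flow_changes WHERE change_id <= ?",
                   (rows[-1][0],))
    return [(op, row_id, flow) for _, op, row_id, flow in rows]
```

After an insert, an update and a delete on `station_flow`, one call to `extract_changes` returns the three changes in order, and a second call returns nothing because the extracted rows have been removed from the change table.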
Specifically, the methods currently in common use for capturing changed data during incremental data extraction include:
Trigger: the required triggers are established on the table to be extracted, generally three triggers for insertion, modification and deletion. Whenever data in the source table change, the corresponding trigger writes the changed data into a temporary table; the extraction thread extracts data from the temporary table, and the extracted data in the temporary table are marked or deleted. The advantage of the trigger method is its higher extraction performance; its disadvantage is that triggers must be established on the business table, which has a certain impact on the business system.
Timestamp: this is a change data capture method based on snapshot comparison. A timestamp field is added to the source table, and whenever table data are updated in the system, the value of the timestamp field is updated at the same time. During data extraction, the data to extract are determined by comparing the system time with the value of the timestamp field.
Full-table comparison: a typical way of comparing the whole table is to use MD5 check codes. The data extraction tool establishes in advance, for the table to be extracted, an MD5 temporary table with a similar structure, which records the primary key of the source table and the MD5 check code calculated from the data of all fields. At each extraction, the MD5 check codes of the source table and the MD5 temporary table are compared to determine whether data in the source table have been added, modified or deleted, and the MD5 check codes are updated.
Log comparison: changed data are identified by analyzing the database's own logs. Besides relational databases, the data source for extraction may also be a file, such as a txt, Excel or XML file. File data are generally extracted in full; the file's timestamp can be saved, or the file's MD5 check code computed, at each extraction and compared at the next extraction, and if they are the same the extraction can be skipped.
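The full-table comparison with MD5 check codes can be sketched as follows, using Python's standard `hashlib`. Dictionaries stand in for the source table and the MD5 temporary table; the function names, the field separator and the sample rows are assumptions for illustration.

```python
import hashlib


def row_md5(row):
    """MD5 check code over all field values of one row."""
    joined = "\x1f".join(str(value) for value in row)  # unit separator between fields
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def compare(source_rows, md5_table):
    """Compare the source table with the MD5 temporary table.

    source_rows: {primary_key: tuple_of_fields}
    md5_table:   {primary_key: md5_check_code} saved at the previous extraction
    Returns (added, modified, deleted, updated_md5_table).
    """
    added = [k for k in source_rows if k not in md5_table]
    deleted = [k for k in md5_table if k not in source_rows]
    modified = [k for k in source_rows
                if k in md5_table and row_md5(source_rows[k]) != md5_table[k]]
    updated_md5_table = {k: row_md5(v) for k, v in source_rows.items()}
    return added, modified, deleted, updated_md5_table
```

The first extraction is run against an empty MD5 table, so every row counts as added; later extractions pass in the table returned by the previous call. The same idea applies to files: hashing the whole file and comparing with the saved check code tells whether the full extraction can be skipped.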
In some embodiments, as shown in fig. 2, the storage layer includes:
a data warehouse of structured data, a data warehouse of semi-structured data, and a data warehouse of unstructured data;
wherein the structured data is saved to a data warehouse of the structured data;
the semi-structured data is processed into structured data and stored in a data warehouse of the structured data or the semi-structured data is kept and stored in the data warehouse of the semi-structured data;
the unstructured data is either processed into structured data and saved to the data warehouse of structured data, or kept as unstructured data and saved to the data warehouse of unstructured data.
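The routing rules above reduce to a small decision function. The following Python sketch is illustrative only; the function name, the `can_structure` flag and the store names are hypothetical labels rather than terms from the application.

```python
KINDS = ("structured", "semi-structured", "unstructured")


def route(record_kind, can_structure=False):
    """Return the target store for a piece of data.

    record_kind:   'structured', 'semi-structured' or 'unstructured'
    can_structure: True if processing turns the data into structured form
    """
    if record_kind not in KINDS:
        raise ValueError(f"unknown kind: {record_kind}")
    if record_kind == "structured" or can_structure:
        # Structured data, and anything processed into structured data.
        return "structured_warehouse"
    if record_kind == "semi-structured":
        return "semi_structured_warehouse"
    return "unstructured_warehouse"
```

For instance, a web page that cannot be parsed into fields would be routed to the unstructured store, while a file whose contents have been mapped to table columns would go to the structured store.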
Preferably, the structured data comprises relational databases and structured reports; the semi-structured data comprises files; the unstructured data comprises web pages and flat text data.
Preferably, as shown in fig. 3, the data in the data source includes:
river system, hydrological station, rainfall and historical rainfall data, land utilization, vegetation coverage, downstream flood risk map, and data of basic geographic information.
Specifically, the river system data stores the rivers of the Yellow River basin at different levels, divided into first, second, third and fourth levels, together with the names of the river systems. The hydrological station data stores the geographical position, name and other information of the main downstream hydrological stations of the Yellow River. The rainfall station and historical rainfall data store the geographical position and name of the main rainfall stations in the midstream region of the Yellow River, and the rainfall recorded at these stations at different times in different years is stored in chronological order. The land use data includes the land use types of the midstream region of the Yellow River in 1978, 1980, 1998, 2000, 2010 and 2016; the main types include paddy fields, dry land, orchards, tea gardens, woodland, shrubland, natural pasture, artificial pasture, commercial land, industrial and storage land, residential land, public service land, special-use land, railways, highways, streets and lanes, rural roads, airports, ports and docks, pipeline transportation, hydraulic structures, vacant land, river water surfaces, and the like. The vegetation coverage data for the midstream region of the Yellow River covers 1978, 1980, 1998, 2000, 2010 and 2016. A corresponding database table is designed according to the requirements of the downstream flood risk map, and the risk map is stored in the warehouse. The flood risk map is stored as a vector layer or as a service, and the layer attributes include at least the hydrological characteristic conditions and the corresponding risk information. The basic geographic information data includes information on the main dikes, production dikes, river control and guide works, cross sections and the like in the downstream area of the Yellow River.
In some embodiments, when the data in the data source is water conservancy service data, the data source includes:
a basic-class data warehouse for storing historical data;
and a real-time-class data warehouse for storing real-time data.
As shown in fig. 4, real-time data processing and application are very important in basin management and related service processing. Both daily service processing, such as flood control, drought relief and water resource management, and the case meetings, decision meetings and special event discussions associated with these services require various statistics based on real-time data, such as the cumulative rainfall in the past 24 hours or the flood in the past 3 days. In terms of the timeliness of warehousing and updating, statistical analysis of such data differs significantly from statistical analysis of historical data or direct use of historical statistical results, so it has specific processing requirements. Meanwhile, traditional historical statistics are produced through a "compilation" process with strict procedures, modes and methods, together with "judgment", checking and correction integrated with artificial intelligence. The two types of statistical data are therefore distinguished in generation, processing and storage. Fig. 4 shows a specific embodiment of the process.
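As an example of a statistic computed from real-time data, the cumulative rainfall in the past 24 hours can be sketched as follows. The data layout (a list of timestamp and millimetre pairs for one station) and the function name cumulative_rainfall are assumptions for illustration and are not taken from the application.

```python
from datetime import datetime, timedelta


def cumulative_rainfall(readings, now, window=timedelta(hours=24)):
    """Sum the rainfall (mm) of all readings with now - window < time <= now.

    `readings` is an iterable of (timestamp, rainfall_mm) pairs for one station.
    """
    start = now - window
    return sum(mm for t, mm in readings if start < t <= now)


readings = [
    (datetime(2020, 10, 20, 7, 0), 5.0),   # older than 24 hours, excluded
    (datetime(2020, 10, 20, 9, 0), 2.5),
    (datetime(2020, 10, 21, 8, 0), 1.5),
]
total = cumulative_rainfall(readings, now=datetime(2020, 10, 21, 8, 0))
```

Because the window moves with the current time, such a statistic has to be recomputed or incrementally updated as new readings arrive, which is one reason it is handled separately from compiled historical statistics.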
In summary, the data warehouse architecture and the ETL process are implemented in the physical structure design stage: a good physical model is developed according to the designed logical model, the ETL design and the like, the ETL process is executed, and its scheduling is optimized, which helps improve data access, query execution, data warehouse maintenance, the data uploading process and the like. In addition, in the physical implementation of the data warehouse, performance is optimized through the storage structure, partitioning, indexing, materialized view design and the like, so that decision information can be acquired and displayed rapidly.
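A physical model of the kind summarized above, with a fact table associated with dimension tables and an index supporting queries, can be sketched with Python's built-in sqlite3 module. The table names, columns and sample values below are hypothetical; SQLite is used only because it needs no server, and the application does not prescribe a particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_station (station_id INTEGER PRIMARY KEY, name TEXT, reach TEXT);
-- The date dimension carries a hierarchy: year > month > day.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
-- The fact table references the dimensions and holds the measure rainfall_mm.
CREATE TABLE fact_rainfall (
    station_id INTEGER REFERENCES dim_station(station_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    rainfall_mm REAL
);
-- Index to speed up joins and filters on the date dimension.
CREATE INDEX idx_fact_rainfall_date ON fact_rainfall(date_id);
""")
conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?)",
                 [(1, "S1", "midstream"), (2, "S2", "midstream")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [(20200701, 2020, 7, 1), (20200702, 2020, 7, 2), (20200801, 2020, 8, 1)])
conn.executemany("INSERT INTO fact_rainfall VALUES (?, ?, ?)",
                 [(1, 20200701, 10.0), (1, 20200702, 5.0),
                  (2, 20200701, 2.0), (1, 20200801, 7.0)])


def monthly_rainfall(conn):
    """Roll the measure up from day level to month level for each station."""
    return conn.execute("""
        SELECT s.name, d.year, d.month, SUM(f.rainfall_mm)
        FROM fact_rainfall AS f
        JOIN dim_station AS s ON s.station_id = f.station_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY s.name, d.year, d.month
        ORDER BY s.name, d.year, d.month
    """).fetchall()
```

Rolling up along the date hierarchy only changes the GROUP BY columns, which is the practical benefit of keeping the hierarchy levels as separate columns in the dimension table.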
It is to be understood that the method embodiments provided above correspond to the method embodiments described above, and corresponding specific contents may be referred to each other, which are not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A Yellow River water sand change data warehouse, comprising: a data source, a convergence layer, a storage layer and an application layer which are connected in sequence from bottom to top, wherein,
the data source is positioned at the bottommost layer, consists of structured data, semi-structured data and unstructured data and is used for realizing the aggregation of source data to the convergence layer;
the convergence layer is used for extracting required data from the data source, processing the data and transmitting the processed data to the storage layer;
the storage layer is used for storing the data processed by the convergence layer;
the application layer is a tool for accessing data by a user and is used for carrying out data analysis on the data.
2. The Yellow River water sand change data warehouse according to claim 1, wherein
the data source is also used for performing data management on the data and displaying data source information in order of data hierarchy, wherein the data source information comprises a data name, a data description, a data hierarchy, a resource address, a data type, the unit where the data is located, whether the data is accessed, and the latest synchronization date; the data management includes synchronization and updating;
and the convergence layer extracts the required data from the data source by batch data extraction or quasi-real-time data extraction.
3. The Yellow River water sand change data warehouse of claim 1, wherein processing the data comprises:
extracting, converting, cleaning and loading the data, and processing stream data.
4. The Yellow River water sand change data warehouse according to claim 3, wherein
a multi-dimensional data model is constructed from fact tables, dimensions, measures and hierarchies; the data warehouse has a theme and the data is organized around the theme; the data is centered on a fact table that is associated with a plurality of dimension tables; the fact table comprises a plurality of dimensions and measures, wherein a dimension represents a specific perspective from which a decision-making user analyzes the data, and a measure is the actual meaning and measurement index of the data; each dimension table describes a plurality of dimensions and their values, and each dimension is divided into different hierarchy levels;
the multi-dimensional data model is used for defining an ETL process and mappings so as to extract, convert, clean and load the data and process stream data.
5. The Yellow River water sand change data warehouse of claim 4, wherein the ETL process comprises:
creating dimensions, creating data cubes, creating mappings, and creating an ETL flow.
6. The Yellow River water sand change data warehouse of claim 1, wherein extracting the required data from the data source comprises:
establishing three triggers for insertion, modification and deletion; when the data in the source table changes, the corresponding trigger writes the changed data into a temporary table, the extraction thread extracts the data from the temporary table, and the data that has been extracted from the temporary table is marked or deleted;
adding a timestamp field to the source table, modifying the value of the timestamp field when table data is updated or modified in the system, and determining the data to be extracted by comparing the system time with the value of the timestamp field when extracting data;
establishing, by a data extraction tool, an MD5 temporary table for the table to be extracted, wherein the temporary table records the primary key of the source table and an MD5 check code calculated from the data of all fields, and when data extraction is carried out, comparing the MD5 check codes of the source table with those in the MD5 temporary table so as to determine whether data in the source table has been added, modified or deleted, and updating the MD5 check codes;
identifying changed data through log comparison, wherein file data is generally extracted in full, the time stamp of the file can be saved or the MD5 check code of the file can be calculated before an extraction, the comparison is carried out at the next extraction, and if the values are the same, the extraction is skipped.
7. The Yellow River water sand change data warehouse of claim 1, wherein the storage layer comprises:
a data warehouse of structured data, a data warehouse of semi-structured data, and a data warehouse of unstructured data;
wherein the structured data is saved to the data warehouse of structured data;
the semi-structured data is either processed into structured data and saved to the data warehouse of structured data, or kept as semi-structured data and saved to the data warehouse of semi-structured data;
the unstructured data is either processed into structured data and saved to the data warehouse of structured data, or kept as unstructured data and saved to the data warehouse of unstructured data.
8. The Yellow River water sand change data warehouse according to claim 1, wherein
the structured data comprises relational databases and structured reports; the semi-structured data comprises files; the unstructured data comprises WEB pages and flat text data.
9. The Yellow River water sand change data warehouse of any one of claims 1 to 8, wherein the data in the data source comprises:
river system data, hydrological station data, rainfall station and historical rainfall data, land use data, vegetation coverage data, downstream flood risk map data, and basic geographic information data.
10. The Yellow River water sand change data warehouse of claim 1, wherein when the data in the data source is water conservancy business data, the data source comprises:
a basic-class data warehouse for storing historical data;
and a real-time-class data warehouse for storing real-time data.
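The trigger-based extraction recited in claim 6 (three triggers writing changes into a temporary table that the extraction thread reads and then clears) can be sketched with Python's built-in sqlite3 module. This is a hedged illustration only: the table names rainfall and rainfall_changes, the operation codes 'I', 'U' and 'D', and the function extract_changes are hypothetical, and SQLite stands in for whatever source database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rainfall (id INTEGER PRIMARY KEY, station TEXT, mm REAL);
-- Temporary (change) table filled by the triggers below.
CREATE TABLE rainfall_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, id INTEGER, station TEXT, mm REAL
);
CREATE TRIGGER rainfall_ins AFTER INSERT ON rainfall BEGIN
    INSERT INTO rainfall_changes (op, id, station, mm)
    VALUES ('I', NEW.id, NEW.station, NEW.mm);
END;
CREATE TRIGGER rainfall_upd AFTER UPDATE ON rainfall BEGIN
    INSERT INTO rainfall_changes (op, id, station, mm)
    VALUES ('U', NEW.id, NEW.station, NEW.mm);
END;
CREATE TRIGGER rainfall_del AFTER DELETE ON rainfall BEGIN
    INSERT INTO rainfall_changes (op, id, station, mm)
    VALUES ('D', OLD.id, OLD.station, OLD.mm);
END;
""")


def extract_changes(conn):
    """Read pending changes in order, then delete the rows that were extracted."""
    rows = conn.execute(
        "SELECT seq, op, id, station, mm FROM rainfall_changes ORDER BY seq"
    ).fetchall()
    if rows:
        conn.execute("DELETE FROM rainfall_changes WHERE seq <= ?", (rows[-1][0],))
    return [row[1:] for row in rows]
```

Deleting only rows up to the last sequence number that was read means changes captured by the triggers after the SELECT are not lost; marking rows as extracted instead of deleting them would be the other option named in the claim.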
CN202011134223.9A 2020-10-21 2020-10-21 Yellow river water sand change data warehouse Pending CN112231301A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011134223.9A CN112231301A (en) 2020-10-21 2020-10-21 Yellow river water sand change data warehouse

Publications (1)

Publication Number Publication Date
CN112231301A true CN112231301A (en) 2021-01-15

Family

ID=74109031

Country Status (1)

Country Link
CN (1) CN112231301A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609238A (en) * 2021-07-24 2021-11-05 全图通位置网络有限公司 Hadoop platform-based geographic entity spatial data processing method and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488187A (en) * 2015-12-02 2016-04-13 北京四达时代软件技术股份有限公司 Method and device for extracting multi-source heterogeneous data increment
CN108280084A (en) * 2017-01-06 2018-07-13 上海前隆信息科技有限公司 A kind of construction method of data warehouse, system and server
CN109189764A (en) * 2018-09-20 2019-01-11 北京桃花岛信息技术有限公司 A kind of colleges and universities' data warehouse layered design method based on Hive
CN110109987A (en) * 2018-04-03 2019-08-09 中建材信息技术股份有限公司 A kind of agility data warehouse schema and its construction method and application
CN111581186A (en) * 2020-05-12 2020-08-25 黄河水利委员会黄河水利科学研究院 Construction method of yellow river water sand change data warehouse and public service platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曹建军,等, 国防工业出版社 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210115