CN114238362A

CN114238362A - Water conservancy data management system

Info

Publication number: CN114238362A
Application number: CN202210188819.XA
Authority: CN
Inventors: 沈羽翀; 缪明宝; 何卓彦; 潘颖
Original assignee: Guangzhou Guanbida Data Technology Co ltd
Current assignee: Guangzhou Guanbida Data Technology Co ltd
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2022-03-25

Abstract

The invention relates to a water conservancy data management system comprising a high-concurrency network processing module, an encoding and decoding module, a data storage module, a data query module, a data manipulation module and a data control module. On this basis, the shortcomings of big-data approaches to water conservancy data management are overcome through a library table management mechanism, a data management mechanism and a data cache mechanism; the requirements of real-time concurrent writing and frequent querying of water conservancy data are met; and an efficient, lightweight water conservancy data management system is constructed. The writing speed and the query speed of the water conservancy data management system are also improved.

Description

Water conservancy data management system

Technical Field

The invention relates to the technical field of water conservancy, in particular to a water conservancy data management system.

Background

With the rapid development of information technology and the arrival of the Internet of Things era, more and more water conservancy informatization infrastructure and application systems are being applied to water conservancy engineering construction and management, water administration services and related fields, and the volume of data uploaded by water conservancy Internet of Things devices keeps growing. Meanwhile, as the amount of information across society (especially on the Internet) grows explosively, big data technology has emerged in response. Introducing big data technology into the water conservancy industry as a basic technology for water conservancy informatization and intelligent construction has become an inevitable trend.

Water conservancy informatization covers water conservancy project surveying, planning, design, construction, operation management and maintenance, as well as flood control, water resource management, water and soil conservation and other aspects of water administration. Water conservancy data takes many forms and types, and its total volume is huge and continues to grow at high speed. For example, in recent years the types and number of monitoring devices have increased and monitoring data is uploaded more frequently across regions, so the quantity of collected monitoring data has risen rapidly; in flood control management, the data volume generated by hydrological model forecasting, deduction and scheduling is also growing rapidly; and unstructured data such as videos, images and documents is accumulating in large quantities.

Currently, the open source community provides big data engineers with a variety of software tools, such as the distributed file system HDFS, the distributed messaging system Kafka, the distributed in-memory computing engine Spark, and the data warehouse Hive. These tools offer a variety of big data processing and analysis solutions, but they also increase the engineer's learning cost and the difficulty of selecting an appropriate tool. Traditional relational databases have good transactional performance but lack responsiveness when analyzing data, while existing big data software offers fast statistical computation over large data sets but is weak at real-time writing and updating of data, and therefore cannot meet the water conservancy industry's requirements for fast writing and fast data query.

Therefore, the traditional way of managing water conservancy data has shortcomings.

Disclosure of Invention

Based on this, it is necessary to provide a water conservancy data management system for overcoming the defects of the conventional water conservancy data management method.

A water conservancy data management system, comprising:

the high-concurrency network processing module is used for handling high-concurrency network access to the water conservancy data;

the encoding and decoding module is used for encoding and decoding the water conservancy data of the client or the server;

the data storage module is defined with a structure file, a data file or a data type and used for storing the water conservancy data;

the data query module is configured to execute a parsing program, a syntax tree optimization program, table joins, condition screening, aggregate calculation, grouping calculation or sorting calculation related to the water conservancy data;

the data manipulation module is configured to execute data writing, data deleting or data updating related to the water conservancy data and is used for manipulating the water conservancy data;

and the data control module is configured to execute library table file management related to water conservancy data.

The water conservancy data management system comprises a high-concurrency network processing module, an encoding and decoding module, a data storage module, a data query module, a data manipulation module and a data control module. On this basis, the shortcomings of big-data approaches to water conservancy data management are overcome through a library table management mechanism, a data management mechanism and a data cache mechanism; the requirements of real-time concurrent writing and frequent querying of water conservancy data are met; and an efficient, lightweight water conservancy data management system is constructed. The writing speed and the query speed of the water conservancy data management system are also improved.

In one embodiment, the data file adopts a row-column mixed storage mode; the row-column mixed storage mode comprises a row storage format and a column storage format;

the row-column mixed storage mode comprises the following steps:

writing newly written data of the data file in the row storage format;

and merging the newly written data into the historical data in the column storage format after the data written in the row storage format exceeds a preset data threshold.
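As an illustrative, non-limiting sketch, the two steps of the row-column mixed storage mode may be modelled in memory as follows. The class name `HybridTable`, the parameter `row_threshold` and the choice to express the preset data threshold as a row count are assumptions made for this example only; the embodiment operates on files rather than on in-memory lists.

```python
from typing import Any, Dict, List, Tuple


class HybridTable:
    """Toy row-column mixed store; all names here are illustrative."""

    def __init__(self, columns: List[str], row_threshold: int = 4) -> None:
        self.columns = columns
        # Assumed form of the preset data threshold: a row count.
        self.row_threshold = row_threshold
        # Newly written data, kept in row format (one tuple per row).
        self.row_buffer: List[Tuple[Any, ...]] = []
        # Historical data, kept in column format (one list per column).
        self.column_store: Dict[str, List[Any]] = {c: [] for c in columns}

    def write(self, row: Tuple[Any, ...]) -> None:
        if len(row) != len(self.columns):
            raise ValueError("row width does not match the table columns")
        self.row_buffer.append(row)  # append-only write in row format
        if len(self.row_buffer) > self.row_threshold:
            self.merge()

    def merge(self) -> None:
        """Split buffered rows into columns and append them to history."""
        for i, name in enumerate(self.columns):
            self.column_store[name].extend(r[i] for r in self.row_buffer)
        self.row_buffer.clear()
```

In this sketch a write only ever appends a whole row, and the cost of splitting rows into columns is paid once per merge rather than once per write.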

In one embodiment, the row storage format comprises the steps of:

when the newly written data is smaller than a column cluster, appending the newly written data to the tail of the row storage file;

and when the newly written data is larger than or equal to a column cluster, writing the newly written data into the column storage file.

In one embodiment, the tail of the row storage file comprises a version indicator, a row count, a deleted row count, a deletion flag byte count and a deletion flag bit sequence.

In one embodiment, the column cluster of the column storage file comprises a column cluster head and a data column sequence;

the column cluster head comprises a column number, a type, a compression algorithm code, an actual bit width, a row count, a non-null flag byte count and a dictionary number;

the data column sequence includes a string type and a numerical value type.
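As an illustrative sketch, the column cluster head may be serialized as a fixed-width binary record. The field widths follow those listed in the detailed description (2, 1, 1, 1, 4, 4 and 8 bytes); the little-endian byte order, the absence of padding, the field order and the function names are assumptions of this example and are not specified by the embodiment.

```python
import struct

# Assumed encoding: little-endian, no padding, fields in the listed order.
# H = 2 bytes, B = 1 byte, I = 4 bytes, Q = 8 bytes: 2+1+1+1+4+4+8 = 21 bytes.
COLUMN_CLUSTER_HEAD = struct.Struct("<HBBBIIQ")


def pack_column_cluster_head(column_no: int, type_code: int,
                             compression_code: int, bit_width: int,
                             row_count: int, non_null_flag_bytes: int,
                             dictionary_no: int) -> bytes:
    """Serialize one column cluster head into 21 bytes."""
    return COLUMN_CLUSTER_HEAD.pack(column_no, type_code, compression_code,
                                    bit_width, row_count,
                                    non_null_flag_bytes, dictionary_no)


def unpack_column_cluster_head(raw: bytes) -> tuple:
    """Inverse of pack_column_cluster_head; returns the seven fields."""
    return COLUMN_CLUSTER_HEAD.unpack(raw)
```

A fixed-width head lets a reader locate the data column sequence without parsing the data itself.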

In one embodiment, the tail of the column storage file includes a version indicator, a number of clusters per column, a row count, a deleted row count, a deletion flag byte count, and a deletion flag bit sequence.

In one embodiment, the data query module is based on the data query language DQL.

In one embodiment, the data manipulation module is based on a data manipulation language DML.

In one embodiment, the data control module is based on a data control language DCL.

In one embodiment, the library table file management comprises the steps of:

deleting a database by deleting the database folder;

creating a database by creating a database folder;

deleting a table by deleting the table folder;

creating a table by creating a table folder and creating a STRUCT file.

Drawings

FIG. 1 is a block diagram of a water conservancy data management system according to an embodiment;

FIG. 2 is a schematic diagram of a row storage format;

FIG. 3 is a schematic diagram of a column storage format;

FIG. 4 is a schematic diagram of a mark deletion method;

FIG. 5 is a schematic diagram of a data updating method.

Detailed Description

For better understanding of the objects, technical solutions and effects of the present invention, the present invention will be further explained with reference to the accompanying drawings and examples. Meanwhile, the following described examples are only for explaining the present invention, and are not intended to limit the present invention.

The embodiment of the invention provides a water conservancy data management system.

Fig. 1 is a block diagram of a water conservancy data management system according to an embodiment, and as shown in fig. 1, the water conservancy data management system according to an embodiment includes:

the high-concurrency network processing module 100 is used for handling high-concurrency network access to the water conservancy data;

the encoding and decoding module 101 is used for encoding and decoding water conservancy data of a client or a server;

the data storage module 102 is defined with a structure file, a data file or a data type and used for storing water conservancy data;

a data query module 103 configured to execute a parser, a syntax tree optimizer, table joins, condition screening, aggregate calculation, grouping calculation or sorting calculation related to the water conservancy data;

a data manipulation module 104 configured to perform data writing, data deletion, or data updating related to the water conservancy data for manipulating the water conservancy data;

and the data control module 105 is configured to execute library table file management related to water conservancy data.

As shown in fig. 1, the water conservancy data management system serves clients and servers, and performs data interaction and management. The water conservancy data comprises all kinds of data related to water conservancy, including written data, query requests, and data exchanged between the water conservancy data management system and other databases.

When the clients or servers accessing the water conservancy data management system generate highly concurrent traffic, the high-concurrency network processing module 100 performs high-concurrency network processing. The high-concurrency network processing module 100 is configured to execute a high-concurrency processing scheme, including multi-process or multi-threaded processing, to cope with high-concurrency network conditions.
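As a minimal sketch of the multi-threaded option only, incoming requests may be dispatched to a pool of worker threads. The handler `handle_request`, the function `serve_batch` and the pool size are hypothetical names and values chosen for this example; the embodiment does not specify how the module is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def handle_request(payload: bytes) -> bytes:
    # Placeholder handler: a real one would decode the request,
    # execute it, and encode a reply.
    return b"ACK:" + payload


def serve_batch(requests: Iterable[bytes], max_workers: int = 8) -> List[bytes]:
    """Process requests on a thread pool; replies keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_request, requests))
```

A bounded pool caps the number of simultaneous handlers, so a burst of device uploads queues up instead of spawning one thread per connection.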

The encoding and decoding module 101 is configured to encode and decode water conservancy data of a client or a server, and includes an encoder and a decoder. The encoded and decoded water conservancy data provides the data basis on which the data storage module 102, the data query module 103, the data manipulation module 104 and the data control module 105 operate.

The data storage module 102 is defined with a structure file, a data file or a data type.

The data storage module 102 selects a suitable storage device at the hardware level and defines the structure file, the data file or the data type at the software level.

In one embodiment, the structure file includes table data information, field sequences, and multiple data file shares.

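In one embodiment, the data file adopts a row-column mixed storage mode; the row-column mixed storage mode comprises a row storage format and a column storage format;

the row-column mixed storage mode comprises the following steps:

writing newly written data of the data file in the row storage format;

and merging the newly written data into the historical data in the column storage format after the data written in the row storage format exceeds a preset data threshold.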

Based on this, newly written data adopts the row storage format and is written by appending, which gives the water conservancy data management system an advantage for concurrent writes of small data. When a row-format file exceeds a certain threshold, its data is merged into the historical data in the column storage format, specifically as follows:

(1) Fig. 2 is a schematic diagram of the row storage format. As shown in fig. 2, the row storage format stores all columns of one row of water conservancy data before storing all columns of the next row. When a single data object is written, it does not need to be split into columns, so writing is faster than with column storage.

(2) Fig. 3 is a schematic diagram of the column storage format. As shown in fig. 3, the column storage format stores data in units of column clusters, first storing the first-column data of all rows, then the second-column data of all rows, and so on until the last-column data of all rows. During queries and data screening, the column storage format can skip columns that are not of interest and go directly to the columns involved in the condition, which gives it fast statistical computing capability; and because one column cluster stores data of the same type, a single compression algorithm suited to that type can be applied. However, when writing data, column storage must split a row record into single columns for storage, so the number of writes is clearly greater than with row storage, and the additional movement and positioning of the magnetic head on the disk takes considerable time.
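The difference between the two layouts can be illustrated with a small sketch that flattens the same records in row order and in column order. The function names and the sample records are illustrative assumptions, not part of the embodiment.

```python
from typing import Any, List, Sequence


def to_row_layout(rows: Sequence[Sequence[Any]]) -> List[Any]:
    """All columns of row 0, then all columns of row 1, and so on."""
    return [value for row in rows for value in row]


def to_column_layout(rows: Sequence[Sequence[Any]]) -> List[Any]:
    """Column 0 of all rows, then column 1 of all rows, and so on."""
    if not rows:
        return []
    width = len(rows[0])
    return [row[c] for c in range(width) for row in rows]


# Hypothetical monitoring records: (station, water level).
records = [("S1", 1.2), ("S2", 3.4), ("S3", 5.6)]
```

With `records` as above, the row layout keeps each station next to its level, while the column layout groups the three levels together, so a statistic over the level column reads one contiguous run of same-typed values.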

Therefore, the embodiment of the present invention further improves the row-column hybrid storage mode as follows:

First, the row storage format includes the following:

(1) when the newly written data has not reached the size of a column cluster, the data is appended to the tail of the row storage file, and the file data is also held in memory;

(2) when the newly written data reaches the size of a column cluster, the file is locked, the in-memory data is converted into the column storage format and written to a column file, and a new column file is created if the current column file has insufficient space;

(3) the number of rows in the data file is limited to the number of rows of a column cluster;

(4) the row storage file tail includes a version indicator (1 byte), the row count (4 bytes), the deleted row count (4 bytes), the deletion flag byte count (formula: total row count / 8), and the deletion flag bit sequence.
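As an illustrative sketch, the tail of the row storage file may be serialized as follows. The 1-byte and 4-byte widths follow the text; the 4-byte width of the field holding the deletion flag byte count, the rounding up of total rows / 8, the little-endian byte order, the bit order within each byte and the convention that a set bit means "deleted" are all assumptions of this example.

```python
import struct
from typing import Iterable

# Assumed layout: version (1 byte), row count (4), deleted row count (4),
# deletion flag byte count (assumed 4), then the flag bits themselves.
TAIL_FIXED = struct.Struct("<BIII")


def pack_row_file_tail(version: int, row_count: int,
                       deleted_rows: Iterable[int]) -> bytes:
    """Build the file tail, with one deletion flag bit per row."""
    deleted = set(deleted_rows)
    flag_bytes = (row_count + 7) // 8      # total rows / 8, rounded up
    bits = bytearray(flag_bytes)
    for row in deleted:
        if not 0 <= row < row_count:
            raise ValueError(f"row {row} is outside the file")
        bits[row // 8] |= 1 << (row % 8)   # assumed: set bit = deleted
    fixed = TAIL_FIXED.pack(version, row_count, len(deleted), flag_bytes)
    return fixed + bytes(bits)
```

For a file of 10 rows with rows 0 and 9 deleted, the fixed part is 13 bytes and the flag sequence is 2 bytes, giving a 15-byte tail.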

Second, the column storage format includes the following:

(1) the column storage file is written after conversion from the row storage file; if a single write is very large, the data is converted and written directly.

As a preferred embodiment, the data file is limited to 32 MB (files that are too small increase IO, and files that are too large hinder parallel computing). If a single data object is larger than 32 MB, the limit is exceeded and the data is stored at its actual size;

(2) the row count of column storage data cannot change after writing; the storage rate ((total row count - deleted row count) / total row count) is checked periodically, and if it falls below a threshold, a new file is created and the undeleted data is moved into it;

(3) the column cluster of the column storage file comprises a column cluster head and a data column sequence, wherein the column cluster head comprises a column number (2 bytes), a type (1 byte), a compression algorithm code (1 byte), an actual bit width (1 byte), a row count (4 bytes), a non-null flag byte count (4 bytes), a dictionary number (8 bytes) and the like; the data column sequence comprises a string type, a numerical value type and the like;

(4) the column storage file tail includes a version indicator (1 byte), the number of clusters per column (4 bytes), the row count (4 bytes), the deleted row count (4 bytes), the deletion flag byte count (formula: total row count / 8), and the deletion flag bit sequence.
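The periodic storage rate check described in item (2) above can be sketched as follows. The formula is taken from the text; the threshold value of 0.5 and the function names are assumptions of this example, since the embodiment does not state a specific threshold.

```python
from typing import Any, List, Sequence, Set


def storage_rate(total_rows: int, deleted_rows: int) -> float:
    """(total row count - deleted row count) / total row count."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return (total_rows - deleted_rows) / total_rows


def needs_rewrite(total_rows: int, deleted_rows: int,
                  threshold: float = 0.5) -> bool:
    """True when the file holds too few live rows (threshold is assumed)."""
    return storage_rate(total_rows, deleted_rows) < threshold


def move_undeleted(rows: Sequence[Any], deleted: Set[int]) -> List[Any]:
    """Contents of the new file: only the rows not flagged as deleted."""
    return [row for i, row in enumerate(rows) if i not in deleted]
```

For example, a file of 100 rows with 25 deleted has a storage rate of 0.75 and is left in place under the assumed threshold, whereas 60 deleted rows give 0.4 and trigger a rewrite.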

In one embodiment, the data types of the water conservancy data management system include char (1 byte), short (2 bytes), int (4 bytes), float (4 bytes), long (8 bytes), double (8 bytes), date (4 bytes), timestamp (8 bytes), string (variable length), and bind (variable length).

In one embodiment, the data query module 103 is based on the data query language DQL (Data Query Language).

Based on the data query language DQL, the data query module 103 includes an SQL (Structured Query Language) parser, a syntax tree optimizer, table joins, condition screening, aggregate calculation, grouping calculation, and sorting calculation.

The SQL parser of the data query language DQL is built by constructing a standard SQL grammar file, generating C-language parsing code from it, embedding actions, and constructing the syntax tree generation logic.

Condition screening in the data query language DQL comprises direct screening according to the null flag bits and string comparison, wherein string comparison comprises finding the dictionary numbers that satisfy the condition in the dictionary and then finding those numbers in the data sequence.
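As an illustrative sketch of this two-stage string comparison, a string column may be held as a dictionary of distinct values plus a sequence of dictionary numbers, with a separate non-null flag per row. The function names, the use of -1 as a placeholder for null rows and the in-memory lists are assumptions of this example.

```python
from typing import Callable, Dict, List, Optional, Tuple


def encode_column(values: List[Optional[str]]) -> Tuple[List[str], List[int], List[bool]]:
    """Return (dictionary, dictionary numbers per row, non-null flags)."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    non_null: List[bool] = []
    for value in values:
        if value is None:
            codes.append(-1)               # placeholder for a null row
            non_null.append(False)
            continue
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
        non_null.append(True)
    return dictionary, codes, non_null


def screen_rows(dictionary: List[str], codes: List[int], non_null: List[bool],
                condition: Callable[[str], bool]) -> List[int]:
    """Stage 1: match in the dictionary. Stage 2: scan the number sequence."""
    matching = {n for n, text in enumerate(dictionary) if condition(text)}
    return [row for row, (code, ok) in enumerate(zip(codes, non_null))
            if ok and code in matching]
```

The string condition is evaluated once per distinct value rather than once per row, and the per-row work reduces to an integer membership test.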

Aggregate calculation in the data query language DQL includes calculating the total row count (the row count minus the number of rows flagged as deleted), the sum (the number of undeleted rows multiplied by the base value, plus the sum of the deviation values), the average (the sum divided by the number of undeleted rows), the minimum (from the minimum deviation value), and the maximum (from the maximum deviation value).
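The formulas above suggest that a numeric column is stored as a base value plus one deviation value per row. Under that reading, the calculation can be sketched as follows. Reporting the minimum and maximum as the base value plus the minimum or maximum deviation is an assumption of this example, as are the function name and the use of a set of deleted row indices.

```python
from typing import Dict, List, Optional, Set


def aggregate(base: int, deviations: List[int],
              deleted: Set[int]) -> Dict[str, Optional[float]]:
    """Count, sum, average, minimum and maximum over undeleted rows."""
    live = [d for i, d in enumerate(deviations) if i not in deleted]
    count = len(live)                      # row count minus deleted rows
    if count == 0:
        return {"count": 0, "sum": None, "avg": None, "min": None, "max": None}
    total = count * base + sum(live)       # undeleted rows * base + deviations
    return {
        "count": count,
        "sum": total,
        "avg": total / count,
        "min": base + min(live),           # assumed decoding of the minimum
        "max": base + max(live),           # assumed decoding of the maximum
    }
```

With a base value of 100, deviations [0, 5, 2, 9] and row 1 deleted, the undeleted deviations are 0, 2 and 9, so the count is 3 and the sum is 3 × 100 + 11 = 311.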

In one embodiment, the data manipulation module 104 is based on the data manipulation language DML (Data Manipulation Language).

Fig. 4 is a schematic diagram of the mark deletion method. As shown in fig. 4, data deletion in the data manipulation language DML uses mark deletion, and the data is physically removed at the next data merge. When a table is created, the system automatically adds a field whose identifier consists of the library name, an underscore, the table name, an underscore and ROW_NUM. When data is deleted, the ROW_NUM of the data record is found and appended to a deletion marker file; when data is queried, the records listed in the deletion marker file are excluded.
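A minimal in-memory sketch of mark deletion is given below. The class `MarkDeleteTable` is hypothetical, and a Python set stands in for the deletion marker file; only the naming of the row-number field follows the text.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


class MarkDeleteTable:
    """Toy table in which deletes only record row numbers."""

    def __init__(self, library: str, table: str) -> None:
        # Field identifier: library name, underscore, table name, underscore, ROW_NUM.
        self.row_num_field = f"{library}_{table}_ROW_NUM"
        self.rows: List[Row] = []
        self.deleted: Set[int] = set()     # stands in for the marker file
        self._next_row_num = 0

    def insert(self, record: Row) -> int:
        row_num = self._next_row_num
        self._next_row_num += 1
        self.rows.append({**record, self.row_num_field: row_num})
        return row_num

    def query(self, condition: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        # Records listed in the deletion markers are excluded.
        return [r for r in self.rows
                if r[self.row_num_field] not in self.deleted and condition(r)]

    def delete(self, condition: Callable[[Row], bool]) -> int:
        hits = [r[self.row_num_field] for r in self.query(condition)]
        self.deleted.update(hits)          # append to the marker set only
        return len(hits)

    def merge(self) -> None:
        """Physically drop marked rows, as happens at the next data merge."""
        self.rows = self.query()
        self.deleted.clear()
```

Deleting therefore never rewrites stored data; the cost of removal is deferred to the merge.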

Fig. 5 is a schematic diagram of the data updating method. As shown in fig. 5, data updating in the data manipulation language DML finds the data records according to the given condition, writes the ROW_NUM of the data to be updated into the deletion marker file, merges the updated values with the original record, and appends the result to the row-format file. Direct modification of data is thus replaced by deletion followed by writing.
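The update path can be sketched in the same in-memory style. The function `update_rows`, the list standing in for the row-format file, the set standing in for the deletion marker file, and the way new row numbers are assigned are all assumptions of this example.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


def update_rows(rows: List[Row], deleted: Set[int], row_num_field: str,
                condition: Callable[[Row], bool], changes: Row) -> int:
    """Mark matching rows as deleted and append merged replacements."""
    next_num = max((r[row_num_field] for r in rows), default=-1) + 1
    # Collect targets first so that appended rows are not revisited.
    targets = [r for r in rows
               if r[row_num_field] not in deleted and condition(r)]
    for old in targets:
        deleted.add(old[row_num_field])            # write ROW_NUM to markers
        merged = {**old, **changes, row_num_field: next_num}
        next_num += 1
        rows.append(merged)                        # append to the row file
    return len(targets)
```

The original record stays in place until the next merge; only the marker set and the appended row change, so no stored data is modified in place.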

In one embodiment, the data control module 105 is based on the data control language DCL (Data Control Language).

Based on this, library table file management is performed through the data control language DCL.

As a preferred embodiment, the management of the library table file comprises the following steps:

deleting a database by deleting the database folder;

creating a database by creating a database folder;

deleting a table by deleting the table folder;

creating a table by creating a table folder and creating a STRUCT file.
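The four operations map directly onto file system calls, as the following sketch shows. The function names, the root directory argument and the one-field-per-line text content written to the STRUCT file are assumptions of this example; the embodiment does not specify the format of the STRUCT file.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def create_database(root: Path, database: str) -> Path:
    """Creating a database creates a database folder."""
    path = root / database
    path.mkdir(parents=True)
    return path


def drop_database(root: Path, database: str) -> None:
    """Deleting a database deletes its folder and everything inside."""
    shutil.rmtree(root / database)


def create_table(root: Path, database: str, table: str,
                 fields: List[Tuple[str, str]]) -> Path:
    """Creating a table creates a table folder and a STRUCT file."""
    path = root / database / table
    path.mkdir()
    # Assumed STRUCT content: one "name type" pair per line.
    text = "\n".join(f"{name} {type_name}" for name, type_name in fields)
    (path / "STRUCT").write_text(text, encoding="utf-8")
    return path


def drop_table(root: Path, database: str, table: str) -> None:
    """Deleting a table deletes the table folder."""
    shutil.rmtree(root / database / table)
```

Because each database and table is a folder, no separate catalogue has to be kept consistent with the files on disk.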

Based on this embodiment, the water conservancy data management system provided by the embodiment of the invention can meet the requirements of real-time concurrent writing and frequent querying of water conservancy data. It uses a statistics-based cache, a combined row-column storage mode and a compression algorithm based on column storage, and features high writing speed and high query speed.

The water conservancy data management system according to any of the embodiments above includes a high-concurrency network processing module 100, an encoding and decoding module 101, a data storage module 102, a data query module 103, a data manipulation module 104, and a data control module 105. On this basis, the shortcomings of big-data approaches to water conservancy data management are overcome through a library table management mechanism, a data management mechanism and a data cache mechanism; the requirements of real-time concurrent writing and frequent querying of water conservancy data are met; and an efficient, lightweight water conservancy data management system is constructed. The writing speed and the query speed of the water conservancy data management system are also improved.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of the technical features is not contradictory, it should be considered within the scope of this specification.

The above examples show only some embodiments of the present invention. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

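1. A water conservancy data management system, comprising:

a high-concurrency network processing module, used for handling high-concurrency network access to the water conservancy data;

an encoding and decoding module, used for encoding and decoding the water conservancy data of the client or the server;

a data storage module, defined with a structure file, a data file or a data type and used for storing the water conservancy data;

a data query module, configured to execute a parsing program, a syntax tree optimization program, table joins, condition screening, aggregate calculation, grouping calculation or sorting calculation related to the water conservancy data;

a data manipulation module, configured to execute data writing, data deleting or data updating related to the water conservancy data and used for manipulating the water conservancy data;

and a data control module, configured to execute library table file management related to the water conservancy data.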

2. The water conservancy data management system of claim 1, wherein the data file adopts a row-column mixed storage mode; the row-column mixed storage mode comprises a row storage format and a column storage format;

wherein the row-column mixed storage mode comprises the steps of:

writing newly written data of the data file in the row storage format;

and merging the newly written data into the historical data in the column storage format after the data written in the row storage format exceeds a preset data threshold.

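3. The water conservancy data management system of claim 2, wherein the row storage format comprises the steps of:

when the newly written data is smaller than a column cluster, appending the newly written data to the tail of the row storage file;

and when the newly written data is larger than or equal to a column cluster, writing the newly written data into the column storage file.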

4. The water conservancy data management system of claim 3, wherein the tail of the row storage file comprises a version indicator, a row count, a deleted row count, a deletion flag byte count, and a deletion flag bit sequence.

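5. The water conservancy data management system of claim 3, wherein the column cluster of the column storage file comprises a column cluster head and a data column sequence;

the column cluster head comprises a column number, a type, a compression algorithm code, an actual bit width, a row count, a non-null flag byte count and a dictionary number;

the data column sequence includes a string type and a numerical value type.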

6. The water conservancy data management system of claim 3, wherein the tail of the column storage file comprises a version indicator, a number of clusters per column, a row count, a deleted row count, a deletion flag byte count, and a deletion flag bit sequence.

7. The water conservancy data management system of claim 1, wherein the data query module is based on a Data Query Language (DQL).

8. The water conservancy data management system of claim 1, wherein the data manipulation module is based on a Data Manipulation Language (DML).

9. The water conservancy data management system of claim 1, wherein the data control module is based on a Data Control Language (DCL).

10. The water conservancy data management system according to any one of claims 1 to 9, wherein the library table file management comprises the steps of:

deleting a database by deleting the database folder;

creating a database by creating a database folder;

deleting a table by deleting the table folder;

creating a table by creating a table folder and creating a STRUCT file.