CN114048192A

CN114048192A - Method for optimizing data storage of database

Info

Publication number: CN114048192A
Application number: CN202111472678.6A
Authority: CN
Inventors: 贾德星
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Shanghai Yunxi Technology Co ltd
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-02-15

Abstract

The invention discloses a method for optimizing stored data of a database, which relates to the technical field of database storage; writing data according to a database table, and storing the changed data column VALUEs by using a data row KV storage coding model, wherein the main KEY column VALUE KEY of the data row KV storage coding model records the complete data row, and other column VALUEs VALUE of the data row KV storage coding model only record the partial changed column field VALUEs of the data row; inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.

Description

Method for optimizing data storage of database

Technical Field

The invention discloses a method, relates to the technical field of database storage, and particularly relates to a method for optimally storing data in a database.

Background

Relational databases organize data in a row-column two-dimensional table model, which is often written to the underlying storage system in row units during storage. However, currently, the distributed databases, such as CockroachDB, TiDB, and the like, which are mainstream in the industry, all use a Key-Value (Key Value) storage format when storing data rows, and the database engine encodes one row of data in the database table into one KV data and stores the KV data into a bottom storage engine, such as RocksDB, wherein Key is a Value of a table main Key column, and values of other columns are encoded into Value. When data is updated, even if only the value of one column field is modified, all the old data rows need to be read out, and after a certain column field value is modified, the data needs to be encoded into KV data again for storage. In this way, redundant storage and reading IO is increased, the usage amount of storage and writing IO is also increased, and meanwhile, the writing amplification of data is also increased, and the overall performance of the system is affected.

Disclosure of Invention

The invention provides a method for optimizing stored data of a database aiming at the problems in the prior art, and the specific scheme provided by the invention is as follows:

a method for optimizing data storage of a database comprises the steps of writing data according to a database table, storing changed data column VALUEs by using a data row KV storage coding model, wherein a complete data row is recorded by a main KEY column VALUE KEY of the data row KV storage coding model, and only partial changed column field VALUEs of the data row are recorded by other column VALUEs VALUE of the data row KV storage coding model;

inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.

Further, the method for optimized storage of data in database according to claim 1, wherein the recording of the complete data row by the KEY value KEY of the data row KV storage coding model includes:

when a row of new data is inserted into the database table, the main KEY column VALUE is coded into KEY according to a data row KV storage coding model, other column VALUEs are coded into VALUE, the timestamp is recorded as the current time, and the total data identifier is set to be 1 to indicate that the current KV data is a complete data row.

Further, in the method for optimized storage of data in a database according to claim 1, the recording of only partial changed column field VALUEs of the data row by using other column VALUEs VALUE of the data row KV storage coding model includes:

when a row of data partial column field VALUEs of the database table are changed, only the modified column field VALUEs are encoded into VALUE, and the full data identification is set to 0, which means that the current KV data only contains partial column field VALUEs.

Further, the method for optimizing data storage of a database according to claim 1, wherein the obtaining KV data of the same row data KEY by sorting according to the row data KEY and the timestamp includes:

and carrying out forward sequencing from small to large according to the size of binary coding bytes of the row data KEY, carrying out reverse sequencing from large to small according to the timestamp, and reading KV data of the same row data KEY with the latest partially changed column field value.

Further, the method for optimizing data storage in a database according to claim 1, wherein the merging the KV data of the same row data KEY includes:

merging the KV data of the KEY data of the same row of data according to a merging rule, wherein the merging rule comprises the following steps:

starting from the KV data with the full data identifier of 0, traversing the KV data of the same row of data KEY until the KV data with the full data identifier of 1 is read, merging the traversed KV data, wherein the timestamp of KEY is the timestamp of the latest KEY, merging field column VALUEs in VALUE, and each field column takes the VALUE of the latest timestamp.

A system for optimizing data storage of database includes a data line KV storage coding model module,

the data row KV storage coding model module writes data according to a database table, and stores the changed data column VALUEs by using the data row KV storage coding model, wherein the complete data row is recorded by using the main KEY column VALUE KEY of the data row KV storage coding model, and only the field VALUEs of partial changed columns of the data row are recorded by using other column VALUEs VALUE of the data row KV storage coding model;

Further, in the system for optimized storage of data in database according to claim 1, the data row KV storage coding model module records a complete data row through the KEY value KEY of the data row KV storage coding model, including:

Further, in a system for database-optimized storage of data according to claim 1, the data row KV storage coding model module records only a part of the modified column field VALUEs of the data row through other column VALUEs VALUE of the data row KV storage coding model, including:

Further, the system for optimized data storage in database according to claim 1, wherein the data row KV storage coding model module performs sorting according to the row data KEY and the timestamp to obtain KV data of the same row data KEY, and the method includes:

Further, the system for optimized data storage in database according to claim 1, wherein the data row KV storage coding model module merges KV data of the same row data KEY, and includes:

The invention has the advantages that:

the invention provides a method for optimizing data storage of a database, which only stores the changed data column value instead of the whole row data when the data is changed by the partial column field value of the data row of the database table, can effectively reduce the storage IO of the data change,

by the method provided by the invention, only the data containing the required field is required to be read when the partial field of the data row is inquired, the data of the whole row is not required to be read, the data reading IO can be reduced,

by the method, partial column value KV data can be merged into complete row data, reading optimization of the whole row data is realized, and the performance of data query of the whole row is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic diagram of a data line KV coding model in the present invention.

FIG. 2 is a schematic diagram of a data reading process of the method of the present invention.

Fig. 3 is a schematic diagram of an example of data merging in the present invention.

Detailed Description

The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.

The invention provides a method for optimizing data storage of a database, which is characterized in that data is written in according to a database table, a data row KV storage coding model is used for storing changed data column VALUEs, wherein a complete data row is recorded through a main KEY column VALUE KEY of the data row KV storage coding model, and other column VALUEs VALUE of the data row KV storage coding model are used for only recording partial changed column field VALUEs of the data row;

When the data is changed in the field value of a part of columns of the data rows of the database table, only the changed data column values are stored, but not the data of the whole row, so that the storage IO of the data change can be effectively reduced, and the overall performance of the data change is improved; when a part of fields of a data row are inquired, only data containing the required fields need to be read, and when the whole row of data is inquired, the data rows of a plurality of part fields can be merged and returned according to the inquiry requirement, so that the data reading IO is reduced; the method can also combine partial column value data into complete row data, realize the reading optimization of the whole row data and improve the performance of the whole row data query.

In a specific application, when the database is optimized to store data in some embodiments of the method of the present invention, the data column values of the changes are stored by using the data row KV storage coding model.

In the data row KV storage coding model, a KEY is taken as an example, and as shown in fig. 1, the KEY may be divided into two parts, namely a "row data KEY" and a "timestamp", where the "row data KEY" includes a table ID, an index ID and a binary code of a primary KEY value, and is used to uniquely identify a certain row of data, and an additional timestamp indicates a write-in time of the row data, and may be used for multi-version management of the data.

The VALUE includes binary codes of other data list fields except the primary key list, including:

a CRC check VALUE for checking whether the VALUE has an error;

the full data identifier is used for indicating whether the current VALUE contains the VALUEs of all the data list fields;

column value data including values of column ID, column type, and column field, and various types of data are uniformly encoded into a binary format.

When new data is written in and a row of new data is inserted into a database table, a primary KEY VALUE is coded into KEY according to a data row KV storage coding model, a timestamp is recorded as the current time, other columns are coded into VALUE, and a full data identifier is set to be 1 to indicate that the current KV is a complete data row;

when data is modified, if the column fields of the whole row of data are modified, the encoding mode is the same as that of the newly added data, if only part of the column field VALUEs are changed, the full data identifier is 0, only the modified column field VALUEs are encoded into the VALUE, and the fact that the current KV data only contain part of the column field VALUEs is indicated.

When the KV data of a data row is queried according to a database table, forward sorting from small to large can be performed according to the size of the binary coded byte of the row data KEY, such as table ID, index ID, and primary KEY value, and reverse sorting according to the timestamp from large to small, which can ensure that the KV data of the latest modified partial field column value of the same row data KEY is read preferentially, wherein when reading a row of data, for the same row data KEY, the KV data with the total data identifier of 1 or all the KV data are read until the column field value required by the client is read iteratively according to the sorting rule.

When the KV data are merged, a background thread can be started, merging operation is carried out on the bottom-layer storage files, and multiple partial field KV values of the same row data KEY are merged into a complete column field KV value during merging. The merge rule is as follows:

in the combined VALUE, the total data identifier is 1;

traversing KV of KEY in the same row of data from KV of which the total data identification is 0 until KV of which the total data identification is 1 is read, and then merging the traversed KV, wherein a KEY timestamp is the timestamp of the latest KEY, field column VALUEs in VALUE are merged, and each column takes the VALUE of the latest timestamp;

each row of data KEY may retain multiple time-stamped versions of "full data" VALUE for multi-version management, and delete old versions of KV when garbage reclamation needs to be performed. An example of data consolidation is illustrated with reference to fig. 3.

Meanwhile, the invention also provides a system for optimizing and storing data in a database, which comprises a data line KV storage coding model module,

The information interaction, execution process and other contents between the modules in the system are based on the same concept as the method embodiment of the present invention, and specific contents can be referred to the description in the method embodiment of the present invention, and are not described herein again.

Similarly, the system can only store the changed data column value instead of the whole row of data when the data is changed by the partial column field value of the data row of the database table, so that the storage IO of data change can be effectively reduced.

It should be noted that not all steps and modules in the above flows and system structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.

The above-mentioned embodiments are merely preferred embodiments for fully illustrating the present invention, and the scope of the present invention is not limited thereto. The equivalent substitution or change made by the technical personnel in the technical field on the basis of the invention is all within the protection scope of the invention. The protection scope of the invention is subject to the claims.

Claims

1. A method for optimizing data storage of a database is characterized in that data are written in according to a database table, a data row KV storage coding model is used for storing changed data column VALUEs, wherein a complete data row is recorded through a main KEY column VALUE KEY of the data row KV storage coding model, and only partial changed column field VALUEs of the data row are recorded through other column VALUEs VALUE of the data row KV storage coding model;

2. The method for optimized storage of data in database as claimed in claim 1, wherein said recording of the complete data row by the primary KEY column value KEY of the data row KV storage coding model comprises:

3. The method for optimizing storage data of the database according to claim 1 or 2, wherein the recording of only partial change column field VALUEs of the data rows by other column VALUEs VALUE of the data row KV storage coding model comprises:

4. The method as claimed in claim 1, wherein the obtaining KV data of the same row data KEY by sorting according to the row data KEY and the timestamp comprises:

5. The method as claimed in claim 3, wherein the merging the KV data of the same row data KEY comprises:

6. A system for optimizing data storage in database is composed of KV data storage and coding model module,

7. The system for optimized storage of data in database as claimed in claim 6, wherein the data row KV storage coding model module records complete data row by the KEY value KEY of the data row KV storage coding model, comprising:

8. The system for optimized storage of data in database as claimed in claim 6 or 7, wherein the data row KV storage coding model module records only partial changed column field VALUEs of data rows by using other column VALUEs VALUE of data row KV storage coding model, comprising:

9. The system of claim 6, wherein the data row KV memory coding model module obtains KV data of the same row data KEY by sorting according to the row data KEY and the timestamp, and the system comprises:

10. The system of claim 8, wherein the data row KV storage coding model module merges KV data of the same row data KEY, and comprises: