CN114048192A - Method for optimizing data storage of database - Google Patents

Method for optimizing data storage of database Download PDF

Info

Publication number
CN114048192A
CN114048192A CN202111472678.6A CN202111472678A CN114048192A CN 114048192 A CN114048192 A CN 114048192A CN 202111472678 A CN202111472678 A CN 202111472678A CN 114048192 A CN114048192 A CN 114048192A
Authority
CN
China
Prior art keywords
data
row
key
column
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111472678.6A
Other languages
Chinese (zh)
Inventor
贾德星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd filed Critical Inspur Cloud Information Technology Co Ltd
Priority to CN202111472678.6A priority Critical patent/CN114048192A/en
Publication of CN114048192A publication Critical patent/CN114048192A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations

Abstract

The invention discloses a method for optimizing stored data of a database, which relates to the technical field of database storage; writing data according to a database table, and storing the changed data column VALUEs by using a data row KV storage coding model, wherein the main KEY column VALUE KEY of the data row KV storage coding model records the complete data row, and other column VALUEs VALUE of the data row KV storage coding model only record the partial changed column field VALUEs of the data row; inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.

Description

Method for optimizing data storage of database
Technical Field
The invention discloses a method, relates to the technical field of database storage, and particularly relates to a method for optimally storing data in a database.
Background
Relational databases organize data in a row-column two-dimensional table model, which is often written to the underlying storage system in row units during storage. However, currently, the distributed databases, such as CockroachDB, TiDB, and the like, which are mainstream in the industry, all use a Key-Value (Key Value) storage format when storing data rows, and the database engine encodes one row of data in the database table into one KV data and stores the KV data into a bottom storage engine, such as RocksDB, wherein Key is a Value of a table main Key column, and values of other columns are encoded into Value. When data is updated, even if only the value of one column field is modified, all the old data rows need to be read out, and after a certain column field value is modified, the data needs to be encoded into KV data again for storage. In this way, redundant storage and reading IO is increased, the usage amount of storage and writing IO is also increased, and meanwhile, the writing amplification of data is also increased, and the overall performance of the system is affected.
Disclosure of Invention
The invention provides a method for optimizing stored data of a database aiming at the problems in the prior art, and the specific scheme provided by the invention is as follows:
a method for optimizing data storage of a database comprises the steps of writing data according to a database table, storing changed data column VALUEs by using a data row KV storage coding model, wherein a complete data row is recorded by a main KEY column VALUE KEY of the data row KV storage coding model, and only partial changed column field VALUEs of the data row are recorded by other column VALUEs VALUE of the data row KV storage coding model;
inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.
Further, the method for optimized storage of data in database according to claim 1, wherein the recording of the complete data row by the KEY value KEY of the data row KV storage coding model includes:
when a row of new data is inserted into the database table, the main KEY column VALUE is coded into KEY according to a data row KV storage coding model, other column VALUEs are coded into VALUE, the timestamp is recorded as the current time, and the total data identifier is set to be 1 to indicate that the current KV data is a complete data row.
Further, in the method for optimized storage of data in a database according to claim 1, the recording of only partial changed column field VALUEs of the data row by using other column VALUEs VALUE of the data row KV storage coding model includes:
when a row of data partial column field VALUEs of the database table are changed, only the modified column field VALUEs are encoded into VALUE, and the full data identification is set to 0, which means that the current KV data only contains partial column field VALUEs.
Further, the method for optimizing data storage of a database according to claim 1, wherein the obtaining KV data of the same row data KEY by sorting according to the row data KEY and the timestamp includes:
and carrying out forward sequencing from small to large according to the size of binary coding bytes of the row data KEY, carrying out reverse sequencing from large to small according to the timestamp, and reading KV data of the same row data KEY with the latest partially changed column field value.
Further, the method for optimizing data storage in a database according to claim 1, wherein the merging the KV data of the same row data KEY includes:
merging the KV data of the KEY data of the same row of data according to a merging rule, wherein the merging rule comprises the following steps:
starting from the KV data with the full data identifier of 0, traversing the KV data of the same row of data KEY until the KV data with the full data identifier of 1 is read, merging the traversed KV data, wherein the timestamp of KEY is the timestamp of the latest KEY, merging field column VALUEs in VALUE, and each field column takes the VALUE of the latest timestamp.
A system for optimizing data storage of database includes a data line KV storage coding model module,
the data row KV storage coding model module writes data according to a database table, and stores the changed data column VALUEs by using the data row KV storage coding model, wherein the complete data row is recorded by using the main KEY column VALUE KEY of the data row KV storage coding model, and only the field VALUEs of partial changed columns of the data row are recorded by using other column VALUEs VALUE of the data row KV storage coding model;
inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.
Further, in the system for optimized storage of data in database according to claim 1, the data row KV storage coding model module records a complete data row through the KEY value KEY of the data row KV storage coding model, including:
when a row of new data is inserted into the database table, the main KEY column VALUE is coded into KEY according to a data row KV storage coding model, other column VALUEs are coded into VALUE, the timestamp is recorded as the current time, and the total data identifier is set to be 1 to indicate that the current KV data is a complete data row.
Further, in a system for database-optimized storage of data according to claim 1, the data row KV storage coding model module records only a part of the modified column field VALUEs of the data row through other column VALUEs VALUE of the data row KV storage coding model, including:
when a row of data partial column field VALUEs of the database table are changed, only the modified column field VALUEs are encoded into VALUE, and the full data identification is set to 0, which means that the current KV data only contains partial column field VALUEs.
Further, the system for optimized data storage in database according to claim 1, wherein the data row KV storage coding model module performs sorting according to the row data KEY and the timestamp to obtain KV data of the same row data KEY, and the method includes:
and carrying out forward sequencing from small to large according to the size of binary coding bytes of the row data KEY, carrying out reverse sequencing from large to small according to the timestamp, and reading KV data of the same row data KEY with the latest partially changed column field value.
Further, the system for optimized data storage in database according to claim 1, wherein the data row KV storage coding model module merges KV data of the same row data KEY, and includes:
merging the KV data of the KEY data of the same row of data according to a merging rule, wherein the merging rule comprises the following steps:
starting from the KV data with the full data identifier of 0, traversing the KV data of the same row of data KEY until the KV data with the full data identifier of 1 is read, merging the traversed KV data, wherein the timestamp of KEY is the timestamp of the latest KEY, merging field column VALUEs in VALUE, and each field column takes the VALUE of the latest timestamp.
The invention has the advantages that:
the invention provides a method for optimizing data storage of a database, which only stores the changed data column value instead of the whole row data when the data is changed by the partial column field value of the data row of the database table, can effectively reduce the storage IO of the data change,
by the method provided by the invention, only the data containing the required field is required to be read when the partial field of the data row is inquired, the data of the whole row is not required to be read, the data reading IO can be reduced,
by the method, partial column value KV data can be merged into complete row data, reading optimization of the whole row data is realized, and the performance of data query of the whole row is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a data line KV coding model in the present invention.
FIG. 2 is a schematic diagram of a data reading process of the method of the present invention.
Fig. 3 is a schematic diagram of an example of data merging in the present invention.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
The invention provides a method for optimizing data storage of a database, which is characterized in that data is written in according to a database table, a data row KV storage coding model is used for storing changed data column VALUEs, wherein a complete data row is recorded through a main KEY column VALUE KEY of the data row KV storage coding model, and other column VALUEs VALUE of the data row KV storage coding model are used for only recording partial changed column field VALUEs of the data row;
inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.
When the data is changed in the field value of a part of columns of the data rows of the database table, only the changed data column values are stored, but not the data of the whole row, so that the storage IO of the data change can be effectively reduced, and the overall performance of the data change is improved; when a part of fields of a data row are inquired, only data containing the required fields need to be read, and when the whole row of data is inquired, the data rows of a plurality of part fields can be merged and returned according to the inquiry requirement, so that the data reading IO is reduced; the method can also combine partial column value data into complete row data, realize the reading optimization of the whole row data and improve the performance of the whole row data query.
In a specific application, when the database is optimized to store data in some embodiments of the method of the present invention, the data column values of the changes are stored by using the data row KV storage coding model.
In the data row KV storage coding model, a KEY is taken as an example, and as shown in fig. 1, the KEY may be divided into two parts, namely a "row data KEY" and a "timestamp", where the "row data KEY" includes a table ID, an index ID and a binary code of a primary KEY value, and is used to uniquely identify a certain row of data, and an additional timestamp indicates a write-in time of the row data, and may be used for multi-version management of the data.
The VALUE includes binary codes of other data list fields except the primary key list, including:
a CRC check VALUE for checking whether the VALUE has an error;
the full data identifier is used for indicating whether the current VALUE contains the VALUEs of all the data list fields;
column value data including values of column ID, column type, and column field, and various types of data are uniformly encoded into a binary format.
When new data is written in and a row of new data is inserted into a database table, a primary KEY VALUE is coded into KEY according to a data row KV storage coding model, a timestamp is recorded as the current time, other columns are coded into VALUE, and a full data identifier is set to be 1 to indicate that the current KV is a complete data row;
when data is modified, if the column fields of the whole row of data are modified, the encoding mode is the same as that of the newly added data, if only part of the column field VALUEs are changed, the full data identifier is 0, only the modified column field VALUEs are encoded into the VALUE, and the fact that the current KV data only contain part of the column field VALUEs is indicated.
When the KV data of a data row is queried according to a database table, forward sorting from small to large can be performed according to the size of the binary coded byte of the row data KEY, such as table ID, index ID, and primary KEY value, and reverse sorting according to the timestamp from large to small, which can ensure that the KV data of the latest modified partial field column value of the same row data KEY is read preferentially, wherein when reading a row of data, for the same row data KEY, the KV data with the total data identifier of 1 or all the KV data are read until the column field value required by the client is read iteratively according to the sorting rule.
When the KV data are merged, a background thread can be started, merging operation is carried out on the bottom-layer storage files, and multiple partial field KV values of the same row data KEY are merged into a complete column field KV value during merging. The merge rule is as follows:
in the combined VALUE, the total data identifier is 1;
traversing KV of KEY in the same row of data from KV of which the total data identification is 0 until KV of which the total data identification is 1 is read, and then merging the traversed KV, wherein a KEY timestamp is the timestamp of the latest KEY, field column VALUEs in VALUE are merged, and each column takes the VALUE of the latest timestamp;
each row of data KEY may retain multiple time-stamped versions of "full data" VALUE for multi-version management, and delete old versions of KV when garbage reclamation needs to be performed. An example of data consolidation is illustrated with reference to fig. 3.
Meanwhile, the invention also provides a system for optimizing and storing data in a database, which comprises a data line KV storage coding model module,
the data row KV storage coding model module writes data according to a database table, and stores the changed data column VALUEs by using the data row KV storage coding model, wherein the complete data row is recorded by using the main KEY column VALUE KEY of the data row KV storage coding model, and only the field VALUEs of partial changed columns of the data row are recorded by using other column VALUEs VALUE of the data row KV storage coding model;
inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.
The information interaction, execution process and other contents between the modules in the system are based on the same concept as the method embodiment of the present invention, and specific contents can be referred to the description in the method embodiment of the present invention, and are not described herein again.
Similarly, the system can only store the changed data column value instead of the whole row of data when the data is changed by the partial column field value of the data row of the database table, so that the storage IO of data change can be effectively reduced.
It should be noted that not all steps and modules in the above flows and system structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.
The above-mentioned embodiments are merely preferred embodiments for fully illustrating the present invention, and the scope of the present invention is not limited thereto. The equivalent substitution or change made by the technical personnel in the technical field on the basis of the invention is all within the protection scope of the invention. The protection scope of the invention is subject to the claims.

Claims (10)

1. A method for optimizing data storage of a database is characterized in that data are written in according to a database table, a data row KV storage coding model is used for storing changed data column VALUEs, wherein a complete data row is recorded through a main KEY column VALUE KEY of the data row KV storage coding model, and only partial changed column field VALUEs of the data row are recorded through other column VALUEs VALUE of the data row KV storage coding model;
inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.
2. The method for optimized storage of data in database as claimed in claim 1, wherein said recording of the complete data row by the primary KEY column value KEY of the data row KV storage coding model comprises:
when a row of new data is inserted into the database table, the main KEY column VALUE is coded into KEY according to a data row KV storage coding model, other column VALUEs are coded into VALUE, the timestamp is recorded as the current time, and the total data identifier is set to be 1 to indicate that the current KV data is a complete data row.
3. The method for optimizing storage data of the database according to claim 1 or 2, wherein the recording of only partial change column field VALUEs of the data rows by other column VALUEs VALUE of the data row KV storage coding model comprises:
when a row of data partial column field VALUEs of the database table are changed, only the modified column field VALUEs are encoded into VALUE, and the full data identification is set to 0, which means that the current KV data only contains partial column field VALUEs.
4. The method as claimed in claim 1, wherein the obtaining KV data of the same row data KEY by sorting according to the row data KEY and the timestamp comprises:
and carrying out forward sequencing from small to large according to the size of binary coding bytes of the row data KEY, carrying out reverse sequencing from large to small according to the timestamp, and reading KV data of the same row data KEY with the latest partially changed column field value.
5. The method as claimed in claim 3, wherein the merging the KV data of the same row data KEY comprises:
merging the KV data of the KEY data of the same row of data according to a merging rule, wherein the merging rule comprises the following steps:
starting from the KV data with the full data identifier of 0, traversing the KV data of the same row of data KEY until the KV data with the full data identifier of 1 is read, merging the traversed KV data, wherein the timestamp of KEY is the timestamp of the latest KEY, merging field column VALUEs in VALUE, and each field column takes the VALUE of the latest timestamp.
6. A system for optimizing data storage in database is composed of KV data storage and coding model module,
the data row KV storage coding model module writes data according to a database table, and stores the changed data column VALUEs by using the data row KV storage coding model, wherein the complete data row is recorded by using the main KEY column VALUE KEY of the data row KV storage coding model, and only the field VALUEs of partial changed columns of the data row are recorded by using other column VALUEs VALUE of the data row KV storage coding model;
inquiring and merging KV data of data rows according to a database table, reading a main KEY column value KEY of the data rows by using a data row KV storage coding model, acquiring row data KEY and timestamps, sequencing according to the row data KEY and the timestamps to obtain KV data of the same row data KEY with the latest partially changed column field value, and merging the KV data of the same row data KEY.
7. The system for optimized storage of data in database as claimed in claim 6, wherein the data row KV storage coding model module records complete data row by the KEY value KEY of the data row KV storage coding model, comprising:
when a row of new data is inserted into the database table, the main KEY column VALUE is coded into KEY according to a data row KV storage coding model, other column VALUEs are coded into VALUE, the timestamp is recorded as the current time, and the total data identifier is set to be 1 to indicate that the current KV data is a complete data row.
8. The system for optimized storage of data in database as claimed in claim 6 or 7, wherein the data row KV storage coding model module records only partial changed column field VALUEs of data rows by using other column VALUEs VALUE of data row KV storage coding model, comprising:
when a row of data partial column field VALUEs of the database table are changed, only the modified column field VALUEs are encoded into VALUE, and the full data identification is set to 0, which means that the current KV data only contains partial column field VALUEs.
9. The system of claim 6, wherein the data row KV memory coding model module obtains KV data of the same row data KEY by sorting according to the row data KEY and the timestamp, and the system comprises:
and carrying out forward sequencing from small to large according to the size of binary coding bytes of the row data KEY, carrying out reverse sequencing from large to small according to the timestamp, and reading KV data of the same row data KEY with the latest partially changed column field value.
10. The system of claim 8, wherein the data row KV storage coding model module merges KV data of the same row data KEY, and comprises:
merging the KV data of the KEY data of the same row of data according to a merging rule, wherein the merging rule comprises the following steps:
starting from the KV data with the full data identifier of 0, traversing the KV data of the same row of data KEY until the KV data with the full data identifier of 1 is read, merging the traversed KV data, wherein the timestamp of KEY is the timestamp of the latest KEY, merging field column VALUEs in VALUE, and each field column takes the VALUE of the latest timestamp.
CN202111472678.6A 2021-12-06 2021-12-06 Method for optimizing data storage of database Pending CN114048192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111472678.6A CN114048192A (en) 2021-12-06 2021-12-06 Method for optimizing data storage of database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111472678.6A CN114048192A (en) 2021-12-06 2021-12-06 Method for optimizing data storage of database

Publications (1)

Publication Number Publication Date
CN114048192A true CN114048192A (en) 2022-02-15

Family

ID=80212364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111472678.6A Pending CN114048192A (en) 2021-12-06 2021-12-06 Method for optimizing data storage of database

Country Status (1)

Country Link
CN (1) CN114048192A (en)

Similar Documents

Publication Publication Date Title
CN101751406B (en) Method and device for realizing column storage based relational database
CN103733195B (en) Manage the storage of the data for the search based on range
KR101708261B1 (en) Managing storage of individually accessible data units
US20070005874A1 (en) File system storing transaction records in flash-like media
US20070005615A1 (en) File system having inverted hierarchical structure
CN108062358B (en) Offline recovery method for deletion records of Innodb engine and storage medium
CN1983267A (en) File system having deferred verification of data integrity
CN105027071B (en) Manage the operation to data storage unit
WO2005086003A1 (en) Database system
CN105912687A (en) Mass distributed database memory cell
US20170242872A1 (en) Method for reliable and efficient filesystem metadata conversion
CN116257523A (en) Column type storage indexing method and device based on nonvolatile memory
CN105183949A (en) Railway main data cleaning method and system
CN106897174B (en) Fragment recovery method for MYSQL database
US20120317384A1 (en) Data storage method
CN111694853B (en) Data increment collection method and device based on lineage, storage medium and electronic equipment
CN114048192A (en) Method for optimizing data storage of database
Kircher et al. Efficient structural bulk updates on the pre/dist/size XML encoding
CN113360495B (en) Database query interruption recovery method, device, equipment and readable medium
CN114217741A (en) Storage method of storage device and storage device
CN111767436A (en) HASH index data storage and reading method and system
CN112286948B (en) Data storage method, data reading method and data storage device of time sequence database
RU2008120913A (en) MULTIDIMENSIONAL DATABASE AND METHOD OF MANAGING THE MULTIDIMENSIONAL DATABASE
US11928084B2 (en) Metadata store in multiple reusable append logs
CN111459949B (en) Data processing method, device and equipment for database and index updating method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221214

Address after: Room 305-22, Building 2, No. 1158 Zhangdong Road and No. 1059 Dangui Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant after: Shanghai Yunxi Technology Co.,Ltd.

Address before: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park

Applicant before: Inspur cloud Information Technology Co.,Ltd.

TA01 Transfer of patent application right