CN113778338A

CN113778338A - Distributed storage data reading efficiency optimization method, system, device and medium

Info

Publication number: CN113778338A
Application number: CN202111067855.2A
Authority: CN
Inventors: 储飞; 王伟哲; 贺岩; 张海亮; 王松楠
Original assignee: Beijing Dongfang Jinxin Technology Co ltd
Current assignee: Beijing Dongfang Jinxin Technology Co ltd
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2021-12-10

Abstract

The invention relates to a method, system, device and medium for optimizing the reading efficiency of distributed storage data. The method comprises: initializing an SSD and dividing the SSD space into a main partition and an elimination partition, which store main data and replica data respectively; receiving an upper-layer IO request and reading or writing IO data according to the request; and, once a preset trigger condition is reached, writing the data stored in the elimination partition of the SSD back to the HDD. Dividing the SSD into these two logical areas increases the SSD space available for main data, so more main data can be kept on the SSD. In addition, by setting a reasonable limit on the proportion of dirty data held in the elimination partition, the dirty data there is written back to the HDD in time, which improves data reading efficiency. The invention can be widely applied in the field of data reading.

Description

Distributed storage data reading efficiency optimization method, system, device and medium

Technical Field

The invention relates to a method, system, device and medium for optimizing the reading efficiency of distributed storage data, and belongs to the technical field of databases.

Background

In applications that separate database computation from storage, data is stored in a distributed storage cluster, so the read/write efficiency of the database depends directly on that of the storage layer. To improve it, existing distributed storage applications generally use a medium such as a Solid State Disk (SSD) as a cache in front of a Hard Disk Drive (HDD) to accelerate access. To ensure the reliability of stored data, the industry generally stores the main copy and the replicas of IO data on different nodes in a multi-copy manner. All IO data, whether main data or replica data, is written to the SSD first so that upper-layer requests can be answered quickly; a background task then writes the SSD data back to the HDD in write-back mode, freeing SSD space for new IO data.

However, as shown in FIG. 1, every SSD has a capacity limit (e.g., 1 TB). With three copies of each piece of data, main data occupies only 1/3 of the SSD capacity, while replicas occupy the other 2/3. In a distributed storage system, upper-layer IO requests read only main data, and the storage layer must respond to read requests quickly, so ideally all data to be read would reside on the SSD. At present, however, only the 1/3 of the SSD that holds main data can serve reads, while 2/3 of the space is occupied by replica data, so the SSD improves read efficiency by only about 30%.
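The capacity split described above can be written out as a short worked calculation. This is only an illustration: it assumes that main copies and replicas are spread evenly across nodes, so that each SSD holds the same mix, and the symbols $r$, $C$, $C_{\text{main}}$ and $C_{\text{replica}}$ are introduced here and do not appear in the original text.

```latex
% r = number of copies per piece of data, C = SSD capacity (symbols assumed for illustration)
\[
C_{\text{main}} = \frac{C}{r}, \qquad
C_{\text{replica}} = \frac{(r-1)\,C}{r}
\]
% For r = 3 and C = 1 TB (taken as 1000 GB):
\[
C_{\text{main}} = \frac{1000\ \text{GB}}{3} \approx 333\ \text{GB}, \qquad
C_{\text{replica}} = \frac{2 \times 1000\ \text{GB}}{3} \approx 667\ \text{GB}
\]
```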

If replica data were written directly to the HDD instead of the SSD, the entire SSD could be used for main data. However, upper-layer random read IO, random write IO (the many replica writes going directly to the HDD are random IO) and write-back IO to the HDD would then contend for HDD resources, slowing the overall IO response rate and significantly affecting upper-layer services.

Disclosure of Invention

In view of the foregoing problems, an object of the present invention is to provide a method, system, device and medium for optimizing the reading efficiency of distributed storage data, which allocate SSD storage space so that the SSD holds more main data, thereby improving read efficiency.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect of the present invention, a distributed storage data reading efficiency optimization method is provided, including:

initializing an SSD, and dividing an SSD space into a main partition and an elimination partition which are respectively used for storing main data and duplicate data;

receiving an upper layer IO request, and storing IO data into an SSD according to the received IO request, or reading corresponding data from the SSD and/or the HDD;

and after a preset trigger condition is reached, writing the replica data stored in the elimination partition of the SSD back to the HDD.

Preferably, initializing the SSD includes setting an elimination partition ratio and an elimination partition flush-back watermark ratio.

Preferably, the elimination partition ratio and the elimination partition flush-back watermark ratio are 20% and 40%, respectively.

Preferably, the method for receiving an upper IO request and reading and writing IO data according to the received IO request includes:

judging the type of the received upper layer IO request:

if the request is a front-end write IO request, storing the IO data to be written in the corresponding partition of the SSD, and updating an index tree and a bitmap in memory based on the IO data; the bitmap comprises a first bitmap and a second bitmap, which respectively indicate whether each segment of HDD elimination data and each segment of HDD non-elimination data holds data;

and if the request is a front-end read IO request, generating a data index key based on the received IO request, searching the index tree in memory, and reading the corresponding data from the SSD or the HDD according to the search result.

Preferably, the method for storing the IO data to be written in the corresponding partition of the SSD and updating the index tree and bitmap in the memory based on the IO data includes:

checking the flag of the IO request: if the IO data to be written is main data, storing it in the main partition of the SSD; if it is replica data, storing it in the elimination partition of the SSD;

generating a data index key, and updating an index tree in the memory based on the data index key;

and updating the corresponding bitmap in memory according to the starting offset and the size of the current IO data.

Preferably, the method for generating a data index key based on the received IO request, searching the index tree, and reading corresponding data from the SSD or the HDD according to the search result includes:

generating a data index key according to the IO request;

and searching the index tree with the generated data index key; if the key is found, reading the data from the SSD, otherwise reading the data from the HDD.

Preferably, the method for writing the replica data stored in the elimination partition of the SSD back to the HDD after the preset trigger condition is reached includes the following steps:

step one: traversing the first bitmap and searching each segment of HDD elimination data;

step two: judging whether any segment of HDD elimination data has dirty data; if not, proceeding to step three, otherwise proceeding to step five;

step three: traversing the second bitmap and searching each segment of HDD non-elimination data;

step four: judging whether any segment of HDD non-elimination data has dirty data; if so, proceeding to step five, otherwise returning to step one;

step five: traversing the index tree to find the data index key;

step six: reading the data on the SSD according to the data index key;

step seven: writing the data read from the SSD back to the HDD according to the data index key;

step eight: judging whether the proportion of dirty data stored in the elimination partition of the SSD is lower than the preset elimination partition flush-back watermark ratio; if so, ending the write-back, otherwise returning to step one.

In a second aspect of the present invention, a distributed storage data reading efficiency optimization system is provided, including:

the SSD initialization module, used for initializing the SSD and dividing the SSD space into a main partition and an elimination partition, which store main data and replica data respectively;

the IO request receiving and executing module is used for receiving an upper layer IO request, storing IO data into the SSD according to the received IO request, or reading corresponding data from the SSD and/or the HDD;

and the data write-back module, used for writing the data stored in the elimination partition of the SSD back to the HDD after a preset trigger condition is reached, and releasing the SSD space for new IO data.

In a third aspect of the present invention, a processing device is provided, which at least includes a processor and a memory, where the memory stores a computer program, and the processor executes the computer program to implement the steps of the distributed storage data reading efficiency optimization method.

A fourth aspect of the present invention provides a computer storage medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the steps of the distributed storage data reading efficiency optimization method according to any one of claims 1 to 7.

Due to the adoption of the technical scheme, the invention has the following advantages:

1) The SSD is divided into two logical areas, the main partition and the elimination partition; main data is stored in the main partition and replica data in the elimination partition, which increases the SSD space available for main data so that more main data can be stored.

2) The proportion of dirty data allowed in the elimination partition of the SSD is set reasonably, so that the dirty data stored there is written back in time, improving data reading efficiency.

3) The invention establishes an index tree mechanism, divides the HDD into segments, and uses bitmaps to record whether each segment has dirty data, which further improves data reading efficiency.

Therefore, the method can be widely applied to the field of optimization of database data reading efficiency.

Drawings

FIG. 1 is a schematic diagram of existing SSD capacity;

FIG. 2 is a flowchart of a method for optimizing distributed storage data reading efficiency according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the elimination partition ratio setting of the SSD according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the elimination partition flush-back watermark ratio setting according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the front-end write IO request processing flow according to an embodiment of the present invention;

FIGS. 6(a) and 6(b) illustrate the index tree update process according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of HDD segment division according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of the front-end read IO request processing flow according to an embodiment of the present invention;

FIG. 9 is a flowchart of the write-back task according to an embodiment of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and examples.

Example 1

As shown in fig. 2, the method for optimizing the reading efficiency of distributed storage data provided in this embodiment includes the following steps:

1) Initializing the SSD and dividing the SSD space into a main partition and an elimination partition, which store main data and replica data respectively.

It should be noted that, in the embodiment of the present invention, the main partition of the SSD space stores main data and the elimination partition stores replica data.

Specifically, initializing the SSD in the embodiment of the present invention mainly includes setting the elimination partition ratio and the elimination partition flush-back watermark ratio.

As shown in FIG. 3, the elimination partition ratio is the percentage of the total SSD capacity occupied by the elimination partition.

Optionally, for an SSD with a total capacity of 1 TB, the elimination partition ratio may be set to 20%; 800 GB of the SSD is then used as the main partition for main data, and the remaining 200 GB is used as the elimination partition for replica data.

As shown in FIG. 4, the elimination partition flush-back watermark ratio is the percentage of the elimination partition's total capacity occupied by dirty data; when the amount of dirty data in the elimination partition reaches this ratio, SSD data starts to be flushed back to the HDD. Dirty data is data that is still on the SSD and has not yet been flushed back to the HDD; once data has been flushed back, the SSD space it occupies can be freed for subsequent IO.

Optionally, in the embodiment of the present invention, the elimination partition flush-back watermark ratio is set to 40%. If the ratio is too low and the front-end IO volume is large, flush-back to the HDD becomes too frequent, which causes contention between read and write IO and degrades overall performance. If the ratio is too high, a large amount of dirty data cannot be flushed back in time and the flush-back rate cannot keep up with the front-end IO rate, so replica data cannot be written to the SSD in time and upper-layer services are blocked.
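As a minimal sketch of the initialization in step 1), the two ratios can be turned into byte counts as follows. The structure `ssd_layout` and the function `ssd_layout_init` are hypothetical names introduced for illustration only; the embodiment does not specify how the layout is represented, and 1 TB is taken here as 1000 GB to match the 800 GB / 200 GB split in the text.

```c
#include <stdint.h>

/* Hypothetical layout descriptor; the field names are illustrative only. */
struct ssd_layout {
    uint64_t main_bytes;       /* capacity of the main partition */
    uint64_t elim_bytes;       /* capacity of the elimination partition */
    uint64_t flush_threshold;  /* dirty bytes in the elimination partition
                                  at which flush-back to the HDD starts */
};

/* Split total_bytes using percentages in the range 0..100.
 * Integer division drops any remainder below 1%. */
static struct ssd_layout ssd_layout_init(uint64_t total_bytes,
                                         unsigned elim_pct,
                                         unsigned watermark_pct)
{
    struct ssd_layout l;
    l.elim_bytes = total_bytes / 100 * elim_pct;
    l.main_bytes = total_bytes - l.elim_bytes;
    l.flush_threshold = l.elim_bytes / 100 * watermark_pct;
    return l;
}
```

With the values of this embodiment (1 TB, 20%, 40%), the sketch yields an 800 GB main partition, a 200 GB elimination partition, and a flush-back threshold of 80 GB of dirty data.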

2) Receiving an upper-layer IO request, and storing IO data in the SSD according to the received IO request, or reading the corresponding data from the SSD and/or the HDD.

Specifically, the step 2) comprises the following steps:

2.1) Judging the type of the received upper-layer IO request; if it is a front-end write IO request, proceeding to step 2.2), otherwise proceeding to step 2.3).

It should be noted that, in the embodiment of the present invention, each IO request issued by the upper layer is required to indicate whether its IO data is main data or replica data.

2.2) As shown in FIG. 5, storing the IO data to be written in the corresponding partition of the SSD, and updating the index tree and the bitmap in memory based on the IO data.

Specifically, the method comprises the following steps:

2.2.1) Checking the flag of the IO request: if the IO data to be written is main data, storing it in the main partition of the SSD; if it is replica data, storing it in the elimination partition of the SSD.

Optionally, storing the IO data includes allocating space of the corresponding size on the SSD according to the size of the received IO data, and storing the data in the corresponding partition.
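Step 2.2.1) can be sketched as a small allocator that picks the partition from the request flag and reserves space of the requested size. This is an illustration under assumptions: the embodiment does not describe the allocator, so a simple bump pointer is used here as a stand-in, and `partition`, `ssd` and `ssd_alloc` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-partition state: a bump-pointer allocator (an assumption;
 * the text does not say how SSD space is managed). */
struct partition {
    uint64_t start;  /* first byte of the partition on the SSD */
    uint64_t size;   /* partition capacity in bytes */
    uint64_t used;   /* bytes already allocated */
};

struct ssd {
    struct partition main_part;  /* holds main data */
    struct partition elim_part;  /* holds replica data */
};

/* Reserve io_size bytes in the partition selected by the request flag.
 * Returns true and sets *ssd_offset on success; returns false and leaves
 * *ssd_offset untouched if the partition has no room. */
static bool ssd_alloc(struct ssd *s, bool is_replica, uint64_t io_size,
                      uint64_t *ssd_offset)
{
    struct partition *p = is_replica ? &s->elim_part : &s->main_part;
    if (io_size > p->size - p->used)
        return false;
    *ssd_offset = p->start + p->used;
    p->used += io_size;
    return true;
}
```

Keeping separate usage counters per partition means a burst of replica writes can fill only the elimination partition and never takes space from main data.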

2.2.2) Generating a data index key, and updating the index tree in memory based on the data index key.

As shown in FIG. 6(a) and FIG. 6(b), the embodiment of the present invention maintains an index tree in memory. The index tree is used to quickly locate a data index key from the starting offset (offset) of the IO data; the data on the SSD is then read according to that key, either for write-back or to answer an upper-layer read request.

Preferably, the data index key generated from the received IO data may use the following structure:

#include <stdbool.h>

struct io_key
{
    long data_size;   // IO size
    long hdd_no;      // HDD number
    long ssd_no;      // SSD number
    long offset_hdd;  // starting offset of the IO on the HDD
    long offset_ssd;  // starting offset of the IO on the SSD
    bool is_dirty;    // whether the data is dirty
};
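The index tree update of step 2.2.2) can be sketched as follows. The embodiment does not state which kind of tree is used, so an unbalanced binary search tree keyed on `offset_hdd` is assumed here purely for illustration; `index_node`, `index_put` and `index_get` are hypothetical names, and `hdd_no` is ignored for brevity.

```c
#include <stdbool.h>
#include <stdlib.h>

struct io_key {
    long data_size;   /* IO size */
    long hdd_no;      /* HDD number */
    long ssd_no;      /* SSD number */
    long offset_hdd;  /* starting offset of the IO on the HDD */
    long offset_ssd;  /* starting offset of the IO on the SSD */
    bool is_dirty;    /* whether the data is dirty */
};

/* Hypothetical tree node: an unbalanced BST ordered by offset_hdd. */
struct index_node {
    struct io_key key;
    struct index_node *left, *right;
};

/* Insert key, or overwrite the entry that has the same offset_hdd.
 * Returns the (possibly new) root; on allocation failure the tree is unchanged. */
static struct index_node *index_put(struct index_node *root,
                                    const struct io_key *key)
{
    struct index_node **link = &root;
    while (*link) {
        if (key->offset_hdd == (*link)->key.offset_hdd) {
            (*link)->key = *key;  /* same HDD offset: update in place */
            return root;
        }
        link = key->offset_hdd < (*link)->key.offset_hdd ? &(*link)->left
                                                         : &(*link)->right;
    }
    struct index_node *n = malloc(sizeof *n);
    if (n) {
        n->key = *key;
        n->left = n->right = NULL;
        *link = n;
    }
    return root;
}

/* Return the key stored for offset_hdd, or NULL if there is none. */
static const struct io_key *index_get(const struct index_node *root,
                                      long offset_hdd)
{
    while (root) {
        if (offset_hdd == root->key.offset_hdd)
            return &root->key;
        root = offset_hdd < root->key.offset_hdd ? root->left : root->right;
    }
    return NULL;
}
```

A production index would more likely use a balanced tree so that lookups stay logarithmic under sequential offsets; the unbalanced form is chosen here only to keep the sketch short.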

2.2.3) Updating the corresponding bitmap in memory according to the starting offset (offset) and the size (size) of the current IO data.

As shown in FIG. 7, in the embodiment of the present invention the HDD capacity is divided into segments of 64 MB, and each segment corresponds to one bit: a bit of 1 indicates that the segment has dirty data, and a bit of 0 indicates that the segment has been written back or has no dirty data. Preferably, two bitmaps are maintained in memory: the first bitmap and the second bitmap respectively indicate this for each segment of HDD elimination data and each segment of HDD non-elimination data, and the bitmap to update is chosen according to the type of the front-end IO data (main/replica).
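The bitmap update of step 2.2.3) can be sketched as below: every 64 MB segment touched by the byte range of the IO has its bit set. The names `SEGMENT_BYTES`, `bitmap_mark_dirty` and `bitmap_is_dirty` are hypothetical, 64 MB is assumed to mean 64 × 1024 × 1024 bytes, and the caller is assumed to pass a bitmap large enough for the HDD.

```c
#include <stdint.h>

/* 64 MB per segment, as in the text (binary megabytes are an assumption). */
#define SEGMENT_BYTES (64ULL * 1024 * 1024)

/* Set the bit of every segment touched by the byte range [offset, offset + size).
 * The caller must supply a bitmap that covers the whole HDD. */
static void bitmap_mark_dirty(uint8_t *bitmap, uint64_t offset, uint64_t size)
{
    if (size == 0)
        return;
    uint64_t first = offset / SEGMENT_BYTES;
    uint64_t last = (offset + size - 1) / SEGMENT_BYTES;
    for (uint64_t seg = first; seg <= last; seg++)
        bitmap[seg / 8] |= (uint8_t)(1u << (seg % 8));
}

/* Return 1 if the segment's bit is set, 0 otherwise. */
static int bitmap_is_dirty(const uint8_t *bitmap, uint64_t seg)
{
    return (bitmap[seg / 8] >> (seg % 8)) & 1;
}
```

In use, the caller would pass the first bitmap for replica IO and the second bitmap for main IO, so that each bitmap tracks only its own type of data.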

2.3) Generating a data index key based on the received IO request, searching the index tree in memory, and reading the corresponding data from the SSD or the HDD according to the search result.

Specifically, as shown in FIG. 8, this includes the following steps:

2.3.1) Generating a data index key based on the received IO request.

2.3.2) Searching the index tree in memory with the generated data index key; if the key is found, reading the data from the SSD, otherwise reading the data from the HDD.
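The read path of steps 2.3.1) and 2.3.2) reduces to a lookup followed by a choice of device. In the sketch below the in-memory index is replaced by a sorted array searched with the standard `bsearch`, which is a stand-in for the index tree rather than the structure the embodiment describes. Only exact matches on `offset_hdd` are handled, and `read_source`, `cmp_by_hdd_offset` and `locate` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct io_key {
    long data_size;   /* IO size */
    long hdd_no;      /* HDD number */
    long ssd_no;      /* SSD number */
    long offset_hdd;  /* starting offset of the IO on the HDD */
    long offset_ssd;  /* starting offset of the IO on the SSD */
    bool is_dirty;    /* whether the data is dirty */
};

enum read_source { READ_FROM_SSD, READ_FROM_HDD };

/* Order keys by their starting offset on the HDD. */
static int cmp_by_hdd_offset(const void *a, const void *b)
{
    const struct io_key *ka = a, *kb = b;
    return (ka->offset_hdd > kb->offset_hdd) - (ka->offset_hdd < kb->offset_hdd);
}

/* index must be sorted by offset_hdd. On a hit, *hit points at the entry
 * (its offset_ssd says where to read on the SSD); on a miss, *hit is NULL
 * and the data has to be read from the HDD. */
static enum read_source locate(const struct io_key *index, size_t n,
                               long offset_hdd, const struct io_key **hit)
{
    struct io_key probe = { .offset_hdd = offset_hdd };
    *hit = bsearch(&probe, index, n, sizeof *index, cmp_by_hdd_offset);
    return *hit ? READ_FROM_SSD : READ_FROM_HDD;
}
```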

3) After the preset trigger condition is reached, writing the replica data stored in the elimination partition of the SSD back to the HDD, and releasing the SSD space for new IO data.

As shown in FIG. 9, when the proportion of dirty data stored in the elimination partition of the SSD reaches the preset elimination partition flush-back watermark ratio, the data stored in the elimination partition is written back to the HDD as follows:

3.1) Traversing the first bitmap and searching each segment of HDD elimination data;

3.2) Judging whether any segment of HDD elimination data has dirty data; if not, proceeding to step 3.3), otherwise proceeding to step 3.5);

3.3) Traversing the second bitmap and searching each segment of HDD non-elimination data;

3.4) Judging whether any segment of HDD non-elimination data has dirty data; if so, proceeding to step 3.5), otherwise returning to step 3.1);

3.5) Traversing the index tree to find the data index key;

3.6) Reading the data on the SSD according to the data index key;

3.7) Writing the data read from the SSD back to the HDD according to the data index key;

3.8) Judging whether the proportion of dirty data stored in the elimination partition of the SSD is lower than the preset elimination partition flush-back watermark ratio; if so, ending the write-back, otherwise returning to step 3.1).
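The control flow of steps 3.1) to 3.8) can be sketched with a simplified in-memory model. Instead of bits, each segment records how many dirty bytes it holds, and the actual index lookup, SSD read and HDD write of steps 3.5) to 3.7) are reduced to clearing the counter. All names (`wb_state`, `first_dirty`, `write_back`) are hypothetical. Unlike the flow in the text, which returns to step 3.1) when no dirty segment is found, the sketch stops in that case, an assumption made here so that the loop always terminates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: dirty bytes per HDD segment, one array per bitmap. */
struct wb_state {
    uint64_t *elim_dirty;       /* segments tracked by the first bitmap */
    uint64_t *main_dirty;       /* segments tracked by the second bitmap */
    size_t nsegs;               /* number of segments in each array */
    uint64_t elim_capacity;     /* elimination partition size in bytes */
    uint64_t elim_dirty_total;  /* dirty bytes held in the elimination partition */
    unsigned watermark_pct;     /* flush-back watermark ratio, 0..100 */
};

/* Return the index of the first segment with dirty bytes, or n if none. */
static size_t first_dirty(const uint64_t *segs, size_t n)
{
    size_t i = 0;
    while (i < n && segs[i] == 0)
        i++;
    return i;
}

/* Flush one segment per iteration; returns the number of segments flushed. */
static size_t write_back(struct wb_state *s)
{
    size_t flushed = 0;
    for (;;) {
        size_t i = first_dirty(s->elim_dirty, s->nsegs);  /* steps 3.1-3.2 */
        if (i < s->nsegs) {
            /* steps 3.5-3.7 (index lookup, SSD read, HDD write) go here */
            s->elim_dirty_total -= s->elim_dirty[i];
            s->elim_dirty[i] = 0;
        } else {
            i = first_dirty(s->main_dirty, s->nsegs);     /* steps 3.3-3.4 */
            if (i == s->nsegs)
                break;  /* nothing dirty anywhere: stop instead of looping */
            s->main_dirty[i] = 0;                         /* steps 3.5-3.7 */
        }
        flushed++;
        /* step 3.8: stop once the dirty ratio is below the watermark */
        if (s->elim_dirty_total * 100 < s->elim_capacity * s->watermark_pct)
            break;
    }
    return flushed;
}
```

Because the first bitmap is always scanned before the second, replica data in the elimination partition is flushed ahead of main data, which matches the aim of keeping main data on the SSD for as long as possible.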

Example 2

Corresponding to embodiment 1, this embodiment provides a distributed storage data reading efficiency optimization system. The system provided in this embodiment can implement the distributed storage data reading efficiency optimization method of embodiment 1, and may be implemented in software, hardware, or a combination of the two. For example, the system may comprise integrated or separate functional modules or functional units that perform the corresponding steps of the method of embodiment 1. Since the system of this embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of embodiment 1. The system embodiment described here is only illustrative.

The distributed storage data reading efficiency optimization system provided by the embodiment includes:

Example 3

This embodiment provides a processing device corresponding to the method for optimizing efficiency of reading distributed storage data provided in embodiment 1, where the processing device may be a processing device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, to execute the method of embodiment 1.

The processing equipment comprises a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface are connected through the bus so as to complete mutual communication. The memory stores a computer program that can be executed on the processor, and the processor executes the distributed storage data reading efficiency optimization method provided by embodiment 1 when executing the computer program.

In some implementations, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.

In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.

Example 4

The method for optimizing efficiency of reading data from distributed storage according to embodiment 1 may be embodied as a computer program product, and the computer program product may include a computer readable storage medium on which computer readable program instructions for executing the method for optimizing efficiency of reading data from distributed storage according to embodiment 1 are loaded.

The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.

It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

The above embodiments are only used for illustrating the present invention, and the structure, connection mode, manufacturing process, etc. of the components may be changed, and all equivalent changes and modifications performed on the basis of the technical solution of the present invention should not be excluded from the protection scope of the present invention.

Claims

1. A distributed storage data reading efficiency optimization method is characterized by comprising the following steps:

2. The method according to claim 1, wherein initializing the SSD includes setting an elimination partition ratio and an elimination partition flush-back watermark ratio.

3. The distributed storage data reading efficiency optimization method according to claim 2, wherein the elimination partition ratio and the elimination partition flush-back watermark ratio are 20% and 40%, respectively.

4. The method for optimizing distributed storage data reading efficiency according to claim 2, wherein the method for receiving an upper layer IO request and reading and writing IO data according to the received IO request comprises:

judging the type of the received upper layer IO request:

5. The method for optimizing the reading efficiency of the distributed storage data according to claim 4, wherein the method for storing the IO data to be written into the corresponding partition of the SSD and updating the index tree and the bitmap in the memory based on the IO data comprises:

6. The method for optimizing the reading efficiency of the distributed storage data according to claim 4, wherein the method for generating the data index key based on the received IO request, searching the index tree, and reading the corresponding data from the SSD or the HDD according to the search result comprises:

generating a data index key according to the IO request;

7. The method for optimizing the reading efficiency of the distributed storage data according to claim 4, wherein the method for writing the replica data stored in the elimination partition of the SSD back to the HDD after the preset trigger condition is reached comprises the following steps:

traversing the index tree to search the data index key;

reading the data on the SSD according to the data index key;

writing the data read from the SSD back to the HDD according to the data index key;

8. A distributed storage data reading efficiency optimization system, comprising:

9. A processing device comprising at least a processor and a memory, the memory having stored thereon a computer program, characterized in that the steps of the distributed storage data reading efficiency optimization method of any one of claims 1 to 7 are performed by the processor when executing the computer program.

10. A computer storage medium having computer readable instructions stored thereon which are executable by a processor to perform the steps of the distributed storage data reading efficiency optimization method according to any one of claims 1 to 7.