CN113778338A - Distributed storage data reading efficiency optimization method, system, device and medium

Publication number
CN113778338A
CN113778338A CN202111067855.2A CN202111067855A CN113778338A CN 113778338 A CN113778338 A CN 113778338A CN 202111067855 A CN202111067855 A CN 202111067855A CN 113778338 A CN113778338 A CN 113778338A
Authority
CN
China
Prior art keywords
data
ssd
partition
hdd
request
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111067855.2A
Other languages
Chinese (zh)
Inventor
储飞
王伟哲
贺岩
张海亮
王松楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dongfang Jinxin Technology Co ltd
Original Assignee
Beijing Dongfang Jinxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Dongfang Jinxin Technology Co ltd
Priority to CN202111067855.2A
Publication of CN113778338A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention relates to a method, system, device and medium for optimizing the reading efficiency of distributed storage data. The method comprises: initializing an SSD and dividing the SSD space into a main partition and an elimination partition, which are used to store main data and replica data respectively; receiving an upper-layer IO request and reading or writing IO data according to the received IO request; and, after a preset trigger condition is reached, writing the data stored in the SSD elimination partition back to the HDD. Dividing the SSD into the two logical areas of a main partition and an elimination partition effectively increases the SSD space available for main data, so that more main data can be stored. In addition, by setting a reasonable proportion for the dirty data held in the SSD elimination partition, the dirty data stored there can be written back to the HDD in time, which improves data reading efficiency. The invention can be widely applied in the field of data reading.

Description

Distributed storage data reading efficiency optimization method, system, device and medium
Technical Field
The invention relates to a method, system, device and medium for optimizing the reading efficiency of distributed storage data, and belongs to the technical field of databases.
Background
In applications that separate database computation from storage, data is stored in a distributed storage cluster, and the read/write efficiency of the database depends directly on the read/write efficiency of the storage layer. To improve the latter, existing distributed storage applications generally use media such as a Solid State Disk (SSD) as a cache in front of a Hard Disk Drive (HDD) to accelerate access. To ensure the reliability of stored data, the industry generally stores the main copy and the replicas of IO data on different nodes in a multi-copy manner. Both main data and replica data are written to the SSD first, so that upper-layer requests can be answered quickly; in the background, the data on the SSD is written back to the HDD in write-back mode, releasing SSD space for new IO data.
However, as shown in FIG. 1, every SSD has a capacity limit (e.g., 1 TB). With three copies of the data, main data occupies only 1/3 of the SSD capacity, while replica data occupies 2/3. In a distributed storage system, upper-layer IO requests read only main data, and the storage layer must respond to read requests quickly, so ideally all data to be read would reside on the SSD. At present, however, only the 1/3 of the SSD that holds main data can serve reads, while 2/3 of the space is occupied by replica data, so using the SSD improves read efficiency by only about 30%.
If replica data were written directly to the HDD instead of the SSD, the entire SSD could be used to store main data. However, the upper-layer random read IO, the random write IO (the large volume of replica data written directly to the HDD is random IO), and the IO that writes data back to the HDD would then contend for resources, slowing the overall IO response rate and significantly affecting upper-layer services.
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a method, a system, a device, and a medium for optimizing read efficiency of distributed storage data, which can utilize an SSD to store more master data by reasonably allocating storage space of the SSD to improve read efficiency.
To achieve this object, the invention adopts the following technical solutions:
In a first aspect of the present invention, a distributed storage data reading efficiency optimization method is provided, including:
initializing an SSD, and dividing the SSD space into a main partition and an elimination partition, which are used to store main data and replica data respectively;
receiving an upper-layer IO request, and storing IO data into the SSD according to the received IO request, or reading the corresponding data from the SSD and/or the HDD;
and, after a preset trigger condition is reached, writing the replica data stored in the SSD elimination partition back to the HDD.
Preferably, initializing the SSD includes setting an elimination partition proportion and an elimination partition write-back water level proportion.
Preferably, the elimination partition proportion and the elimination partition write-back water level proportion are 20% and 40%, respectively.
Preferably, the method for receiving an upper-layer IO request and reading or writing IO data according to the received IO request includes:
determining the type of the received upper-layer IO request:
if the IO request is a front-end write request, storing the IO data to be written in the corresponding partition of the SSD, and updating an index tree and a bitmap in memory based on the IO data; the bitmap comprises a first bitmap and a second bitmap, which are used to indicate whether data is stored in each section of HDD elimination data and in each section of HDD non-elimination data, respectively;
and if the IO request is a front-end read request, generating a data index key based on the received IO request, searching the index tree in memory, and reading the corresponding data from the SSD or the HDD according to the search result.
Preferably, the method for storing the IO data to be written in the corresponding partition of the SSD and updating the index tree and bitmap in memory based on the IO data includes:
checking the flag of the IO request: if the IO data to be written is main data, storing it in the main partition of the SSD; if it is replica data, storing it in the elimination partition of the SSD;
generating a data index key, and updating the index tree in memory based on the data index key;
and updating the corresponding bitmap in memory according to the start offset and the size of the current IO data.
Preferably, the method for generating a data index key based on the received IO request, searching the index tree, and reading the corresponding data from the SSD or the HDD according to the search result includes:
generating a data index key according to the IO request;
and searching the index tree using the generated data index key: if the key is found in the index tree, reading the data from the SSD; if it is not found, reading the data from the HDD.
Preferably, the method for writing the replica data stored in the SSD elimination partition back to the HDD after the preset trigger condition is reached includes the following steps:
step one, traversing the first bitmap and looking up each section of HDD elimination data;
step two, determining whether the corresponding sections of HDD elimination data contain dirty data; if not, proceeding to step three, otherwise proceeding to step five;
step three, traversing the second bitmap and looking up each section of HDD non-elimination data;
step four, determining whether the corresponding sections of HDD non-elimination data contain dirty data; if so, proceeding to step five, otherwise returning to step one;
step five, traversing the index tree to look up the data index key;
step six, reading the data on the SSD according to the data index key;
step seven, writing the data read from the SSD back to the HDD according to the data index key;
step eight, determining whether the proportion of dirty data stored in the SSD elimination partition is lower than the preset elimination partition write-back water level proportion; if so, ending the write-back, otherwise returning to step one.
In a second aspect of the present invention, a distributed storage data reading efficiency optimization system is provided, including:
an SSD initialization module, used for initializing the SSD and dividing the SSD space into a main partition and an elimination partition, which store main data and replica data respectively;
an IO request receiving and executing module, used for receiving an upper-layer IO request, and storing IO data into the SSD according to the received IO request, or reading the corresponding data from the SSD and/or the HDD;
and a data write-back module, used for writing the data stored in the SSD elimination partition back to the HDD after a preset trigger condition is reached, releasing SSD space for new IO data.
In a third aspect of the present invention, a processing device is provided, which at least includes a processor and a memory, where the memory stores a computer program, and the processor executes the computer program to implement the steps of the distributed storage data reading efficiency optimization method.
A fourth aspect of the present invention provides a computer storage medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the steps of the distributed storage data reading efficiency optimization method according to any one of claims 1 to 7.
Due to the adoption of the technical scheme, the invention has the following advantages:
1) The invention divides the SSD into two logical areas, a main partition and an elimination partition, storing main data in the main partition and replica data in the elimination partition. This effectively increases the SSD space available for main data, so that more main data can be stored.
2) By setting a reasonable proportion for the dirty data stored in the SSD elimination partition, the dirty data in the elimination partition can be written back in time, which improves data reading efficiency.
3) The invention establishes an index tree mechanism, divides the HDD into sections, and uses bitmaps to determine whether each section holds dirty data, which further improves data reading efficiency.
Therefore, the method can be widely applied to the field of optimization of database data reading efficiency.
Drawings
FIG. 1 is a schematic diagram of existing SSD capacity;
FIG. 2 is a flowchart of a method for optimizing distributed storage data reading efficiency according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of setting the SSD elimination partition proportion according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of setting the elimination partition write-back water level proportion according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a front-end write IO request processing flow according to an embodiment of the present invention;
FIGS. 6(a) and 6(b) illustrate an index tree update process according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of HDD section division according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a front-end read IO request processing flow according to an embodiment of the present invention;
FIG. 9 is a flowchart of a write-back task according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
Example 1
As shown in fig. 2, the method for optimizing the reading efficiency of distributed storage data provided in this embodiment includes the following steps:
1) initializing the SSD, and dividing the SSD space into a main partition and an elimination partition, which are used to store main data and replica data respectively.
It should be noted that, in this embodiment, the main partition of the SSD space stores main data, and the elimination partition stores replica data.
Specifically, initializing the SSD in this embodiment mainly consists of setting the elimination partition proportion and the elimination partition write-back water level proportion.
As shown in FIG. 3, the elimination partition proportion is the percentage of the total SSD capacity occupied by the elimination partition.
Optionally, for an SSD with a total capacity of 1 TB, the elimination partition proportion may be set to 20%; 800 GB of the SSD is then used as the main partition for storing main data, and the remaining 200 GB is used as the elimination partition for storing replica data.
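The split described above can be illustrated with a minimal C sketch. The names `ssd_layout` and `ssd_split` are hypothetical and do not appear in the patent; the sketch only assumes that the elimination partition proportion is given as an integer percentage of the total SSD capacity.

```c
#include <stdint.h>

/* Hypothetical layout descriptor; the names are illustrative only. */
struct ssd_layout {
    uint64_t main_bytes;        /* space reserved for main data */
    uint64_t elimination_bytes; /* space reserved for replica data */
};

/* Split total_bytes so that elimination_pct percent (0..100) goes to the
 * elimination partition and the rest to the main partition. */
static struct ssd_layout ssd_split(uint64_t total_bytes, unsigned elimination_pct)
{
    struct ssd_layout layout;

    if (elimination_pct > 100)
        elimination_pct = 100;
    /* Divide before multiplying so that large capacities do not overflow. */
    layout.elimination_bytes = total_bytes / 100 * elimination_pct
                             + total_bytes % 100 * elimination_pct / 100;
    layout.main_bytes = total_bytes - layout.elimination_bytes;
    return layout;
}
```

With a 1 TB SSD (taken here as 10^12 bytes) and a proportion of 20, this yields the 800 GB / 200 GB split of the example above.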
As shown in FIG. 4, the elimination partition write-back water level proportion is the percentage of the elimination partition's total capacity occupied by dirty data; when the amount of dirty data in the elimination partition reaches this proportion, SSD data starts to be written back to the HDD. Dirty data is data that is still on the SSD and has not yet been written back to the HDD; once data has been written back to the HDD, the SSD space it occupies can be released for subsequent IO.
Optionally, in this embodiment the elimination partition write-back water level proportion is set to 40%. If the proportion is too low and the front-end IO volume is large, write-back to the HDD becomes too frequent, which causes read/write IO to contend for resources and degrades overall performance. If the proportion is too high, a large amount of dirty data cannot be written back in time and the write-back rate cannot keep up with the front-end IO rate, so replica data cannot be written to the SSD in time and upper-layer services are blocked.
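As a rough illustration of the trigger condition, the following C sketch checks whether the dirty data in the elimination partition has reached the configured water level. The function name `writeback_needed` and its parameters are assumptions made for this example; the patent does not specify how the check is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the dirty bytes held in the elimination partition have
 * reached water_level_pct percent of the partition capacity.
 * Integer arithmetic is used; the sketch assumes the values are small enough
 * that multiplying by 100 does not overflow a 64-bit integer. */
static bool writeback_needed(uint64_t dirty_bytes,
                             uint64_t partition_bytes,
                             unsigned water_level_pct)
{
    if (partition_bytes == 0)
        return false;
    return dirty_bytes * 100 >= partition_bytes * (uint64_t)water_level_pct;
}
```

For a 200 GB elimination partition and a 40% water level, write-back would start once 80 GB of dirty data has accumulated.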
2) receiving an upper-layer IO request, and storing IO data into the SSD according to the received IO request, or reading the corresponding data from the SSD and/or the HDD.
Specifically, step 2) comprises the following steps:
2.1) determining the type of the received upper-layer IO request: if it is a front-end write IO request, proceeding to step 2.2); otherwise, proceeding to step 2.3).
It should be noted that, in the embodiment of the present invention, each IO request issued by the upper layer is required to indicate whether current IO data is master data or replica data.
2.2) as shown in FIG. 5, storing the IO data to be written into the corresponding partition of the SSD, and updating the index tree and bitmap in memory based on the IO data.
Specifically, this step includes:
2.2.1) checking the flag of the IO request: if the IO data to be written is main data, storing it in the main partition of the SSD; if it is replica data, storing it in the elimination partition of the SSD.
Optionally, storing the IO data includes: requesting space of the corresponding size on the SSD according to the size of the received IO data, and storing the data in the corresponding partition.
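Step 2.2.1) can be sketched in C as follows. The patent does not describe how space is allocated inside a partition, so the bump allocator below is purely an assumption for illustration, as are the names `io_role`, `ssd_partition`, `partition_alloc`, `pick_partition` and `partition_reserve`.

```c
#include <stdint.h>

enum io_role { IO_MAIN, IO_REPLICA };              /* flag carried by each IO request */
enum ssd_partition { PART_MAIN, PART_ELIMINATION };

/* Hypothetical per-partition allocator state (simple bump allocation). */
struct partition_alloc {
    uint64_t base;     /* first SSD offset belonging to the partition */
    uint64_t capacity; /* partition size in bytes */
    uint64_t used;     /* bytes handed out so far */
};

/* Main data goes to the main partition, replica data to the elimination partition. */
static enum ssd_partition pick_partition(enum io_role role)
{
    return role == IO_MAIN ? PART_MAIN : PART_ELIMINATION;
}

/* Reserve size bytes in the partition; returns the SSD offset of the reserved
 * space, or UINT64_MAX when the partition cannot hold the request. */
static uint64_t partition_reserve(struct partition_alloc *p, uint64_t size)
{
    uint64_t offset;

    if (size > p->capacity - p->used)
        return UINT64_MAX;
    offset = p->base + p->used;
    p->used += size;
    return offset;
}
```

A real implementation would also need to reclaim space after write-back; that part is omitted here.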
2.2.2) generating a data index key, and updating the index tree in memory based on the data index key.
As shown in FIG. 6(a) and FIG. 6(b), this embodiment maintains an index tree in memory. The index tree is used to quickly locate the data index key from the start offset (offset) of the IO data; the data on the SSD is then read according to the data index key, either for write-back or to respond to an upper-layer read request.
Preferably, the data index key generated from the received IO data may use the following structure:
#include <stdbool.h>

struct io_key
{
    long data_size;   // IO size
    long hdd_no;      // HDD number
    long ssd_no;      // SSD number
    long offset_hdd;  // start offset of the IO on the HDD
    long offset_ssd;  // start offset of the IO on the SSD
    bool is_dirty;    // whether the data is dirty
};
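To show how such a key might be located by its start offset, the following C sketch keeps the keys in an array sorted by `offset_hdd` and searches it with the standard `bsearch` function. This sorted array is only a stand-in for the index tree of the embodiment, whose concrete structure is not given in the text; `cmp_offset` and `index_find` are hypothetical names, and the `io_key` fields mirror the structure listed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct io_key {
    long data_size;   /* IO size */
    long hdd_no;      /* HDD number */
    long ssd_no;      /* SSD number */
    long offset_hdd;  /* start offset of the IO on the HDD */
    long offset_ssd;  /* start offset of the IO on the SSD */
    bool is_dirty;    /* whether the data is dirty */
};

/* Order keys by their start offset on the HDD. */
static int cmp_offset(const void *a, const void *b)
{
    const struct io_key *ka = a;
    const struct io_key *kb = b;
    return (ka->offset_hdd > kb->offset_hdd) - (ka->offset_hdd < kb->offset_hdd);
}

/* Look up the key whose HDD start offset equals offset_hdd.
 * keys must be sorted by offset_hdd; returns NULL when no key matches. */
static const struct io_key *index_find(const struct io_key *keys, size_t n,
                                       long offset_hdd)
{
    struct io_key probe = {0};

    probe.offset_hdd = offset_hdd;
    return bsearch(&probe, keys, n, sizeof *keys, cmp_offset);
}
```

A tree-based index would support cheap insertion as well as lookup; the array is used here only to keep the example short.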
2.2.3) updating the corresponding bitmap in memory according to the start offset (offset) and size (size) of the current IO data.
As shown in FIG. 7, in this embodiment the HDD capacity is divided into 64 MB sections, each corresponding to one bit: a bit value of 1 indicates that the section has dirty data, and a bit value of 0 indicates that the section has been written back or has no dirty data. Preferably, two bitmaps are maintained in memory: the first bitmap and the second bitmap indicate whether each section of HDD elimination data and each section of HDD non-elimination data, respectively, holds data, and the bitmap to update is selected according to the type of the front-end IO data (main/replica).
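The bitmap update can be sketched in C as below, assuming one bit per 64 MB HDD section stored in a byte array. The names `SECTION_BYTES`, `bitmap_mark_dirty` and `bitmap_test` are illustrative; the caller is assumed to pass either the first or the second bitmap depending on whether the IO data is replica data or main data.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTION_BYTES (64ULL * 1024 * 1024) /* one bit covers 64 MB of HDD space */

/* Set the bit of every section touched by the IO range [offset, offset + size).
 * nbits is the number of sections tracked by the bitmap. */
static void bitmap_mark_dirty(uint8_t *bitmap, size_t nbits,
                              uint64_t offset, uint64_t size)
{
    uint64_t first, last, s;

    if (size == 0)
        return;
    first = offset / SECTION_BYTES;
    last = (offset + size - 1) / SECTION_BYTES;
    for (s = first; s <= last && s < nbits; s++)
        bitmap[s / 8] |= (uint8_t)(1u << (s % 8));
}

/* Returns 1 when the section is marked dirty, 0 otherwise. */
static int bitmap_test(const uint8_t *bitmap, uint64_t section)
{
    return (bitmap[section / 8] >> (section % 8)) & 1;
}
```

An IO that starts at 60 MB and is 8 MB long crosses the 64 MB boundary, so it marks both section 0 and section 1.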
2.3) generating a data index key based on the received IO request, searching the index tree in memory, and reading the corresponding data from the SSD or the HDD according to the search result.
Specifically, as shown in FIG. 8, this step includes:
2.3.1) generating a data index key based on the received IO request.
2.3.2) searching the index tree in memory using the generated data index key: if the key is found, reading the data from the SSD; otherwise, reading the data from the HDD.
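The read decision of step 2.3.2) can be sketched in C as follows. The sketch assumes that the index lookup has already been performed and returned either a pointer to the matching key or NULL; `read_source`, `read_plan` and `plan_read` are hypothetical names, and the actual device reads are left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct io_key {
    long data_size;   /* IO size */
    long hdd_no;      /* HDD number */
    long ssd_no;      /* SSD number */
    long offset_hdd;  /* start offset of the IO on the HDD */
    long offset_ssd;  /* start offset of the IO on the SSD */
    bool is_dirty;    /* whether the data is dirty */
};

enum read_source { READ_FROM_SSD, READ_FROM_HDD };

/* Describes where a read request should be served from. */
struct read_plan {
    enum read_source source;
    long device_no;
    long offset;
    long size;
};

/* hit is the result of the index lookup (NULL on a miss).
 * On a hit the data is read from the SSD location recorded in the key;
 * on a miss the request falls through to the HDD at the requested offset. */
static struct read_plan plan_read(const struct io_key *hit,
                                  long hdd_no, long offset_hdd, long size)
{
    struct read_plan plan;

    if (hit != NULL) {
        plan.source = READ_FROM_SSD;
        plan.device_no = hit->ssd_no;
        plan.offset = hit->offset_ssd;
        plan.size = hit->data_size;
    } else {
        plan.source = READ_FROM_HDD;
        plan.device_no = hdd_no;
        plan.offset = offset_hdd;
        plan.size = size;
    }
    return plan;
}
```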
3) after the preset trigger condition is reached, writing the replica data stored in the SSD elimination partition back to the HDD, releasing SSD space for new IO data.
As shown in FIG. 9, when the proportion of dirty data stored in the SSD elimination partition reaches the preset elimination partition write-back water level proportion, the data stored in the elimination partition is written back to the HDD as follows:
3.1) traversing the first bitmap and looking up each section of HDD elimination data;
3.2) determining whether the corresponding sections of HDD elimination data contain dirty data; if not, proceeding to step 3.3), otherwise proceeding to step 3.5);
3.3) traversing the second bitmap and looking up each section of HDD non-elimination data;
3.4) determining whether the corresponding sections of HDD non-elimination data contain dirty data; if so, proceeding to step 3.5), otherwise returning to step 3.1);
3.5) traversing the index tree to look up the data index key;
3.6) reading the data on the SSD according to the data index key;
3.7) writing the data read from the SSD back to the HDD according to the data index key;
3.8) determining whether the proportion of dirty data stored in the SSD elimination partition is lower than the preset elimination partition write-back water level proportion; if so, ending the write-back, otherwise returning to step 3.1).
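The control flow of steps 3.1) to 3.8) can be sketched in C as an in-memory simulation. Several simplifications are assumptions of this example and not part of the patent: each bitmap is a single 32-bit word, the index lookup and the SSD-to-HDD copy of steps 3.5) to 3.7) are replaced by clearing per-section counters, and the loop stops when neither bitmap has a dirty section left (the text returns to step 3.1) in that case). The names `section`, `first_set_bit` and `run_writeback` are hypothetical.

```c
#include <stdint.h>

/* Dirty bytes cached on the SSD for one HDD section (simulation only). */
struct section {
    uint64_t replica_dirty; /* dirty replica data, tracked by the first bitmap */
    uint64_t main_dirty;    /* dirty main data, tracked by the second bitmap */
};

/* Index of the lowest set bit, or -1 when the bitmap is empty. */
static int first_set_bit(uint32_t bitmap)
{
    int i;

    for (i = 0; i < 32; i++)
        if ((bitmap >> i) & 1u)
            return i;
    return -1;
}

/* Runs one write-back task and returns the number of sections written back. */
static unsigned run_writeback(uint32_t *first_bm, uint32_t *second_bm,
                              struct section *sec, uint64_t *elim_dirty,
                              uint64_t elim_capacity, unsigned water_pct)
{
    unsigned flushed = 0;

    for (;;) {
        int s = first_set_bit(*first_bm);            /* steps 3.1 and 3.2 */
        if (s >= 0) {
            /* steps 3.5 to 3.7, simulated: replica data leaves the SSD */
            *elim_dirty -= sec[s].replica_dirty;
            sec[s].replica_dirty = 0;
            *first_bm &= ~(UINT32_C(1) << s);
        } else {
            s = first_set_bit(*second_bm);           /* steps 3.3 and 3.4 */
            if (s < 0)
                break; /* assumption: stop when nothing is dirty */
            sec[s].main_dirty = 0;
            *second_bm &= ~(UINT32_C(1) << s);
        }
        flushed++;
        /* step 3.8: stop once the dirty ratio drops below the water level */
        if (*elim_dirty * 100 < elim_capacity * (uint64_t)water_pct)
            break;
    }
    return flushed;
}
```

In this simulation, an elimination partition of 200 units holding 90 dirty units (45%) with a 40% water level needs one section of 30 units written back before the ratio drops to 30% and the task ends.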
Example 2
Corresponding to Embodiment 1, this embodiment provides a distributed storage data reading efficiency optimization system. The system provided in this embodiment can implement the distributed storage data reading efficiency optimization method of Embodiment 1, and may be implemented in software, hardware, or a combination of the two. For example, the system may comprise integrated or separate functional modules or functional units that perform the corresponding steps of the method of Embodiment 1. Since the system of this embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of Embodiment 1. The system embodiment described here is only illustrative.
The distributed storage data reading efficiency optimization system provided by the embodiment includes:
an SSD initialization module, used for initializing the SSD and dividing the SSD space into a main partition and an elimination partition, which store main data and replica data respectively;
an IO request receiving and executing module, used for receiving an upper-layer IO request, and storing IO data into the SSD according to the received IO request, or reading the corresponding data from the SSD and/or the HDD;
and a data write-back module, used for writing the data stored in the SSD elimination partition back to the HDD after a preset trigger condition is reached, releasing SSD space for new IO data.
Example 3
This embodiment provides a processing device corresponding to the method for optimizing efficiency of reading distributed storage data provided in embodiment 1, where the processing device may be a processing device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, to execute the method of embodiment 1.
The processing equipment comprises a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface are connected through the bus so as to complete mutual communication. The memory stores a computer program that can be executed on the processor, and the processor executes the distributed storage data reading efficiency optimization method provided by embodiment 1 when executing the computer program.
In some implementations, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.
In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
Example 4
The method for optimizing efficiency of reading data from distributed storage according to embodiment 1 may be embodied as a computer program product, and the computer program product may include a computer readable storage medium on which computer readable program instructions for executing the method for optimizing efficiency of reading data from distributed storage according to embodiment 1 are loaded.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
The above embodiments are only used for illustrating the present invention, and the structure, connection mode, manufacturing process, etc. of the components may be changed, and all equivalent changes and modifications performed on the basis of the technical solution of the present invention should not be excluded from the protection scope of the present invention.

Claims (10)

1. A distributed storage data reading efficiency optimization method is characterized by comprising the following steps:
initializing an SSD, and dividing the SSD space into a main partition and an elimination partition, which are used to store main data and replica data respectively;
receiving an upper-layer IO request, and storing IO data into the SSD according to the received IO request, or reading the corresponding data from the SSD and/or the HDD;
and, after a preset trigger condition is reached, writing the replica data stored in the SSD elimination partition back to the HDD.
2. The distributed storage data reading efficiency optimization method according to claim 1, wherein initializing the SSD includes setting an elimination partition proportion and an elimination partition write-back water level proportion.
3. The distributed storage data reading efficiency optimization method according to claim 2, wherein the elimination partition proportion and the elimination partition write-back water level proportion are 20% and 40%, respectively.
4. The distributed storage data reading efficiency optimization method according to claim 2, wherein the method for receiving an upper-layer IO request and reading or writing IO data according to the received IO request comprises:
determining the type of the received upper-layer IO request:
if the IO request is a front-end write request, storing the IO data to be written in the corresponding partition of the SSD, and updating an index tree and a bitmap in memory based on the IO data; the bitmap comprises a first bitmap and a second bitmap, which are used to indicate whether data is stored in each section of HDD elimination data and in each section of HDD non-elimination data, respectively;
and if the IO request is a front-end read request, generating a data index key based on the received IO request, searching the index tree in memory, and reading the corresponding data from the SSD or the HDD according to the search result.
5. The method for optimizing the reading efficiency of the distributed storage data according to claim 4, wherein the method for storing the IO data to be written into the corresponding partition of the SSD and updating the index tree and the bitmap in the memory based on the IO data comprises:
judging the flag of the IO request: if the IO data currently to be written is main data, storing the IO data in the main partition of the SSD, and if the IO data currently to be written is duplicate data, storing the IO data in the elimination partition of the SSD;
generating a data index key, and updating an index tree in the memory based on the data index key;
and updating the corresponding bitmap in the memory according to the initial offset and the size of the current IO data.
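The write path of claim 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: a Python `dict` stands in for the index tree, the key layout `(volume_id, offset)`, the 512-byte sector size, and the class name `SsdWriteCache` are all hypothetical, and the two bitmaps are held as one byte per sector for readability.

```python
SECTOR = 512  # assumed sector size in bytes


class SsdWriteCache:
    """Hypothetical in-memory model of the SSD partitions, index and bitmaps."""

    def __init__(self, hdd_sectors: int):
        # Two partitions: main data and duplicate (replica) data.
        self.partitions = {"main": {}, "elimination": {}}
        # A dict stands in for the index tree (assumption).
        self.index = {}
        # "elimination" plays the role of the first bitmap, "main" the second.
        self.bitmaps = {
            "elimination": bytearray(hdd_sectors),
            "main": bytearray(hdd_sectors),
        }

    @staticmethod
    def make_key(volume_id: str, offset: int):
        # Assumed key layout; the claims do not specify one.
        return (volume_id, offset)

    def write(self, volume_id: str, offset: int, data: bytes, is_duplicate: bool):
        if not data:
            raise ValueError("data must not be empty")
        # Choose the partition from the request flag.
        partition = "elimination" if is_duplicate else "main"
        other = "main" if is_duplicate else "elimination"
        key = self.make_key(volume_id, offset)
        self.partitions[other].pop(key, None)  # drop a stale copy, if any
        self.partitions[partition][key] = bytes(data)
        # Update the index with the generated key.
        self.index[key] = partition
        # Mark every sector covered by [offset, offset + size).
        first = offset // SECTOR
        last = (offset + len(data) - 1) // SECTOR
        bitmap = self.bitmaps[partition]
        for sector in range(first, last + 1):
            bitmap[sector] = 1
        return key
```

For example, a 100-byte duplicate write at offset 1000 spans bytes 1000 to 1099 and therefore marks sectors 1 and 2 in the first bitmap.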
6. The method for optimizing the reading efficiency of the distributed storage data according to claim 4, wherein the method for generating the data index key based on the received IO request, searching the index tree, and reading the corresponding data from the SSD or the HDD according to the search result comprises:
generating a data index key according to the IO request;
and traversing the index tree based on the generated data index key, reading the data from the SSD if the data index key is found in the index tree, and reading the data from the HDD if it is not found.
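The lookup in claim 6 amounts to an index hit or miss. The following Python sketch shows that decision only; the `dict` used in place of the index tree, the key layout `(volume_id, offset)`, the SSD slot numbers, and the function name `read_data` are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]  # assumed key layout: (volume id, byte offset)


def make_key(volume_id: str, offset: int) -> Key:
    """Build the data index key from the request fields (assumed layout)."""
    return (volume_id, offset)


def read_data(index: Dict[Key, int],
              ssd: Dict[int, bytes],
              hdd: Dict[Key, bytes],
              volume_id: str,
              offset: int) -> Tuple[Optional[bytes], str]:
    """Return (data, source), where source is "ssd" on an index hit, else "hdd"."""
    key = make_key(volume_id, offset)
    slot = index.get(key)  # a dict lookup stands in for the tree traversal
    if slot is not None:
        return ssd[slot], "ssd"
    # Index miss: the data is not cached, so fall back to the HDD.
    return hdd.get(key), "hdd"
```

Because the index is consulted first, a key present in the index is always served from the SSD even when an older copy still exists on the HDD.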
7. The method for optimizing the reading efficiency of the distributed storage data according to claim 4, wherein the method for writing the duplicate data stored in the elimination partition of the SSD back to the HDD after the preset trigger condition is reached comprises the following steps:
step one: traversing the first bitmap and searching each sector of the eliminated data of the HDD;
step two: judging whether any corresponding sector of the eliminated data of the HDD has dirty data; if not, proceeding to step three, otherwise proceeding to step five;
step three: traversing the second bitmap and searching each sector of the non-eliminated data of the HDD;
step four: judging whether any corresponding sector of the non-eliminated data of the HDD has dirty data; if so, proceeding to step five, otherwise returning to step one;
step five: traversing the index tree to search for the data index key;
step six: reading the data on the SSD according to the data index key;
step seven: writing the data read from the SSD back to the HDD according to the data index key;
step eight: judging whether the proportion of dirty data stored in the elimination partition of the SSD is lower than the preset elimination partition write-back watermark proportion; if so, ending the write-back, otherwise returning to step one.
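The write-back loop of claim 7 can be sketched in Python as follows. This is a simplified model under several assumptions that are not in the claims: each cached block occupies exactly one sector, the index maps a sector number directly to an SSD slot, the count of set bits in the first bitmap is taken as the dirty data volume of the elimination partition, and the loop stops when neither bitmap has dirty sectors (the claim returns to the first step in that case). The names `first_dirty` and `write_back` are hypothetical.

```python
from typing import Dict, List, Optional


def first_dirty(bitmap: bytearray) -> Optional[int]:
    """Return the lowest sector number whose bit is set, or None."""
    return next((i for i, bit in enumerate(bitmap) if bit), None)


def write_back(first_bitmap: bytearray,
               second_bitmap: bytearray,
               index: Dict[int, str],
               ssd: Dict[str, bytes],
               hdd: Dict[int, bytes],
               elimination_sectors: int,
               watermark_percent: int = 40) -> List[int]:
    """Flush dirty sectors to the HDD until the watermark is reached."""
    flushed: List[int] = []
    while True:
        # Check the first bitmap (eliminated data) for dirty sectors.
        bitmap = first_bitmap
        sector = first_dirty(bitmap)
        if sector is None:
            # None found: check the second bitmap (non-eliminated data).
            bitmap = second_bitmap
            sector = first_dirty(bitmap)
        if sector is None:
            break  # assumption: stop instead of re-scanning forever
        # Look up the key, read from the SSD, write to the HDD.
        slot = index.pop(sector)
        hdd[sector] = ssd.pop(slot)  # popping also releases the SSD space
        bitmap[sector] = 0
        flushed.append(sector)
        # Stop once dirty data in the elimination partition is below the watermark.
        if sum(first_bitmap) * 100 < watermark_percent * elimination_sectors:
            break
    return flushed
```

With a five-sector elimination partition and a 40% watermark, the loop keeps flushing until fewer than two sectors in the first bitmap remain dirty.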
8. A distributed storage data reading efficiency optimization system, comprising:
the SSD initialization module is used for initializing the SSD and dividing the SSD space into a main partition and an elimination partition, which are respectively used for storing main data and duplicate data;
the IO request receiving and executing module is used for receiving an upper layer IO request, and storing IO data into the SSD according to the received IO request, or reading corresponding data from the SSD and/or the HDD;
and the data write-back module is used for writing the data stored in the elimination partition of the SSD back to the HDD after a preset trigger condition is reached, and releasing the SSD space for writing new IO data.
9. A processing device comprising at least a processor and a memory, the memory having stored thereon a computer program, characterized in that the steps of the distributed storage data reading efficiency optimization method of any one of claims 1 to 7 are performed by the processor when executing the computer program.
10. A computer storage medium having computer readable instructions stored thereon which are executable by a processor to perform the steps of the distributed storage data reading efficiency optimization method according to any one of claims 1 to 7.
CN202111067855.2A 2021-09-13 2021-09-13 Distributed storage data reading efficiency optimization method, system, device and medium Pending CN113778338A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111067855.2A CN113778338A (en) 2021-09-13 2021-09-13 Distributed storage data reading efficiency optimization method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111067855.2A CN113778338A (en) 2021-09-13 2021-09-13 Distributed storage data reading efficiency optimization method, system, device and medium

Publications (1)

Publication Number Publication Date
CN113778338A true CN113778338A (en) 2021-12-10

Family

ID=78842945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111067855.2A Pending CN113778338A (en) 2021-09-13 2021-09-13 Distributed storage data reading efficiency optimization method, system, device and medium

Country Status (1)

Country Link
CN (1) CN113778338A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114442935A (en) * 2021-12-29 2022-05-06 天翼云科技有限公司 Method and device for scrubbing data, electronic equipment and storage medium
CN114546267A (en) * 2022-02-14 2022-05-27 深圳源创存储科技有限公司 Solid state disk based on big data calculation and solid state disk system
CN115016740A (en) * 2022-07-14 2022-09-06 杭州优云科技有限公司 Data recovery method and device, electronic equipment and storage medium
CN115544321A (en) * 2022-11-28 2022-12-30 厦门渊亭信息科技有限公司 Method and device for realizing graph database storage and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023925A (en) * 2009-09-23 2011-04-20 翔晖科技股份有限公司 A solid state disk and the application method thereof
CN103645859A (en) * 2013-11-19 2014-03-19 华中科技大学 Disk array caching method for virtual SSD and SSD isomerous mirror image
CN106406750A (en) * 2016-08-23 2017-02-15 浪潮(北京)电子信息产业有限公司 Data operation method and system
US20170212680A1 (en) * 2016-01-22 2017-07-27 Suraj Prabhakar WAGHULDE Adaptive prefix tree based order partitioned data storage system
CN112000426A (en) * 2020-07-24 2020-11-27 新华三大数据技术有限公司 Data processing method and device
CN113138945A (en) * 2021-04-16 2021-07-20 宜通世纪科技股份有限公司 Data caching method, device, equipment and medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114442935A (en) * 2021-12-29 2022-05-06 天翼云科技有限公司 Method and device for scrubbing data, electronic equipment and storage medium
CN114442935B (en) * 2021-12-29 2023-08-04 天翼云科技有限公司 Method and device for brushing dirty data, electronic equipment and storage medium
CN114546267A (en) * 2022-02-14 2022-05-27 深圳源创存储科技有限公司 Solid state disk based on big data calculation and solid state disk system
CN114546267B (en) * 2022-02-14 2022-11-18 深圳源创存储科技有限公司 Solid state disk based on big data calculation and solid state disk system
CN115016740A (en) * 2022-07-14 2022-09-06 杭州优云科技有限公司 Data recovery method and device, electronic equipment and storage medium
CN115016740B (en) * 2022-07-14 2022-11-18 杭州优云科技有限公司 Data recovery method and device, electronic equipment and storage medium
CN115544321A (en) * 2022-11-28 2022-12-30 厦门渊亭信息科技有限公司 Method and device for realizing graph database storage and storage medium
CN115544321B (en) * 2022-11-28 2023-03-21 厦门渊亭信息科技有限公司 Method and device for realizing graph database storage and storage medium

Similar Documents

Publication Publication Date Title
US11907200B2 (en) Persistent memory management
US8549230B1 (en) Method, system, apparatus, and computer-readable medium for implementing caching in a storage system
CN113778338A (en) Distributed storage data reading efficiency optimization method, system, device and medium
US9081702B2 (en) Working set swapping using a sequentially ordered swap file
US7383290B2 (en) Transaction processing systems and methods utilizing non-disk persistent memory
CN107656834B (en) System and method for recovering host access based on transaction log and storage medium
KR20170098187A (en) Associative and atomic write-back caching system and method for storage subsystem
CN106445405B (en) Data access method and device for flash memory storage
JP2007501457A (en) Reassign ownership in a non-shared database system
CN108897642B (en) Method and device for optimizing log mechanism in persistent transactional memory system
CN103558992A (en) Off-heap direct-memory data stores, methods of creating and/or managing off-heap direct-memory data stores, and/or systems including off-heap direct-memory data store
US20160291881A1 (en) Method and apparatus for improving disk array performance
US20180107601A1 (en) Cache architecture and algorithms for hybrid object storage devices
US9135262B2 (en) Systems and methods for parallel batch processing of write transactions
JPH0644010A (en) Method and system for polling under sub-file in time zero-backup-copy-process
US10733101B2 (en) Processing node, computer system, and transaction conflict detection method
CN109165321B (en) Consistent hash table construction method and system based on nonvolatile memory
CN111611223B (en) Non-volatile data access method, system, electronic device and medium
US11379326B2 (en) Data access method, apparatus and computer program product
CN109739688B (en) Snapshot resource space management method and device and electronic equipment
CN109508140B (en) Storage resource management method and device, electronic equipment and system
CN114115711B (en) Quick buffer storage system based on nonvolatile memory file system
CN112000289B (en) Data management method for full flash storage server system and related components
CN115793957A (en) Method and device for writing data and computer storage medium
US20150113244A1 (en) Concurrently accessing memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination