CN113986118A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN113986118A
CN113986118A (application CN202111140712.XA)
Authority
CN
China
Prior art keywords
data
queue
disk
flushing
storage medium
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111140712.XA
Other languages
Chinese (zh)
Inventor
余思明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd filed Critical New H3C Big Data Technologies Co Ltd
Priority to CN202111140712.XA priority Critical patent/CN113986118A/en
Publication of CN113986118A publication Critical patent/CN113986118A/en
Pending legal-status Critical Current

Classifications

    • G06F3/061 — Improving I/O performance (G Physics; G06 Computing; Calculating or Counting; G06F Electric digital data processing; G06F3/00 Input/output arrangements; G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID; G06F3/0601 Interfaces specially adapted for storage systems; G06F3/0602 specifically adapted to achieve a particular effect)
    • G06F3/0638 — Organizing or formatting or addressing of data (under G06F3/0628, interfaces making use of a particular technique)
    • G06F3/0671 — In-line storage system (under G06F3/0668, interfaces adopting a particular infrastructure)

Abstract

The present application relates to the field of data storage technologies, and in particular, to a data processing method and apparatus. The method is applied to a storage server comprising a first storage medium composed of an SSD and a second storage medium composed of an HDD, and comprises the following steps: receiving a target data write request, and judging whether first data associated with the target data exists in the first storage medium; if so, merging the target data and the first data to obtain second data; judging whether the data length of the second data meets a preset requirement; if the data length of the second data meets the preset requirement, adding the second data to a first disk-flushing queue, where the first disk-flushing queue is a high-priority flushing queue; and if the data length of the second data does not meet the preset requirement, adding the second data to a second disk-flushing queue, where the second disk-flushing queue is a low-priority flushing queue.

Description

Data processing method and device
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a data processing method and apparatus.
Background
In a storage array, in order to improve the performance of the storage system, an SSD is generally used as a write cache for an HDD. However, since the capacity of the SSD is always smaller than that of the HDD, the SSD will eventually exhaust its space as data is continuously written. At that point, dirty data written earlier (data that has been written to the SSD but not yet to the HDD is called "dirty data"; data already written to the HDD is "clean data") must be written to the HDD, a process called disk flushing. Clean data that has been flushed can be evicted to free space for new data. The disk-flushing algorithm determines which data to write to the HDD; it generally selects data that has not been accessed recently, so that the SSD's data blocks are used more efficiently and overall system performance is better.
An SSD that provides external write acceleration must continuously accept user write requests and therefore continuously consumes space. After long-term operation, once the space is nearly exhausted, the write performance it can offer externally depends on how fast dirty data can be written to the HDD during flushing. The goal of current flushing strategies is to flush the coldest dirty data cached on the SSD to the HDD as soon as possible, so that dirty data is converted into clean data and space is released to continue serving data access.
Disclosure of Invention
The application provides a data processing method and device to solve the problem in the prior art that the flushing workload issued to the HDD (hard disk drive) is poorly matched to the HDD's characteristics, resulting in low flushing efficiency.
In a first aspect, the present application provides a data processing method applied to a storage server, where the storage server includes a first storage medium composed of an SSD and a second storage medium composed of an HDD, the method includes:
receiving a target data writing request, and judging whether first data associated with the target data exists in the first storage medium;
if the first storage medium is judged to have first data associated with the target data, merging the target data and the first data to obtain second data;
judging whether the data length of the second data meets a preset requirement or not;
if the data length of the second data meets the preset requirement, adding the second data to a first disk-flushing queue, where the first disk-flushing queue is a high-priority flushing queue;
and if the data length of the second data does not meet the preset requirement, adding the second data to a second disk-flushing queue, where the second disk-flushing queue is a low-priority flushing queue.
Optionally, the step of determining whether the first data associated with the target data exists in the first storage medium includes:
and judging whether the data space written by the target data is overlapped or continuous with the data space of the cached data, and if so, judging that the cached data is the first data associated with the target data.
Optionally, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets a preset requirement.
Optionally, the step of adding the second data to the first disk-flushing queue includes:
adding the second data to the tail of the first disk-flushing queue;
the step of adding the second data to the second disk-flushing queue includes:
adding the second data to the tail of the second disk-flushing queue.
Optionally, the method further comprises:
when a disk-flushing instruction is received, judging whether data to be flushed is cached in the first disk-flushing queue;
if data to be flushed is cached in the first disk-flushing queue, storing part or all of the data to be flushed, starting from the head of the first disk-flushing queue, to the second storage medium;
if no data to be flushed is cached in the first disk-flushing queue, judging whether data to be flushed is cached in the second disk-flushing queue;
and if data to be flushed is cached in the second disk-flushing queue, storing part or all of the data to be flushed, starting from the head of the second disk-flushing queue, to the second storage medium.
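As an illustrative sketch of the lookup order described above (the patent does not prescribe an implementation; all names here are assumptions), the flush source can be chosen by checking the high-priority queue before the low-priority one:

```python
from collections import deque

def take_flush_data(high_q, low_q, count):
    """Pick up to `count` entries to flush, reading from the head of the
    high-priority queue when it holds cached data, and from the head of
    the low-priority queue only when the high-priority queue is empty."""
    src = high_q if high_q else low_q
    return [src.popleft() for _ in range(min(count, len(src)))]
```

Using deques keeps both operations of the scheme cheap: new entries are appended at the tail, and flushing always pops from the head.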
In a second aspect, the present application provides a data processing apparatus applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, the apparatus including:
a receiving unit configured to receive a target data write request;
a first judgment unit operable to judge whether or not first data associated with the target data exists in the first storage medium;
if the first judging unit judges that first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data;
the second judging unit is used for judging whether the data length of the second data meets a preset requirement or not;
if the second judging unit judges that the data length of the second data meets the preset requirement, the second data is added to a first disk-flushing queue, where the first disk-flushing queue is a high-priority flushing queue;
and if the second judging unit judges that the data length of the second data does not meet the preset requirement, the second data is added to a second disk-flushing queue, where the second disk-flushing queue is a low-priority flushing queue.
Optionally, when determining whether first data associated with the target data exists in the first storage medium, the first determining unit is specifically configured to:
and judging whether the data space written by the target data is overlapped or continuous with the data space of the cached data, and if so, judging that the cached data is the first data associated with the target data.
Optionally, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets a preset requirement.
Optionally, when adding the second data to the first disk-flushing queue, the second judging unit is specifically configured to:
add the second data to the tail of the first disk-flushing queue;
when adding the second data to the second disk-flushing queue, the second judging unit is specifically configured to:
add the second data to the tail of the second disk-flushing queue.
Optionally, the apparatus further comprises:
a flushing unit, configured to: when a disk-flushing instruction is received, judge whether data to be flushed is cached in the first disk-flushing queue; if data to be flushed is cached in the first disk-flushing queue, store part or all of the data to be flushed, starting from the head of the first disk-flushing queue, to the second storage medium; if no data to be flushed is cached in the first disk-flushing queue, judge whether data to be flushed is cached in the second disk-flushing queue; and if data to be flushed is cached in the second disk-flushing queue, store part or all of the data to be flushed, starting from the head of the second disk-flushing queue, to the second storage medium.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory and for executing the steps of the method according to any one of the above first aspects in accordance with the obtained program instructions.
In a fourth aspect, the present application further provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the steps of the method according to any one of the above first aspects.
In summary, the data processing method provided by the embodiments of the present application is applied to a storage server, where the storage server includes a first storage medium composed of an SSD and a second storage medium composed of an HDD, and the method includes: receiving a target data write request, and judging whether first data associated with the target data exists in the first storage medium; if first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data; judging whether the data length of the second data meets a preset requirement; if the data length of the second data meets the preset requirement, adding the second data to a first disk-flushing queue, where the first disk-flushing queue is a high-priority flushing queue; and if the data length of the second data does not meet the preset requirement, adding the second data to a second disk-flushing queue, where the second disk-flushing queue is a low-priority flushing queue.
By adopting the data processing method provided by the embodiments of the present application, the relevance of dirty data is judged so that multiple related pieces of dirty data with small data lengths are merged into dirty data with a large data length, and the large dirty data is given a high flushing priority. Priority is thus set according to dirty-data size, the workload issued during flushing is better matched to the HDD, and the SSD cache can provide better external write performance while accelerating the service.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments of the present application or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application; other drawings can be obtained by those skilled in the art from these drawings.
Fig. 1 is a detailed flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is a detailed flowchart of another data processing method provided in the embodiment of the present application;
FIG. 3 is a detailed flowchart of a disk-flushing method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, moreover, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
Exemplarily, referring to fig. 1, which is a detailed flowchart of a data processing method provided in an embodiment of the present application. The method is applied to a storage server, where the storage server includes a first storage medium composed of an SSD and a second storage medium composed of an HDD, and includes the following steps:
step 100: receiving a target data writing request, and judging whether first data associated with the target data exists in the first storage medium.
In this embodiment of the present application, when determining whether the first storage medium has the first data associated with the target data, a preferred implementation manner is:
and judging whether the data space written by the target data is overlapped or continuous with the data space of the cached data, and if so, judging that the cached data is the first data associated with the target data.
For example, assume that data 1 has been stored in the SSD and its data space is 0-15. If the data space of the currently written target data is 13-16, it overlaps the data space of data 1 (in 13-15), so it is determined that data 1 is associated with the target data.
Assume that data 2 has been stored in the SSD with a data space of 0-15. If the data space of the currently written target data is 16-20, it is linearly continuous with the data space of data 2 (0-15, 16-20), so it is determined that data 2 is associated with the target data.
Assume that data 3 has been stored in the SSD with a data space of 0-15. If the data space of the currently written target data is 17-20, it neither overlaps nor is linearly continuous with the data space of data 3, so it is determined that data 3 is not associated with the target data.
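The overlap-or-contiguity test used in the three examples above can be sketched as a small interval check over inclusive (start, end) block ranges (a hypothetical helper; the patent does not specify an implementation):

```python
def is_associated(cached_range, target_range):
    """True if the target write's data space overlaps or is linearly
    continuous (adjacent) with a cached entry's data space.
    Both arguments are inclusive (start, end) block-address tuples."""
    c_start, c_end = cached_range
    t_start, t_end = target_range
    # Two inclusive ranges overlap or touch when neither starts more
    # than one block past the other's end.
    return t_start <= c_end + 1 and c_start <= t_end + 1
```

On the examples above: (0-15, 13-16) and (0-15, 16-20) are associated, while (0-15, 17-20) is not.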
Step 110: and if the first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data.
As described above, assume that data 1 has been stored in the SSD with data spaces 0-15, and that the target data to be currently written has data spaces 13-16, with data 1 being associated with the target data. Then, after the data 1 is merged with the target data, the resulting data space of the second data is 0-16.
Similarly, assume that data 2 has been stored in the SSD, with a data space of 0-15, and the target data currently being written has a data space of 16-20, with data 2 being associated with the target data. Then, after the data 2 is merged with the target data, the resulting data space of the second data is 0-20.
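The merge in both examples amounts to taking the union of two inclusive ranges; a minimal sketch under that reading (helper names are assumptions):

```python
def merge_ranges(first, target):
    """Merge the cached first data's range with the target write's range
    into the second data's range (inclusive endpoints); assumes the two
    ranges have already passed the association test."""
    return min(first[0], target[0]), max(first[1], target[1])

def data_length(rng):
    """Length of an inclusive (start, end) range, e.g. 0-19 has length 20."""
    return rng[1] - rng[0] + 1
```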
Step 120: and judging whether the data length of the second data meets a preset requirement or not.
In the embodiment of the application, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets a preset requirement.
That is, whether the data length of the second data meets the preset requirement can be determined by checking whether it is greater than or equal to the set threshold: if the data length is greater than or equal to the set threshold, the preset requirement is met; if it is smaller than the set threshold, the preset requirement is not met.
Step 130: if the data length of the second data meets the preset requirement, add the second data to the first disk-flushing queue, where the first disk-flushing queue is a high-priority flushing queue.
Step 140: if the data length of the second data does not meet the preset requirement, add the second data to the second disk-flushing queue, where the second disk-flushing queue is a low-priority flushing queue.
In the embodiments of the present application, the first disk-flushing queue is a high-priority queue and the second disk-flushing queue is a low-priority queue; that is, the flushing priority of the first queue is higher than that of the second queue.
In other words, when a flushing operation is performed, if the high-priority first queue contains data to be flushed, that data is processed first; only when the first queue contains no data to be flushed is the data in the second queue processed.
The data processing method provided by the embodiments of the present application is described in detail below with reference to a specific application scenario. Exemplarily, referring to fig. 2, which is a detailed flowchart of a data processing method provided in an embodiment of the present application: a user write request is received, and it is judged whether the request is a write-through. If yes, the data is written directly to the HDD. If not, the data space associated with the write is identified (i.e., whether previously written first data exists that overlaps or is linearly continuous with the data space of the target data). If such data exists, it is merged with the target data to obtain the second data (dirty data). It is then judged whether the data length of the dirty data (the associated data plus the target data) is greater than or equal to the set threshold. If yes, the dirty data is hung on the high-priority flushing queue and written to the SSD; if not, it is hung on the low-priority flushing queue and written to the SSD.
Further, referring to fig. 3, which is a detailed flowchart of a disk-flushing method provided in an embodiment of the present application: when a flushing instruction is received, flushing starts, and it is judged whether the high-priority queue is empty. If not, the dirty data at the head of the high-priority queue is picked up, and it is further judged whether the amount of picked-up dirty data reaches the single-flush threshold. If yes, the picked-up dirty data is flushed to the HDD; otherwise, dirty data continues to be picked up from the head of the high-priority queue. If the high-priority queue is empty, it is judged whether the low-priority queue is empty; if not, dirty data is picked up from the head of the low-priority queue, and once the amount of dirty data reaches the single-flush threshold, the picked-up dirty data is flushed to the HDD. Of course, if both the high-priority queue and the low-priority queue are empty, there is no dirty data to be flushed to the HDD.
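One flush round of fig. 3 (drain head entries until the single-flush threshold is reached, preferring the high-priority queue) might be sketched as follows; `write_to_hdd`, the (start, end) entry format, and all other names are illustrative assumptions, not the patent's implementation:

```python
from collections import deque

def flush_once(high_q, low_q, single_flush_threshold, write_to_hdd):
    """One flush round: pick dirty (start, end) ranges from the head of
    the high-priority queue (or the low-priority queue if the former is
    empty) until the accumulated data length reaches the single-flush
    threshold, then hand the batch to the HDD writer.
    Returns False when both queues are empty (nothing to flush)."""
    queue = high_q if high_q else low_q
    if not queue:
        return False
    batch, total = [], 0
    while queue and total < single_flush_threshold:
        start, end = queue.popleft()
        batch.append((start, end))
        total += end - start + 1   # inclusive range length
    write_to_hdd(batch)
    return True
```

Batching until a threshold is what makes each HDD submission large and sequential rather than many small writes.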
For example, assume that the currently managed dirty data space (i.e., the space in the SSD to which data has been written) is (0-15, 56-63, 72-127, 144-…).
If the data space of the target data written this time is 32-35, which is neither continuous with nor overlapping the data space of other written data, the target data is independent. Its data length is 4, smaller than the high-priority threshold, so the target data is put at the tail of the low-priority flushing queue.
If the data space of the target data written this time is 16-19, the first data associated with it is 0-15, and the second data obtained after merging is 0-19 with a data length of 20, greater than the high-priority threshold; the second data is therefore put at the tail of the high-priority flushing queue (the original 0-15 entry is picked out of the high-priority queue and the merged data is hung at the tail of the high-priority queue).
If the data space of the target data written this time is 52-55, the first data associated with it is 56-63, and the second data obtained after merging is 52-63 with a data length of 12, smaller than the high-priority threshold; the second data is therefore put at the tail of the low-priority flushing queue (the original 56-63 entry is picked out of the low-priority queue and the merged data is hung at the tail of the low-priority queue).
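The three worked examples above can be replayed end to end with a small enqueue sketch. The high-priority threshold of 16 is an assumed value chosen only to be consistent with the examples (a merged length of 20 goes to the high-priority queue, lengths 4 and 12 go to the low-priority one); all names are illustrative:

```python
from collections import deque

HIGH_PRIORITY_THRESHOLD = 16   # assumption; the patent only says "a set threshold"

def enqueue_write(high_q, low_q, cached, target):
    """Merge the incoming target (start, end) range with any associated
    cached range, drop the stale entry from whichever queue held it, and
    append the merged second data to the tail of the queue matching its
    data length."""
    merged = target
    for rng in list(cached):
        # overlap-or-contiguous association test against the target write
        if target[0] <= rng[1] + 1 and rng[0] <= target[1] + 1:
            merged = (min(merged[0], rng[0]), max(merged[1], rng[1]))
            cached.remove(rng)
            if rng in high_q:
                high_q.remove(rng)
            if rng in low_q:
                low_q.remove(rng)
    cached.append(merged)
    length = merged[1] - merged[0] + 1
    (high_q if length >= HIGH_PRIORITY_THRESHOLD else low_q).append(merged)
    return merged
```

Replaying the scenario: writing 16-19 merges with the cached 0-15 into 0-19 (length 20, high priority); writing 52-55 merges with 56-63 into 52-63 (length 12, low priority); an independent 32-35 (length 4) lands at the low-priority tail.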
Exemplarily, referring to fig. 4, which is a schematic structural diagram of a data processing apparatus provided in an embodiment of the present application. The apparatus is applied to a storage server, where the storage server includes a first storage medium composed of an SSD and a second storage medium composed of an HDD, and includes:
a receiving unit 40 for receiving a target data write request;
a first judgment unit 41 configured to judge whether or not first data associated with the target data exists in the first storage medium;
if the first judging unit 41 judges that first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data;
a second judging unit 42, configured to judge whether a data length of the second data meets a preset requirement;
if the second judging unit 42 judges that the data length of the second data meets the preset requirement, the second data is added to a first disk-flushing queue, where the first disk-flushing queue is a high-priority flushing queue;
if the second judging unit 42 judges that the data length of the second data does not meet the preset requirement, the second data is added to a second disk-flushing queue, where the second disk-flushing queue is a low-priority flushing queue.
Optionally, when determining whether first data associated with the target data exists in the first storage medium, the first determining unit 41 is specifically configured to:
and judging whether the data space written by the target data is overlapped or continuous with the data space of the cached data, and if so, judging that the cached data is the first data associated with the target data.
Optionally, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets a preset requirement.
Optionally, when adding the second data to the first disk-flushing queue, the second judging unit 42 is specifically configured to:
add the second data to the tail of the first disk-flushing queue;
when adding the second data to the second disk-flushing queue, the second judging unit 42 is specifically configured to:
add the second data to the tail of the second disk-flushing queue.
Optionally, the apparatus further comprises:
a flushing unit, configured to: when a disk-flushing instruction is received, judge whether data to be flushed is cached in the first disk-flushing queue; if data to be flushed is cached in the first disk-flushing queue, store part or all of the data to be flushed, starting from the head of the first disk-flushing queue, to the second storage medium; if no data to be flushed is cached in the first disk-flushing queue, judge whether data to be flushed is cached in the second disk-flushing queue; and if data to be flushed is cached in the second disk-flushing queue, store part or all of the data to be flushed, starting from the head of the second disk-flushing queue, to the second storage medium.
The above units may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above units is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Further, in the data processing apparatus provided in the embodiment of the present application, from a hardware aspect, a schematic diagram of a hardware architecture of the data processing apparatus may be shown in fig. 5, where the data processing apparatus may include: a memory 50 and a processor 51, which,
the memory 50 is used for storing program instructions; the processor 51 calls the program instructions stored in the memory 50 and executes the above-described method embodiments according to the obtained program instructions. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the present application also provides a storage server comprising at least one processing element (or chip) for performing the above method embodiments.
Optionally, the present application also provides a program product, such as a computer-readable storage medium, having stored thereon computer-executable instructions for causing the computer to perform the above-described method embodiments.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disk (e.g., an optical disk or DVD), a similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A data processing method, applied to a storage server comprising a first storage medium composed of an SSD and a second storage medium composed of an HDD, the method comprising:
receiving a write request for target data, and determining whether first data associated with the target data exists in the first storage medium;
if it is determined that first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data;
determining whether a data length of the second data meets a preset requirement;
if the data length of the second data meets the preset requirement, adding the second data to a first disk flushing queue, the first disk flushing queue being a high-priority flushing queue; and
if the data length of the second data does not meet the preset requirement, adding the second data to a second disk flushing queue, the second disk flushing queue being a low-priority flushing queue.
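The write path of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dict-based SSD cache, the byte-length threshold value, and all function and queue names are hypothetical.

```python
from collections import deque

FLUSH_THRESHOLD = 8   # hypothetical "preset requirement" (bytes); a real system would use a far larger value

high_queue = deque()  # first disk flushing queue (high priority)
low_queue = deque()   # second disk flushing queue (low priority)

def merge(off_a, a, off_b, b):
    """Merge two overlapping or contiguous extents; the newer write `a` wins on overlap."""
    start = min(off_a, off_b)
    end = max(off_a + len(a), off_b + len(b))
    buf = bytearray(end - start)
    buf[off_b - start:off_b - start + len(b)] = b   # older cached data first
    buf[off_a - start:off_a - start + len(a)] = a   # new write overwrites any overlap
    return start, bytes(buf)

def handle_write(offset, data, cache):
    """Handle one write. `cache` maps offset -> bytes buffered on the SSD tier."""
    # Find cached data whose range overlaps or adjoins the write (the "first data")
    for c_off, c_data in list(cache.items()):
        if offset <= c_off + len(c_data) and c_off <= offset + len(data):
            del cache[c_off]
            offset, data = merge(offset, data, c_off, c_data)  # -> the "second data"
    cache[offset] = data
    # Route by length: long extents go to the high-priority queue, short ones wait
    if len(data) >= FLUSH_THRESHOLD:
        high_queue.append((offset, data))
    else:
        low_queue.append((offset, data))
    return offset, data
```

A production cache would also remove a queue entry when its extent is merged again before flushing; that bookkeeping is omitted here for brevity.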
2. The method of claim 1, wherein determining whether first data associated with the target data exists in the first storage medium comprises:
determining whether the data space to which the target data is written overlaps or is contiguous with the data space of cached data, and if so, determining that the cached data is the first data associated with the target data.
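The overlap-or-contiguity test of claim 2 reduces to a single interval comparison. A minimal sketch (the function name and argument names are hypothetical):

```python
def is_associated(write_off, write_len, cached_off, cached_len):
    """True if the write's address range overlaps or directly adjoins the
    cached range, i.e. the two extents can be merged into one."""
    return (write_off <= cached_off + cached_len
            and cached_off <= write_off + write_len)

print(is_associated(0, 4, 4, 4))  # True  (contiguous: [0,4) adjoins [4,8))
print(is_associated(0, 4, 2, 4))  # True  (overlapping: [0,4) and [2,6))
print(is_associated(0, 4, 5, 2))  # False (a one-byte gap between [0,4) and [5,7))
```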
3. The method according to claim 1 or 2, wherein the data length of the second data is determined to meet the preset requirement if it is greater than or equal to a set threshold.
4. The method of claim 1 or 2, wherein adding the second data to the first disk flushing queue comprises:
adding the second data to the tail of the first disk flushing queue;
and adding the second data to the second disk flushing queue comprises:
adding the second data to the tail of the second disk flushing queue.
5. The method of claim 4, wherein the method further comprises:
when a disk flushing instruction is received, determining whether data to be flushed is cached in the first disk flushing queue;
if data to be flushed is cached in the first disk flushing queue, storing part or all of the data to be flushed to the second storage medium, starting from the head of the first disk flushing queue;
if no data to be flushed is cached in the first disk flushing queue, determining whether data to be flushed is cached in the second disk flushing queue; and
if data to be flushed is cached in the second disk flushing queue, storing part or all of the data to be flushed to the second storage medium, starting from the head of the second disk flushing queue.
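The flush order of claim 5 (drain the high-priority queue from its head before touching the low-priority one) can be sketched as follows; the queue names, the `write_to_hdd` callback, and the `batch` limit are hypothetical:

```python
from collections import deque

def flush_once(high_queue, low_queue, write_to_hdd, batch=4):
    """On a flush instruction, move up to `batch` extents to the second
    storage medium (HDD), taking from the head of the high-priority queue
    first and falling back to the low-priority queue only when it is empty."""
    flushed = []
    # Claim 5: the first (high-priority) flushing queue is checked first
    source = high_queue if high_queue else low_queue
    while source and len(flushed) < batch:
        extent = source.popleft()   # flush from the head of the queue
        write_to_hdd(extent)
        flushed.append(extent)
    return flushed
```

Because `batch` caps the work per instruction, each call stores "part or all" of the pending data, matching the wording of the claim.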
6. A data processing apparatus applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, the apparatus comprising:
a receiving unit configured to receive a target data write request;
a first determining unit, configured to determine whether first data associated with the target data exists in the first storage medium, wherein if the first determining unit determines that first data associated with the target data exists in the first storage medium, the target data and the first data are merged to obtain second data; and
a second determining unit, configured to determine whether a data length of the second data meets a preset requirement, wherein if the second determining unit determines that the data length of the second data meets the preset requirement, the second data is added to a first disk flushing queue, the first disk flushing queue being a high-priority flushing queue; and if the second determining unit determines that the data length of the second data does not meet the preset requirement, the second data is added to a second disk flushing queue, the second disk flushing queue being a low-priority flushing queue.
7. The apparatus according to claim 6, wherein when determining whether the first data associated with the target data exists in the first storage medium, the first determining unit is specifically configured to:
determine whether the data space to which the target data is written overlaps or is contiguous with the data space of cached data, and if so, determine that the cached data is the first data associated with the target data.
8. The apparatus according to claim 6 or 7, wherein the data length of the second data is determined to meet the preset requirement if it is greater than or equal to a set threshold.
9. The apparatus according to claim 6 or 7, wherein, when adding the second data to the first disk flushing queue, the second determining unit is specifically configured to:
add the second data to the tail of the first disk flushing queue;
and when adding the second data to the second disk flushing queue, the second determining unit is specifically configured to:
add the second data to the tail of the second disk flushing queue.
10. The apparatus of claim 9, wherein the apparatus further comprises:
a disk flushing unit, configured to: when a disk flushing instruction is received, determine whether data to be flushed is cached in the first disk flushing queue; if data to be flushed is cached in the first disk flushing queue, store part or all of the data to be flushed to the second storage medium, starting from the head of the first disk flushing queue; if no data to be flushed is cached in the first disk flushing queue, determine whether data to be flushed is cached in the second disk flushing queue; and if data to be flushed is cached in the second disk flushing queue, store part or all of the data to be flushed to the second storage medium, starting from the head of the second disk flushing queue.
CN202111140712.XA 2021-09-28 2021-09-28 Data processing method and device Pending CN113986118A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111140712.XA CN113986118A (en) 2021-09-28 2021-09-28 Data processing method and device

Publications (1)

Publication Number Publication Date
CN113986118A true CN113986118A (en) 2022-01-28

Family

ID=79736985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111140712.XA Pending CN113986118A (en) 2021-09-28 2021-09-28 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113986118A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050044311A1 (en) * 2003-08-22 2005-02-24 Oracle International Corporation Reducing disk IO by full-cache write-merging
US20060123200A1 (en) * 2004-12-02 2006-06-08 Fujitsu Limited Storage system, and control method and program thereof
CN104461936A * 2014-11-28 2015-03-25 华为技术有限公司 Method and device for flushing cached data to disk
CN105095112A * 2015-07-20 2015-11-25 华为技术有限公司 Cache write control method and device, and non-volatile computer-readable storage medium
CN106293500A * 2015-06-23 2017-01-04 中兴通讯股份有限公司 Write operation control method, apparatus and system
CN111090398A * 2019-12-13 2020-05-01 北京浪潮数据技术有限公司 Garbage collection method, device and equipment for solid state disk, and readable storage medium
CN111949392A * 2020-08-27 2020-11-17 苏州浪潮智能科技有限公司 Cache task queue scheduling method, system, terminal and storage medium
CN112306904A * 2020-11-20 2021-02-02 新华三大数据技术有限公司 Method and device for flushing cached data to disk

Similar Documents

Publication Publication Date Title
CN105117351B Method and device for buffering write data
CA2610180C Managing memory pages
JP2015512098A Data migration for composite non-volatile storage
CN104281535B Method and apparatus for processing a mapping table in memory
US8024739B2 (en) System for indicating and scheduling additional execution time based on determining whether the execution unit has yielded previously within a predetermined period of time
CN113778662B (en) Memory recovery method and device
US20120054418A1 (en) Storage controller, storage device, and data transfer control method
CN113204407A (en) Memory over-allocation management method and device
CN114265670A (en) Memory block sorting method, medium and computing device
CN112925632B (en) Processing method and device, processor, electronic device and storage medium
CN113515346A (en) Storage volume residual data cleaning method and device
CN111459402B (en) Magnetic disk controllable buffer writing method, controller, hybrid IO scheduling method and scheduler
CN116301644B (en) Data storage method, system, terminal and medium based on multi-hard disk coordination
CN113986118A (en) Data processing method and device
CN112256206B (en) IO processing method and device
CN112800057B (en) Fingerprint table management method and device
CN105612505A (en) Method and apparatus for scheduling CPU
CN106202262B (en) Information processing method and electronic equipment
CN102073539B (en) Queue request processing method and device
CN112181315B (en) Data disk refreshing method and device
US11544197B2 (en) Random-access performance for persistent memory
CN112799589A (en) Data reading method and device
CN110727405A (en) Data processing method and device, electronic equipment and computer readable medium
CN113703666A (en) Data reading and writing method and device
US11874767B2 (en) Memory partitions for processing entities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination