CN116893786A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN116893786A
CN116893786A (application CN202311134632.2A)
Authority
CN
China
Prior art keywords
cache
data
target
subtree
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311134632.2A
Other languages
Chinese (zh)
Other versions
CN116893786B (en)
Inventor
王辉
王见
孙明刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202311134632.2A priority Critical patent/CN116893786B/en
Publication of CN116893786A publication Critical patent/CN116893786A/en
Application granted granted Critical
Publication of CN116893786B publication Critical patent/CN116893786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 — Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 — Interfaces specially adapted for storage systems
    • G06F 3/0602 — Interfaces specifically adapted to achieve a particular effect
    • G06F 3/0604 — Improving or facilitating administration, e.g. storage management
    • G06F 3/061 — Improving I/O performance
    • G06F 3/0628 — Interfaces making use of a particular technique
    • G06F 3/0638 — Organizing or formatting or addressing of data
    • G06F 3/064 — Management of blocks
    • G06F 3/0655 — Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 — Data buffering arrangements
    • G06F 3/0668 — Interfaces adopting a particular infrastructure
    • G06F 3/0671 — In-line storage system
    • G06F 3/0683 — Plurality of storage devices
    • G06F 3/0689 — Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a data processing method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring write operation data of a disk array, where the write operation data comprises disk request information and cache data; determining a target storage block for the cache data according to the disk request information; determining the target cache subtree corresponding to the target storage block according to the correspondence between each cache subtree and the storage blocks; and writing the cache data into the target cache subtree. With this method, cache data generated by different write operations is scattered across different cache subtrees, and each write operation locks only its corresponding target cache subtree. Even when many write operation requests arrive concurrently, they do not block one another, which improves the I/O performance of the RAID card.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
At present, storage servers generally store data in a disk array (Redundant Array of Independent Disks, abbreviated RAID), a storage method that improves data read/write efficiency and provides data redundancy capability.
In the prior art, write caching is generally implemented on a RAID card: cache data generated by write operations is stored in a binary tree structure, multiple write operations build up a binary tree containing multiple nodes, each piece of cache data corresponding to one binary tree node, and the cache data currently stored in the binary tree is then flushed to disk according to a preset disk flushing rule.
However, because multiple users may access the storage server simultaneously, the entire binary tree is locked whenever any user performs a write operation in order to avoid conflicts. If many write operation requests arrive at that moment, they are blocked, which reduces the I/O performance of the RAID card.
Disclosure of Invention
The application provides a data processing method, a data processing device, an electronic device, and a storage medium, to overcome the defect in the prior art that a large number of write operation requests block one another and reduce the I/O performance of the RAID card.
The first aspect of the present application provides a data processing method, comprising:
acquiring write operation data of a disk array; the write operation data comprises disk request information and cache data;
determining a target storage block of the cache data according to the disk request information;
determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block;
and writing the cache data into the target cache subtree.
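The patent does not disclose source code; the four steps above can be sketched in Python as follows, with the block size, the number of subtrees, the block-to-subtree mapping, and all names being illustrative assumptions (a plain dict stands in for each cache subtree):

```python
import threading

BLOCK_SIZE = 1024      # rated size of one storage block, in LBA units (assumed)
NUM_SUBTREES = 4       # number of cache subtrees (assumed)

class CacheManager:
    """Routes each write to the cache subtree that owns its target storage block."""

    def __init__(self):
        # a dict keyed by LBA stands in for each cache subtree
        self.subtrees = [dict() for _ in range(NUM_SUBTREES)]
        self.locks = [threading.Lock() for _ in range(NUM_SUBTREES)]

    def write(self, first_lba, cache_data):
        block = first_lba // BLOCK_SIZE      # step 2: target storage block
        tree_id = block % NUM_SUBTREES       # step 3: block -> subtree correspondence
        with self.locks[tree_id]:            # lock only this subtree, not all caches
            self.subtrees[tree_id][first_lba] = cache_data   # step 4: write
        return tree_id

cm = CacheManager()
cm.write(2048, b"payload")    # LBA 2048 -> block 2 -> subtree 2
```

Because each write takes only the lock of its own subtree, writes that land in different subtrees proceed concurrently, which is the mechanism the abstract credits for the improved I/O performance.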
In an alternative embodiment, the disk request information includes a first logical block address, and the determining, according to the disk request information, a target storage block of the cache data includes:
determining a target storage space of the cache data according to the first logic block address;
determining an address interval corresponding to each storage block in the target storage space according to the storage block division condition of the target storage space;
and determining a target storage block of the cache data according to the matching condition between the first logic block address and the address interval corresponding to each storage block.
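The interval-matching step above can be sketched as a minimal Python function; the block size and the number of blocks in the target storage space are assumed values for illustration only:

```python
BLOCK_SIZE = 1024   # rated block size in LBA units (an assumed value)

def target_block(first_lba, num_blocks=8):
    """Match the first logical block address against each block's address interval."""
    for i in range(num_blocks):
        lo, hi = i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE   # interval of block i
        if lo <= first_lba < hi:
            return i
    raise ValueError("LBA lies outside the target storage space")

target_block(2500)   # falls in the interval [2048, 3072), i.e. block 2
```

With fixed-size intervals the linear scan could of course be replaced by `first_lba // BLOCK_SIZE`; the loop form mirrors the per-interval matching the embodiment describes.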
In an optional implementation manner, the disk request information further includes a cache data length, and the determining, according to a matching condition between the first logical block address and an address interval corresponding to each storage block, the target storage block of the cache data includes:
determining an initial storage block of the cache data according to the matching between the first logical block address and the address interval corresponding to each storage block;
judging, according to the first logical block address and the cache data length, whether the number of target storage blocks of the cache data is 1;
and when the number of the target storage blocks of the cache data is determined to be 1, taking the initial storage block as the target storage block of the cache data.
In an optional implementation manner, the determining, according to the first logical block address and the cache data length, whether the number of target storage blocks of the cache data is 1 includes:
determining a cache space of the initial storage block according to the first logic block address and the address interval of the initial storage block;
and when the cache space of the initial storage block is not smaller than the length of the cache data, determining the number of target storage blocks of the cache data to be 1.
In an alternative embodiment, the method further comprises:
and when the cache space of the initial storage block is smaller than the cache data length, determining that the number of target storage blocks of the cache data is not 1.
In an alternative embodiment, the method further comprises:
splitting the cache data into a plurality of continuous sub-cache data when the number of target storage blocks of the cache data is not 1;
and taking the initial storage block and a plurality of storage blocks which are continuous with the initial storage block as target storage blocks corresponding to the sub-cache data.
In an alternative embodiment, the splitting the cache data into a plurality of consecutive sub-cache data includes:
determining the residual data amount according to the difference between the length of the cached data and the cache space of the initial storage block;
determining the number of target storage blocks according to the residual data amount and the rated space size of each storage block;
and splitting the cache data into a plurality of continuous sub-cache data according to the number of the target storage blocks.
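The splitting logic of the last three embodiments can be sketched together: the remaining cache space of the initial block is compared with the data length, and anything that does not fit spills into the following consecutive blocks. The block size and function name are assumptions, not disclosed by the patent:

```python
BLOCK_SIZE = 1024   # rated space size of each storage block (assumed)

def split_write(first_lba, cache_data):
    """Split one write into per-block pieces along storage block boundaries.

    Returns a list of (block_index, start_lba, chunk) tuples; a single tuple
    when the initial block's remaining cache space can hold the whole write.
    """
    pieces, lba = [], first_lba
    while cache_data:
        block = lba // BLOCK_SIZE
        room = (block + 1) * BLOCK_SIZE - lba       # cache space left in this block
        chunk, cache_data = cache_data[:room], cache_data[room:]
        pieces.append((block, lba, chunk))
        lba += len(chunk)
    return pieces
```

For example, a 100-unit write starting at LBA 1000 does not fit in the 24 units left in block 0, so it is split into consecutive sub-cache data for blocks 0 and 1.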
In an optional embodiment, before determining the target cache subtree corresponding to the target storage block according to the correspondence between each cache subtree and the storage block, the method further includes:
obtaining storage block division information;
and determining the correspondence between each cache subtree and each storage block according to a preset cache subtree allocation standard and the arrangement of the storage blocks represented by the storage block division information.
In an optional implementation manner, the preset cache subtree allocation standard includes that a plurality of storage blocks corresponding to any one cache subtree are not adjacent.
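One simple allocation satisfying the "no two adjacent blocks share a subtree" standard is a round-robin mapping (a sketch under that assumption; the patent does not fix a particular scheme):

```python
def assign_subtrees(num_blocks, num_subtrees):
    """Round-robin correspondence: consecutive blocks land in different subtrees
    whenever num_subtrees >= 2, so blocks of any one subtree are never adjacent."""
    return {block: block % num_subtrees for block in range(num_blocks)}

mapping = assign_subtrees(8, 4)   # block 0 -> subtree 0, block 1 -> subtree 1, ...
```

Spreading adjacent blocks across subtrees means sequential writes, which hit neighbouring blocks, contend for different locks rather than the same one.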
In an alternative embodiment, the writing the cache data to the target cache subtree includes:
according to a preset subtree construction standard, determining a target child node corresponding to the cache data in the target cache subtree according to the current composition information of the target cache subtree and a second logic block address corresponding to the cache data;
and writing the cached data into the target child node.
In an optional implementation manner, the determining, according to the current composition information of the target cache subtree and the second logical block address corresponding to the cache data, the target child node corresponding to the cache data in the target cache subtree includes:
determining a target path of the cache data in the target cache subtree according to the second logical block addresses of the history cache data currently stored at each node, as represented by the current composition information of the target cache subtree, and the second logical block address corresponding to the cache data;
and determining the target child node corresponding to the cache data in the target cache subtree according to the target path.
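The path-walking described here is ordinary binary-search-tree insertion keyed on the second logical block address. A minimal sketch (names and the overwrite-on-equal policy are assumptions):

```python
class Node:
    def __init__(self, lba, data):
        self.lba, self.data = lba, data
        self.left = self.right = None

def insert(root, lba, data):
    """Follow the target path: smaller second LBAs go left, larger go right."""
    if root is None:
        return Node(lba, data)          # target child node found: an empty slot
    if lba < root.lba:
        root.left = insert(root.left, lba, data)
    elif lba > root.lba:
        root.right = insert(root.right, lba, data)
    else:
        root.data = data                # same LBA already cached: overwrite it
    return root

root = None
for lba in (20, 10, 30, 25):
    root = insert(root, lba, b"d%d" % lba)
```

After these four writes the subtree has 20 at the root, 10 to its left, and 30 with left child 25 to its right, so an in-order walk visits the LBAs in ascending order.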
In an alternative embodiment, after writing the cached data to the target child node, the method further comprises:
checking whether the target cache subtree meets a preset subtree rule;
and if the target cache subtree does not meet the preset subtree rule, performing a node rotation operation on the node area corresponding to the target cache subtree.
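The patent does not specify which balancing rule is used; as an assumed illustration, the rotation primitive common to AVL and red-black trees looks like this, lifting a right child to become the local root:

```python
class Node:
    def __init__(self, lba):
        self.lba = lba
        self.left = self.right = None

def rotate_left(x):
    """Left rotation around x: the right child y is lifted to be the root of
    this node area, and y's old left subtree becomes x's right subtree."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

# a right-leaning chain 1 -> 2 -> 3 violates typical balance rules
a, b, c = Node(1), Node(2), Node(3)
a.right, b.right = b, c
new_root = rotate_left(a)   # 2 becomes the root, with children 1 and 3
```

The mirror-image right rotation handles left-leaning node areas; which one fires depends on the preset subtree rule the embodiment checks.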
In an alternative embodiment, the method further comprises:
when the cache data flushing process of the disk array is triggered, traversing each cache subtree and determining the cache data to be flushed with the smallest second logical block address in each cache subtree;
and flushing the cache data to be flushed to the corresponding target disk in ascending order of second logical block address.
In an optional implementation manner, the traversing each cache subtree and determining the cache data to be flushed with the smallest second logical block address in each cache subtree includes:
performing an in-order traversal of each cache subtree to obtain the in-order traversal result of each cache subtree;
constructing the cache data sub-queue corresponding to each cache subtree according to the in-order traversal result of each cache subtree;
taking the cache data to be flushed pointed to by the head pointer of each cache data sub-queue as the cache data to be flushed with the smallest second logical block address in that cache subtree;
wherein the cache data to be flushed is sorted in ascending order of second logical block address in the in-order traversal result.
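The in-order traversal that builds each sorted sub-queue can be sketched as follows; the nested-tuple tree shape and the sample data are illustrative assumptions:

```python
from collections import deque

def inorder(tree, out):
    """In-order traversal visits node keys in ascending second-LBA order."""
    if tree is not None:
        lba, left, right = tree
        inorder(left, out)
        out.append(lba)
        inorder(right, out)

# a cache subtree as nested (lba, left_child, right_child) tuples
subtree = (20, (10, None, None), (30, (25, None, None), None))
subqueue = deque()
inorder(subtree, subqueue)
# the head of the sub-queue now holds this subtree's smallest LBA
```

Since a binary search tree keyed on LBA yields a sorted sequence under in-order traversal, the head pointer of each resulting sub-queue is guaranteed to point at that subtree's minimum, as the embodiment requires.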
In an optional implementation manner, the flushing the cache data to be flushed to the corresponding target disk in ascending order of second logical block address includes:
comparing the second logical block addresses of the cache data to be flushed pointed to by the head pointers of the cache data sub-queues to obtain a second logical block address comparison result;
determining, according to the second logical block address comparison result, the global minimum second logical block address and the target cache data sub-queue corresponding to it;
flushing the cache data to be flushed corresponding to the global minimum second logical block address;
and moving the head pointer of the target cache data sub-queue back by one node, then returning to the step of comparing the second logical block addresses of the cache data to be flushed pointed to by the head pointers of the cache data sub-queues.
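This repeated compare-and-advance over the sub-queue heads is a k-way merge. A sketch using a heap in place of the pairwise comparison (an implementation choice of this example, not stated in the patent):

```python
import heapq

def merge_flush_order(subqueues):
    """Repeatedly pick the global minimum LBA among the sub-queue heads and
    advance that queue's head pointer; a heap replaces the pairwise compare."""
    heap = [(q[0], i, 0) for i, q in enumerate(subqueues) if q]
    heapq.heapify(heap)
    order = []
    while heap:
        lba, i, j = heapq.heappop(heap)        # global minimum second LBA
        order.append(lba)
        if j + 1 < len(subqueues[i]):          # move that head pointer back one node
            heapq.heappush(heap, (subqueues[i][j + 1], i, j + 1))
    return order

merge_flush_order([[1, 4, 9], [2, 3, 8], [5, 6]])   # ascending flush order
```

Flushing in globally ascending LBA order turns the scattered per-subtree caches back into near-sequential disk writes, which is what makes the flush efficient.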
In an optional implementation manner, the flushing the cache data to be flushed corresponding to the global minimum second logical block address includes:
adding the cache data to be flushed corresponding to the global minimum second logical block address to a flush queue;
and when the flush queue meets a preset flush condition, flushing the cache data to be flushed in the flush queue.
In an alternative embodiment, the method further comprises:
and if the amount of cache data to be flushed stored in the flush queue reaches a preset number threshold, determining that the flush queue meets the preset flush condition.
A second aspect of the present application provides a data processing apparatus comprising:
the acquisition module is used for acquiring write operation data of the disk array; the write operation data comprises disk request information and cache data;
the determining module is used for determining a target storage block of the cache data according to the disk request information;
the corresponding module is used for determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block;
and the processing module is used for writing the cache data into the target cache subtree.
In an alternative embodiment, the disk request information includes a first logical block address, and the determining module is specifically configured to:
determining a target storage space of the cache data according to the first logic block address;
determining an address interval corresponding to each storage block in the target storage space according to the storage block division condition of the target storage space;
and determining a target storage block of the cache data according to the matching condition between the first logic block address and the address interval corresponding to each storage block.
In an optional implementation manner, the disk request information further includes a cache data length, and the determining module is specifically configured to:
determining an initial storage block of the cache data according to the matching between the first logical block address and the address interval corresponding to each storage block;
judging whether the number of target storage blocks of the cache data is 1 or not according to the first logic block address and the cache data length;
and when the number of the target storage blocks of the cache data is determined to be 1, taking the initial storage block as the target storage block of the cache data.
In an alternative embodiment, the determining module is specifically configured to:
determining a cache space of the initial storage block according to the first logic block address and the address interval of the initial storage block;
and when the cache space of the initial storage block is not smaller than the length of the cache data, determining the number of target storage blocks of the cache data to be 1.
In an alternative embodiment, the determining module is further configured to:
and when the cache space of the initial storage block is smaller than the cache data length, determining that the number of target storage blocks of the cache data is not 1.
In an alternative embodiment, the determining module is further configured to:
splitting the cache data into a plurality of continuous sub-cache data when the number of target storage blocks of the cache data is not 1;
and taking the initial storage block and a plurality of storage blocks which are continuous with the initial storage block as target storage blocks corresponding to the sub-cache data.
In an alternative embodiment, the determining module is specifically configured to:
determining the residual data amount according to the difference between the length of the cached data and the cache space of the initial storage block;
Determining the number of target storage blocks according to the residual data amount and the rated space size of each storage block;
and splitting the cache data into a plurality of continuous sub-cache data according to the number of the target storage blocks.
In an alternative embodiment, the corresponding module is further configured to:
acquiring storage block division information before determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block;
and determining the correspondence between each cache subtree and each storage block according to a preset cache subtree allocation standard and the arrangement of the storage blocks represented by the storage block division information.
In an optional implementation manner, the preset cache subtree allocation standard includes that a plurality of storage blocks corresponding to any one cache subtree are not adjacent.
In an alternative embodiment, the processing module is specifically configured to:
according to a preset subtree construction standard, determining a target child node corresponding to the cache data in the target cache subtree according to the current composition information of the target cache subtree and a second logic block address corresponding to the cache data;
And writing the cached data into the target child node.
In an alternative embodiment, the processing module is specifically configured to:
determining a target path of the cache data in the target cache subtree according to a second logic block address corresponding to the history cache data currently stored by each node represented by the current composition information of the target cache subtree and a second logic block address corresponding to the cache data;
and determining a target child node corresponding to the cached data in the target cached subtree according to the target path.
In an alternative embodiment, the processing module is further configured to:
after the cache data is written into the target child node, checking whether the target cache subtree meets a preset subtree rule;
and if the target cache subtree does not meet the preset subtree rule, performing a node rotation operation on the node area corresponding to the target cache subtree.
In an alternative embodiment, the apparatus further comprises:
the system comprises a flushing module, a flushing module and a flushing module, wherein the flushing module is used for traversing each cache subtree when triggering the flushing process of the cache data of the disk array, and determining the cache data to be flushed with the minimum second logic block address in each cache subtree; and brushing the data to be brushed down to the corresponding target disk according to the sequence from small to large of the second logic block address.
In an alternative embodiment, the flushing module is specifically configured to:
perform an in-order traversal of each cache subtree to obtain the in-order traversal result of each cache subtree;
construct the cache data sub-queue corresponding to each cache subtree according to the in-order traversal result of each cache subtree;
take the cache data to be flushed pointed to by the head pointer of each cache data sub-queue as the cache data to be flushed with the smallest second logical block address in that cache subtree;
wherein the cache data to be flushed is sorted in ascending order of second logical block address in the in-order traversal result.
In an alternative embodiment, the flushing module is specifically configured to:
compare the second logical block addresses of the cache data to be flushed pointed to by the head pointers of the cache data sub-queues to obtain a second logical block address comparison result;
determine, according to the second logical block address comparison result, the global minimum second logical block address and the target cache data sub-queue corresponding to it;
flush the cache data to be flushed corresponding to the global minimum second logical block address;
and move the head pointer of the target cache data sub-queue back by one node, then return to the step of comparing the second logical block addresses of the cache data to be flushed pointed to by the head pointers of the cache data sub-queues.
In an alternative embodiment, the flushing module is specifically configured to:
add the cache data to be flushed corresponding to the global minimum second logical block address to a flush queue;
and when the flush queue meets a preset flush condition, flush the cache data to be flushed in the flush queue.
In an alternative embodiment, the flushing module is further configured to:
determine that the flush queue meets the preset flush condition if the amount of cache data to be flushed stored in the flush queue reaches a preset number threshold.
A third aspect of the present application provides an electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory such that the at least one processor performs the method as described above in the first aspect and the various possible designs of the first aspect.
A fourth aspect of the application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described above in the first aspect and its various possible designs.
The technical scheme of the application has the following advantages:
the application provides a data processing method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring write operation data of a disk array; the write operation data comprises disk request information and cache data; determining a target storage block for caching data according to the disk request information; determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block; and writing the cache data into the target cache subtree. According to the method provided by the scheme, the cache data generated by different write operations are stored in the different cache subtrees in a scattered manner, and only the corresponding target cache subtree is locked when one write operation is performed, so that even if the write operation requests are more, the write operation requests are not blocked, and the I/O performance of the RAID card is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or the prior-art description are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a logical view of partitioning a disk storage block according to an embodiment of the present application;
FIG. 4 is a logic diagram of a cache subtree allocation according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an exemplary cache sub-tree according to an embodiment of the present application;
FIG. 6 is an exemplary node rotation logic diagram provided by an embodiment of the present application;
FIG. 7 is an exemplary cache data flushing logic diagram provided by an embodiment of the present application;
FIG. 8 is a schematic overall flow chart of a data processing method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to illustrate the inventive concept to those skilled in the art by reference to specific embodiments.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First, the terms involved in the present application will be explained:
RAID: the redundant array formed by the independent disks (Redundant Arrays of Independent Disks, RAID for short) can effectively improve the data reliability and the I/O performance by utilizing the redundant array formed by the member disks. With this technique, data is cut into a number of sections, which are individually stored on individual disks. The RAID card is a device capable of creating a hard disk RAID group, can improve the access speed of data through multi-disk parallel access, and simultaneously provides the capability of carrying out redundancy error correction on the data.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. In the following description of the embodiments, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
With the rapid development of big data and artificial intelligence technology, data on the Internet has become more and more diverse. The number of Internet users has also grown greatly over recent decades, and as it grows, the traffic handled by Internet enterprises has surged, posing unprecedented challenges to those enterprises.
Enterprises typically use dedicated storage servers to store user data and provide external access services. For some large enterprises, the number of storage servers can now reach the millions, and how to make better use of these storage resources is a technical direction every enterprise actively studies. RAID is one of the more mature storage methods and architectures to emerge from this effort. RAID is a multi-hard-disk management method that improves concurrent data access speed by combining multiple disks into different RAID arrays and striping data across them for storage. Some RAID arrays also provide a redundancy function, allowing the complete data to be recovered even when part of the disks fail, which greatly improves the safety of data storage. Early on, the RAID function was implemented purely in software using the computing power of a general-purpose CPU; RAID cards dedicated to providing the RAID function were developed gradually in the middle and later stages, completing most of the RAID computation in hardware, which improves speed and reduces power consumption. Today the RAID card has become an indispensable storage component in storage servers.
In the prior art, the write cache is generally implemented on the RAID card: cache data generated by write operations is stored in a binary tree structure, multiple write operations build up a binary tree containing multiple nodes, with each piece of cache data corresponding to one binary tree node, and the cache data currently stored in the binary tree is then flushed to disk according to a preset rule.
However, since a storage server may be accessed by multiple users simultaneously, the binary tree is locked whenever any user performs a write operation in order to avoid conflicts. If many write operation requests arrive at that moment, they are blocked, which reduces the I/O performance of the RAID card.
In view of the above problems, embodiments of the present application provide a data processing method, apparatus, electronic device, and storage medium. The method includes: acquiring write operation data of a disk array, where the write operation data includes disk request information and cache data; determining a target storage block of the cache data according to the disk request information; determining a target cache subtree corresponding to the target storage block according to the correspondence between each cache subtree and the storage blocks; and writing the cache data into the target cache subtree. With this method, the cache data generated by different write operations is distributed across different cache subtrees, and each write operation locks only its corresponding target cache subtree, so even when there are many write operation requests, they are not blocked, improving the I/O performance of the RAID card.
The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
First, the structure of the data processing system on which the present application is based will be described:
the data processing method, the device, the electronic equipment and the storage medium provided by the embodiment of the application are suitable for processing RAID cache data. Fig. 1 is a schematic structural diagram of a data processing system according to an embodiment of the present application, which mainly includes a RAID card, a data acquisition device, and a data processing device. Specifically, the data processing device may process the cache data generated by the writing operation of the RAID card according to the obtained writing operation data.
The embodiment of the application provides a data processing method for processing RAID cache data. The execution body of the embodiment of the application is electronic equipment, such as a server, a desktop computer, a notebook computer, a tablet computer and other electronic equipment which can be used for processing RAID cache data.
Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application, where the method includes:
Step 201, acquiring write operation data of a disk array.
The write operation data comprises disk request information and cache data.
Step 202, determining a target storage block of the cache data according to the disk request information.
The disk request information at least characterizes the flush requirement of the cache data, specifying the disk region to which the cache data is to be flushed.
It should be noted that, since the RAID card involves a plurality of disks, for any disk, the disk may be divided into a plurality of storage blocks in advance.
Specifically, the storage block corresponding to the flush region of the cache data specified by the disk request information may be looked up to determine the target storage block of the cache data.
Step 203, determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block.
Specifically, a plurality of cache subtrees may be preset, and a correspondence between each cache subtree and a storage block may be established, where one cache subtree corresponds to a plurality of storage blocks. After determining a target storage block of the cache data, determining a cache subtree corresponding to the target storage block as a target cache subtree.
Step 204, writing the cached data into the target cache subtree.
Specifically, the target cache subtree may first be locked, and then unlocked after the cache data has been successfully written into it. Even if another write operation needs to write other cache data, as long as the target cache subtree of that cache data differs from the target cache subtree of the current cache data, its writing is unaffected. This realizes parallel processing of write operations, raises the concurrency with which cache data is written into memory, and improves the I/O performance of the RAID card.
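The per-subtree locking idea can be sketched minimally in Python as follows. The class and method names are hypothetical, and a plain dict stands in for each binary tree; only the lock-per-subtree structure reflects the scheme described above.

```python
import threading

class ShardedCache:
    """Sketch: each cache subtree gets its own lock, so writes that land
    in different subtrees never block each other. (Illustrative only:
    a dict stands in for the binary-tree structure.)"""
    def __init__(self, num_subtrees=5):
        self.locks = [threading.Lock() for _ in range(num_subtrees)]
        self.subtrees = [{} for _ in range(num_subtrees)]

    def write(self, subtree_idx, lba, data):
        # Only the target subtree is locked; the other subtrees stay writable.
        with self.locks[subtree_idx]:
            self.subtrees[subtree_idx][lba] = data

cache = ShardedCache()
cache.write(0, 5, b"a")   # locks cache subtree 0 only
cache.write(1, 70, b"b")  # would proceed even while subtree 0 is held
```

A global lock would serialize both `write` calls above; with one lock per subtree they can run concurrently whenever the subtree indices differ.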
On the basis of the foregoing embodiment, in order to improve the processing efficiency of the cache data, as an implementation manner, in one embodiment the disk request information includes a first Logical Block Address (LBA), and determining the target storage block of the cache data according to the disk request information includes:
Step 2021, determining a target storage space of the cache data according to the first logical block address;
Step 2022, determining the address interval corresponding to each storage block in the target storage space according to how the target storage space is divided into storage blocks;
Step 2023, determining the target storage block of the cache data according to the match between the first logical block address and the address interval corresponding to each storage block.
The target storage space refers to the storage space containing the flush region of the cache data, and the first logical block address refers to the write LBA carried by the write operation request, also called the starting logical block address.
Specifically, it may be determined from the first logical block address which storage space of the disk the cache data will fall into, i.e., the target storage space of the cache data. Since each storage space of the disk is divided in advance into multiple storage blocks, each covering a contiguous address range, the address interval corresponding to each storage block in the target storage space can be determined from how the target storage space is divided; the first logical block address then pinpoints which storage block's address interval the cache data falls into, thereby determining the target storage block of the cache data. Determining the target storage space and then the target storage block step by step from the first logical block address locates the target storage block quickly and lays the foundation for improving the processing efficiency of the cache data.
For example, Fig. 3 is a logical map of disk storage block partitioning provided in an embodiment of the present application. Taking a 1T disk as an example, the disk is first divided into 1024 storage spaces of 1G each, with each storage space corresponding to a contiguous address range of the disk; each storage space is then divided into 16 storage blocks of 64M each. In Fig. 3, the address interval of storage block 0 in storage space 0 is [0, 63], the address interval of storage block 1 is [64, 127], and so on. If the first logical block address of the cache data is 5, the target storage space of the cache data is determined to be storage space 0, and the target storage block is determined to be storage block 0.
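The step-by-step location of the target storage space and target storage block can be sketched as follows, a minimal illustration assuming (as in the example above) LBA values expressed in MB units:

```python
SPACE_SIZE_MB = 1024   # each storage space is 1G
BLOCK_SIZE_MB = 64     # each storage block is 64M

def locate_block(lba_mb):
    """Map a first logical block address (MB units, matching the example's
    [0,63], [64,127] intervals) to (storage space index, storage block index)."""
    space = lba_mb // SPACE_SIZE_MB            # step 1: target storage space
    block = (lba_mb % SPACE_SIZE_MB) // BLOCK_SIZE_MB  # step 2: block within it
    return space, block

print(locate_block(5))    # -> (0, 0): storage space 0, storage block 0
print(locate_block(70))   # -> (0, 1): falls in interval [64, 127]
```

Two integer divisions locate the target block directly, without scanning the address intervals one by one.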
Specifically, in an embodiment, since the amount of cache data generated by each write operation differs, a large write may span storage blocks. To ensure reliable processing of the cache data, the disk request information therefore further includes a cache data Length. The starting storage block of the cache data may be determined according to the match between the first logical block address and the address interval corresponding to each storage block; whether the number of target storage blocks of the cache data is 1 is judged according to the first logical block address and the cache data length; and when the number of target storage blocks is determined to be 1, the starting storage block is taken as the target storage block of the cache data.
For example, if the first logical block address of the cache data is 43, the starting storage block is determined to be storage block 0 in storage space 0. If the cache data length is 10M, the sum of the first logical block address and the cache data length is 53, which still falls within the address interval of the starting storage block, so the number of target storage blocks is determined to be 1 and the starting storage block is taken as the target storage block of the cache data.
Specifically, in an embodiment, the buffer space of the starting storage block may be determined according to the first logical block address and the address interval of the starting storage block; when the buffer space of the starting storage block is not smaller than the cache data length, the number of target storage blocks of the cache data is determined to be 1.
Accordingly, in an embodiment, when the buffer space of the starting storage block is smaller than the buffer data length, it is determined that the number of target storage blocks of the buffer data is not 1.
Here the buffer space of the starting storage block refers to the space still available in it for storing cache data.
For example, if the first logical block address of the cache data is 43, the starting storage block is determined to be storage block 0 in storage space 0, whose address interval is [0, 63], so the buffer space of the starting storage block is determined to be 20M. If the buffer space of the starting storage block is not smaller than the cache data length, the number of target storage blocks is determined to be 1; otherwise, it is determined not to be 1.
Specifically, in an embodiment, when the number of target storage blocks of the cache data is determined not to be 1, the cache data is split into a plurality of consecutive sub-cache data, and the starting storage block together with several storage blocks contiguous with it are taken as the target storage blocks corresponding to the respective sub-cache data.
Specifically, when the number of target storage blocks is determined not to be 1, the specific number of target storage blocks is further determined, the cache data is split into that number of consecutive sub-cache data, the starting storage block is taken as the first target storage block, several storage blocks contiguous with it are taken as the remaining target storage blocks, and the consecutive sub-cache data are assigned to the corresponding target storage blocks in order.
Specifically, in an embodiment, to avoid sub-cache data that still crosses a storage block after the split, the remaining data amount may be determined as the difference between the cache data length and the buffer space of the starting storage block; the number of target storage blocks is then determined from the remaining data amount and the rated size of each storage block; and the cache data is split into the corresponding number of consecutive sub-cache data.
For example, if the cache data length is determined to be 30M and the buffer space of the starting storage block is 20M, the remaining data amount is 10M. Since the rated size of a storage block in the embodiment of the present application is 64M, i.e., the remaining data amount is smaller than a storage block's rated size, the number of target storage blocks can be determined to be 2, so the cache data can be split into one 20M sub-cache data and one 10M sub-cache data, or directly divided into two 15M sub-cache data.
If the number of target storage blocks of the cache data is determined to be 2, namely storage block 0 and storage block 1 in storage space 0, the cache data can be split into two parts according to the buffer space of storage block 0 (the starting storage block): a first sub-cache data whose size matches the buffer space of the starting storage block, and a second sub-cache data holding the remaining data amount. Storage block 0 is then taken as the target storage block of the first sub-cache data and storage block 1 as the target storage block of the second sub-cache data.
It should be noted that, since the first logical block address of a RAID write operation is random, data is scattered across the disk, and the specific number of target storage blocks may also depend on the buffer space of subsequent storage blocks: for example, if the storage block adjacent to the starting storage block has only 5M of buffer space, which is smaller than the remaining data amount, the number of target storage blocks is determined to be greater than 2. The specific splitting logic can be set flexibly according to the cache data length and the buffer space of the starting storage block, but it must be ensured that no split sub-cache data crosses a storage block, to avoid a second split.
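The splitting logic for the simple case (every subsequent block is empty, so each can absorb up to its full rated size) can be sketched as follows. The function name is hypothetical, and the arithmetic follows the document's example, where the starting block's free space is the block's last address minus the LBA (LBA 43 in block [0, 63] leaves 20M):

```python
BLOCK_SIZE_MB = 64  # rated size of each storage block

def split_cache_data(lba_mb, length_mb):
    """Split a write into consecutive sub-cache data so that no piece
    crosses a storage-block boundary. Returns (start_lba, length) pairs.
    Assumes subsequent blocks are empty (the document notes partially
    filled neighbors can force more, smaller pieces)."""
    block_last = (lba_mb // BLOCK_SIZE_MB + 1) * BLOCK_SIZE_MB - 1
    first_space = block_last - lba_mb          # buffer space of starting block
    if length_mb <= first_space:
        return [(lba_mb, length_mb)]           # one target block, no split
    pieces = [(lba_mb, first_space)]           # fill the starting block first
    pos, remaining = lba_mb + first_space, length_mb - first_space
    while remaining > 0:                       # then whole blocks, in order
        take = min(remaining, BLOCK_SIZE_MB)
        pieces.append((pos, take))
        pos, remaining = pos + take, remaining - take
    return pieces

print(split_cache_data(43, 10))  # -> [(43, 10)]           : one target block
print(split_cache_data(43, 30))  # -> [(43, 20), (63, 10)] : 20M + 10M split
```

The second call reproduces the 20M + 10M split from the example above: the remaining 10M is smaller than the 64M rated size, so exactly two target blocks result.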
On the basis of the foregoing embodiment, in order to further improve the processing efficiency of the cached data, as an implementation manner, in an embodiment, before determining, according to the correspondence between each cache subtree and the storage block, the target cache subtree corresponding to the target storage block, the method further includes:
Step 301, obtaining storage block division information;
Step 302, determining the correspondence between each cache subtree and the storage blocks according to a preset cache subtree allocation standard and the arrangement of the storage blocks represented by the storage block division information.
The preset cache subtree allocation standard requires that the storage blocks corresponding to any one cache subtree are not adjacent.
For example, if the storage block division information indicates that a 1G storage space is divided into 16 storage blocks of 64M, the arrangement of the storage blocks is determined to be a 4×4 array, and the correspondence between each cache subtree and the storage blocks is then determined under the allocation standard that the storage blocks corresponding to any one cache subtree are not adjacent. This standard makes consecutively initiated write operations correspond to different cache subtrees as far as possible, raising the concurrency of write operations and further improving the processing efficiency of the cache data.
For example, as shown in Fig. 4, the number of cache subtrees may be set according to the actual cache data management requirement; the embodiment of the present application takes 5 cache subtrees as an example, namely cache subtree 0, cache subtree 1, cache subtree 2, cache subtree 3, and cache subtree 4, where the storage blocks in storage space 0 and storage space 1 are each arranged in a 4×4 array numbered 0-15. To ensure that the storage blocks corresponding to any one cache subtree are not adjacent, in storage space 0 the blocks numbered 0, 5, 10, and 15 may be allocated to cache subtree 0, blocks 1, 6, and 11 to cache subtree 1, blocks 2, 7, and 12 to cache subtree 2, blocks 3, 8, and 13 to cache subtree 3, and blocks 4, 9, and 14 to cache subtree 4. To keep the number of storage blocks allocated to each cache subtree approximately equal, the allocation logic differs slightly between storage spaces: in storage space 1, blocks 0, 5, 10, and 15 may be allocated to cache subtree 1, blocks 1, 6, and 11 to cache subtree 2, blocks 2, 7, and 12 to cache subtree 3, blocks 3, 8, and 13 to cache subtree 4, and blocks 4, 9, and 14 to cache subtree 0. This continues until the correspondence between every storage block of every storage space and a cache subtree has been established.
Specifically, when the number of cache subtrees is 5, the number of the target cache subtree can be quickly located as (LBA ÷ 64M) mod 5, where LBA (Logical Block Address) denotes the first logical block address in the embodiment of the present application.
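The (LBA ÷ 64M) mod 5 rule can be checked against the allocation table above with a short sketch (again assuming MB-unit LBAs, as in the earlier examples):

```python
BLOCK_SIZE_MB = 64
NUM_SUBTREES = 5

def target_subtree(lba_mb):
    """The document's quick-locating rule: (LBA / 64M) mod 5."""
    return (lba_mb // BLOCK_SIZE_MB) % NUM_SUBTREES

# Storage space 0 holds global blocks 0-15, storage space 1 holds 16-31.
# The rule yields subtree 0 for blocks 0, 5, 10, 15 of space 0 (no two of
# which are adjacent in the 4x4 array), and shifts the pattern by one in
# space 1 because 16 mod 5 == 1.
space0 = [target_subtree(b * BLOCK_SIZE_MB) for b in range(16)]
space1 = [target_subtree((16 + b) * BLOCK_SIZE_MB) for b in range(16)]
print(space0)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0]
print(space1)  # [1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1]
```

Because 16 blocks per space is not a multiple of 5 subtrees, the mapping rotates by one subtree each storage space, which is exactly why each subtree ends up with an approximately equal share of blocks overall.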
Based on the foregoing embodiments, to improve the subsequent flushing efficiency of the cached data, as an implementation manner, in one embodiment, writing the cached data into the target cache subtree includes:
Step 2041, determining, according to a preset subtree construction standard, the target child node corresponding to the cache data in the target cache subtree based on the current composition information of the target cache subtree and the second logical block address corresponding to the cache data;
Step 2042, writing the cache data into the target child node.
It should be noted that the preset subtree construction standard may be set according to the actual situation, for example following the construction rules of a balanced binary tree. The second logical block address is the LBA corresponding to the cache data within the current target cache subtree: if the cache data was not split, the second logical block address is identical to the first logical block address; if it was split, the second logical block address is the LBA of the sub-cache data in its corresponding target cache subtree.
The current composition information of the target cache subtree at least includes the second logical block address of the historical cache data currently stored at each node.
Specifically, when writing the cache data into the target cache subtree, the target child node can be determined by comparing the second logical block addresses of the historical cache data currently stored at each node of the target cache subtree with the second logical block address of the cache data to be written, and the cache data is then written into that target child node.
For example, for any cache subtree, if the write order of its cache data is 10, 70, 20, 5, 90, 60, 30, 65, 1, 100, where each number is the second logical block address of a piece of cache data, the resulting structure of the cache subtree is shown in Fig. 5, a schematic diagram of an exemplary cache subtree provided in an embodiment of the present application.
Specifically, in an embodiment, a target path of the cache data in the target cache subtree is determined according to the second logical block addresses of the historical cache data currently stored at each node, as represented by the current composition information of the target cache subtree, and the second logical block address of the cache data; the target child node corresponding to the cache data is then determined from the target path.
The target path may be represented in binary encoding: in Fig. 5, for example, the path of the cache data with second logical block address 60 is 101, and the path of the cache data with second logical block address 65 is 1011.
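The insertion order and path encoding above can be reproduced with a plain binary-search-tree sketch (the balancing and recoloring of the red-black variant discussed below are omitted; 0 means go left, 1 means go right):

```python
class Node:
    def __init__(self, lba):
        self.lba, self.left, self.right = lba, None, None

def insert(root, lba):
    """Plain BST insert keyed on the second logical block address."""
    if root is None:
        return Node(lba)
    if lba < root.lba:
        root.left = insert(root.left, lba)
    else:
        root.right = insert(root.right, lba)
    return root

def path_of(root, lba):
    """Binary path encoding of a node: 0 = left child, 1 = right child."""
    bits, node = "", root
    while node.lba != lba:
        if lba < node.lba:
            bits, node = bits + "0", node.left
        else:
            bits, node = bits + "1", node.right
    return bits

root = None
for lba in [10, 70, 20, 5, 90, 60, 30, 65, 1, 100]:  # the write order above
    root = insert(root, lba)
print(path_of(root, 60))  # "101" : right, left, right from the root
print(path_of(root, 65))  # "1011"
```

Each comparison against a stored second logical block address contributes one bit, so the path both locates the target child node and records how it was reached.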
On the basis of the foregoing embodiment, to further improve the subsequent flushing efficiency of the cached data, as an implementation manner, after the cached data is written into the target child node, the method further includes:
Step 401, checking whether the target cache subtree meets a preset subtree rule;
Step 402, if the target cache subtree does not meet the preset subtree rule, performing a node rotation operation on the node area corresponding to the target cache subtree.
It should be noted that the red-black balanced binary tree offers good balance and O(log n) access time complexity, so the cache subtree may in practice be a red-black tree. However, because the red-black tree imposes many construction rules, node insertion and deletion involve many rotation operations. Therefore, every time data is inserted into or deleted from a cache subtree, the subtree is checked against the preset subtree rules, and when the target cache subtree does not meet them, a node rotation operation is performed on the corresponding node area. The preset subtree rules may specifically be the rules of a red-black balanced binary tree. Fig. 6 is an exemplary node rotation logic diagram provided by an embodiment of the present application: during rotation, nodes change color according to the actual rotation situation, and each left or right rotation affects the parent, grandparent, uncle, and sibling nodes of the rotated node. Because node attributes must be modified after a rotation, multiple nodes need to be locked during the rotation to prevent data corruption from overlapping operation areas under concurrent access.
The preset subtree rules include: each node is either red or black; no two red nodes may be connected, i.e., both children of every red node are black; the root node is black; every leaf (NIL) node is black, with out-degree 0; and the tree is a binary search tree. By satisfying these properties, the red-black balanced binary tree reaches an approximately balanced state.
Based on the foregoing embodiment, to improve the cache data scrubbing efficiency, as an implementation manner, the method further includes:
Step 501, when the cache data flush process of the disk array is triggered, traversing each cache subtree and determining the to-be-flushed cache data with the smallest second logical block address in each cache subtree;
Step 502, flushing the to-be-flushed cache data to the corresponding target disk in ascending order of second logical block address.
The trigger condition of the cache data flush process of the disk array (RAID) may be determined according to the actual situation and is not limited by the embodiment of the present application.
Specifically, when the RAID cache data flush process is triggered, the to-be-flushed cache data with the smallest second logical block address in each cache subtree is first determined, these candidates are then gathered, and the cache data is flushed to the corresponding target disk in ascending order of second logical block address, ensuring that consecutively flushed cache data lies in nearby track areas. Each write then only requires the disk head to move to the nearest track of the data to be written, reducing the head's back-and-forth jumps under random writes, lowering the average seek time, and improving I/O read/write performance.
Specifically, in an embodiment, an in-order traversal may be performed on each cache subtree to obtain its in-order traversal result; a cache data sub-queue corresponding to each cache subtree is constructed from that result; and the to-be-flushed cache data pointed to by the head pointer of each cache data sub-queue is taken as the to-be-flushed cache data with the smallest second logical block address in that cache subtree.
In the in-order traversal result, the to-be-flushed cache data is ordered by second logical block address from smallest to largest.
For example, performing an in-order traversal of the cache subtree shown in Fig. 5 gives the traversal result 1, 5, 10, 20, 30, 60, 65, 70, 90, 100; the cache data sub-queue corresponding to the subtree is then constructed in that order, with the head pointer pointing to the to-be-flushed cache data with second logical block address 1, i.e., the smallest second logical block address in the subtree. Since these addresses do not jump around, much unnecessary track movement is avoided compared with flushing to disk directly without a cache.
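The key property used here is that an in-order traversal of a binary search tree visits keys in ascending order, so each per-subtree sub-queue comes out pre-sorted. A minimal sketch (dict-based nodes; names are illustrative):

```python
def bst_insert(node, k):
    """Plain BST insert keyed on second logical block address."""
    if node is None:
        return {"k": k, "l": None, "r": None}
    side = "l" if k < node["k"] else "r"
    node[side] = bst_insert(node[side], k)
    return node

def inorder(node, out):
    """In-order traversal: left subtree, node, right subtree ->
    keys emerge in ascending order, i.e. a sorted flush sub-queue."""
    if node:
        inorder(node["l"], out)
        out.append(node["k"])
        inorder(node["r"], out)

root = None
for lba in [10, 70, 20, 5, 90, 60, 30, 65, 1, 100]:  # the Fig. 5 write order
    root = bst_insert(root, lba)
subqueue = []
inorder(root, subqueue)
print(subqueue)  # [1, 5, 10, 20, 30, 60, 65, 70, 90, 100]
```

The output matches the traversal result stated above, with the head pointer position corresponding to index 0 of the list.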
Specifically, in an embodiment, since the to-be-flushed cache data comes from multiple cache subtrees, to further ensure flush efficiency the second logical block addresses of the to-be-flushed cache data pointed to by the head pointers of the cache data sub-queues can be compared to obtain a comparison result; the globally smallest second logical block address and its corresponding target cache data sub-queue are determined from that result; the to-be-flushed cache data corresponding to the globally smallest second logical block address is flushed; the head pointer of the target cache data sub-queue is then moved back by one node, and the process returns to the step of comparing the second logical block addresses pointed to by the head pointers of the sub-queues.
Specifically, comparing the second logical block addresses at the head of each cache data sub-queue ensures that the cache data flushed first always has the globally smallest second logical block address. For any cache data sub-queue, each time a piece of cache data is flushed, the head pointer advances by 1 so that it points to the next to-be-flushed cache data, until head pointer = tail pointer, i.e., the sub-queue is empty.
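This compare-heads-then-advance loop is a standard k-way merge of sorted sequences; Python's stdlib `heapq.merge` implements exactly it. A sketch with hypothetical sub-queue contents:

```python
import heapq

def merge_subqueues(subqueues):
    """K-way merge: repeatedly pick the globally smallest head LBA among
    the sorted sub-queues, then advance that sub-queue's head pointer."""
    return list(heapq.merge(*subqueues))

# Hypothetical sorted sub-queues from three cache subtrees:
subqueues = [[0, 5, 20], [1, 6, 21], [2, 7, 22]]
print(merge_subqueues(subqueues))  # [0, 1, 2, 5, 6, 7, 20, 21, 22]
```

The merged order is globally ascending by second logical block address, which is what keeps consecutive flushes on nearby tracks even though the data originates from different subtrees.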
Specifically, in an embodiment, the to-be-flushed cache data corresponding to the globally smallest second logical block address may be added to a flush queue, and when the flush queue meets a preset flush condition, the to-be-flushed cache data in the flush queue is flushed.
It should be noted that when cache data is split, one piece of cache data becomes several consecutive sub-cache data with consecutive second logical block addresses, so when sub-cache data belonging to different target cache subtrees is added to the flush queue, the sub-cache data end up adjacent in the flush queue.
If the number of to-be-flushed cache data stored in the flush queue reaches a preset count threshold, the flush queue is determined to meet the preset flush condition; likewise, when a preset flush period elapses, the flush queue is determined to meet the preset flush condition.
It should be noted that by first buffering the to-be-flushed cache data in the flush queue and flushing it only when the preset flush condition is met, the data in the flush queue can be flushed in a concentrated batch, reducing head jumps and thereby improving the flush efficiency of the cache data.
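The count-threshold trigger can be sketched as follows (the class name and threshold value are illustrative; the period-based trigger described above would be an analogous timer check):

```python
class FlushQueue:
    """Sketch of the flush queue: buffer to-be-flushed cache data and
    flush it as one batch once a preset count threshold is reached."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.pending = []   # to-be-flushed cache data, already in LBA order
        self.flushed = []   # stands in for data actually written to disk

    def add(self, lba):
        self.pending.append(lba)
        if len(self.pending) >= self.threshold:  # preset flush condition met
            self.flushed.extend(self.pending)    # batch flush: one head sweep
            self.pending.clear()

fq = FlushQueue(threshold=3)
for lba in [1, 5, 10, 20]:
    fq.add(lba)
print(fq.flushed, fq.pending)  # [1, 5, 10] [20]
```

Because entries arrive already sorted by second logical block address, each batch flush sweeps the head across a contiguous track range instead of seeking per write.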
For example, Fig. 7 is an exemplary cache data flush logic diagram provided by an embodiment of the present application. First, the cache data sub-queues of cache subtrees 0-4 are determined; for example, the sub-queue of cache subtree 1 is 0, 5, 20. Two pieces of information are maintained for each cache subtree: 1. the smallest LBA in the tree; 2. the currently operable LBA. The smallest LBA in the tree is the smallest second logical block address, while the currently operable LBA is the LBA of the node following the last deleted node, i.e., the second logical block address of the node pointed to by the head pointer after it moves back; when a cache subtree outputs to-be-flushed cache data for the first time, its currently operable LBA node coincides with its smallest-LBA node. By comparing the currently operable LBA nodes of the 5 cache subtrees, the to-be-flushed cache data with the globally smallest second logical block address is determined and written into the flush queue in turn, and when the flush queue meets the preset flush condition, the to-be-flushed cache data in it is flushed.
The flush queue is a linked-list structure. After any to-be-flushed cache data is stored into the flush queue, its corresponding node in the cache subtree is deleted; once all nodes of every cache subtree have been deleted, it is determined that all to-be-flushed cache data has been stored in the flush queue.
Fig. 8 is an overall flow chart of a data processing method according to an embodiment of the present application, where a write request includes an LBA start address (the first logical block address) and a Length (the cache data length). The method shown in Fig. 8 is an exemplary implementation of the method shown in Fig. 2; the two share the same implementation principles, which are not repeated here.
The data processing method provided by the embodiment of the application acquires write operation data of a disk array, where the write operation data comprises disk request information and cache data; determines a target storage block of the cache data according to the disk request information; determines a target cache subtree corresponding to the target storage block according to the correspondence between each cache subtree and the storage blocks; and writes the cache data into the target cache subtree. According to the method provided by this scheme, the cache data generated by different write operations is distributed among different cache subtrees, and only the corresponding target cache subtree is locked during a given write operation, so that even a large number of write operation requests do not block one another, which improves the I/O performance of the RAID card. Moreover, when the cache data is flushed, a flush queue is added to store the sorted cache data to be flushed, which ensures that the data is flushed in an aggregated, ordered manner over the track space, realizes write-operation parallelism, stabilizes the seek time of the magnetic head, and improves the flush efficiency of the cache data.
The embodiment of the application provides a data processing device for executing the data processing method provided by the embodiment.
Fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing device 90 includes: an acquisition module 901, a determination module 902, a correspondence module 903 and a processing module 904.
The acquisition module is used for acquiring write operation data of a disk array, where the write operation data comprises disk request information and cache data; the determining module is used for determining a target storage block of the cache data according to the disk request information; the corresponding module is used for determining a target cache subtree corresponding to the target storage block according to the correspondence between each cache subtree and the storage blocks; and the processing module is used for writing the cache data into the target cache subtree.
Specifically, in one embodiment, the disk request information includes a first logical block address, and the determining module is specifically configured to:
determining a target storage space of the cache data according to the first logic block address;
determining an address interval corresponding to each storage block in the target storage space according to the storage block division condition of the target storage space;
and determining a target storage block of the cache data according to the matching condition between the first logical block address and the address interval corresponding to each storage block.
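A minimal sketch of matching the first logical block address against the per-block address intervals, assuming equally sized storage blocks whose intervals are [i*block_size, (i+1)*block_size); all names are illustrative, not part of the claimed method:

```python
def find_target_block(first_lba, block_size, block_count):
    """Locate the starting storage block whose address interval
    contains first_lba, and return that interval."""
    index = first_lba // block_size
    if index >= block_count:
        raise ValueError("first LBA lies outside the target storage space")
    interval = (index * block_size, (index + 1) * block_size)
    return index, interval
```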
Specifically, in an embodiment, the disk request information further includes a cache data length, and the determining module is specifically configured to:
determining a starting storage block of the cache data according to the matching condition between the first logical block address and the address interval corresponding to each storage block;
judging whether the number of target storage blocks of the cache data is 1 according to the first logical block address and the cache data length;
and when the number of target storage blocks of the cache data is determined to be 1, taking the starting storage block as the target storage block of the cache data.
Specifically, in an embodiment, the determining module is specifically configured to:
determining a cache space of the starting storage block according to the first logical block address and the address interval of the starting storage block;
and when the cache space of the starting storage block is not smaller than the cache data length, determining that the number of target storage blocks of the cache data is 1.
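Under the same equal-sized-block assumption, the single-block judgment can be sketched as follows (illustrative names only): the cache space of the starting storage block is the distance from the first LBA to the end of that block's address interval.

```python
def spans_single_block(first_lba, length, block_size):
    """Return True when the cache data fits in the starting storage block,
    i.e. the block's remaining cache space is not smaller than the length."""
    block_end = (first_lba // block_size + 1) * block_size  # end of the interval
    cache_space = block_end - first_lba
    return cache_space >= length
```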
Specifically, in an embodiment, the determining module is further configured to:
and when the cache space of the starting storage block is smaller than the cache data length, determining that the number of target storage blocks of the cache data is not 1.
Specifically, in an embodiment, the determining module is further configured to:
splitting the cache data into a plurality of continuous sub-cache data when the number of target storage blocks of the cache data is not 1;
and taking the starting storage block and a plurality of storage blocks continuous with the starting storage block as the target storage blocks corresponding to the respective sub-cache data.
Specifically, in an embodiment, the determining module is specifically configured to:
determining the residual data amount according to the difference between the cache data length and the cache space of the starting storage block;
determining the number of target storage blocks according to the residual data amount and the rated space size of each storage block;
and splitting the cache data into a plurality of continuous sub-cache data according to the number of the target storage blocks.
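The splitting step can be sketched as below, again assuming equally sized storage blocks and illustrative names: the first piece fills the cache space of the starting block, and the residual data amount determines how many further consecutive blocks are needed.

```python
import math

def split_cache_data(first_lba, data, block_size):
    """Split cache data that crosses block boundaries into consecutive
    sub-cache data, one piece per target storage block."""
    pieces = []
    # Cache space of the starting storage block (capped by the data length).
    head = (first_lba // block_size + 1) * block_size - first_lba
    head = min(head, len(data))
    pieces.append(data[:head])
    remaining = len(data) - head                      # residual data amount
    extra_blocks = math.ceil(remaining / block_size)  # further target blocks needed
    for k in range(extra_blocks):
        pieces.append(data[head + k * block_size : head + (k + 1) * block_size])
    return pieces
```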
Specifically, in an embodiment, the corresponding module is further configured to:
acquiring storage block division information before determining the target cache subtree corresponding to the target storage block according to the correspondence between each cache subtree and the storage blocks;
and determining the correspondence between each cache subtree and the storage blocks according to a preset cache subtree allocation standard and the arrangement relation between the storage blocks represented by the storage block division information.
Specifically, in an embodiment, the preset cache subtree allocation criteria includes that a plurality of storage blocks corresponding to any cache subtree are not adjacent.
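One simple allocation that satisfies this standard is a round-robin mapping, sketched below with the five subtrees of the earlier figures (the modulus scheme is an assumption, not mandated by the embodiment): adjacent storage blocks never share a cache subtree, so writes to adjacent blocks lock different subtrees.

```python
def assign_subtrees(block_count, subtree_count=5):
    """Round-robin correspondence between storage blocks and cache subtrees;
    with subtree_count > 1, adjacent blocks map to different subtrees."""
    return {block: block % subtree_count for block in range(block_count)}
```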
Specifically, in an embodiment, the processing module is specifically configured to:
according to a preset subtree construction standard, determining a target child node corresponding to the cache data in the target cache subtree according to the current composition information of the target cache subtree and a second logical block address corresponding to the cache data;
and writing the cached data into the target child node.
Specifically, in an embodiment, the processing module is specifically configured to:
determining a target path of the cache data in the target cache subtree according to a second logic block address corresponding to the history cache data currently stored by each node represented by the current composition information of the target cache subtree and a second logic block address corresponding to the cache data;
and determining a target child node corresponding to the cached data in the target cache subtree according to the target path.
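The target-path descent can be sketched as an ordinary binary search tree insertion keyed on the second logical block address (a simplification: the embodiment's balanced-tree bookkeeping is omitted, and all names are illustrative):

```python
class Node:
    def __init__(self, lba, data):
        self.lba, self.data = lba, data
        self.left = self.right = None

def insert(root, lba, data):
    """Walk the target path: smaller second LBAs go left, larger go right;
    the empty slot reached is the target child node."""
    if root is None:
        return Node(lba, data)
    if lba < root.lba:
        root.left = insert(root.left, lba, data)
    elif lba > root.lba:
        root.right = insert(root.right, lba, data)
    else:
        root.data = data  # same second LBA: keep the newer cache data
    return root
```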
Specifically, in an embodiment, the processing module is further configured to:
after the cache data is written into the target child node, checking whether the target cache subtree meets a preset subtree rule;
and if the target cache subtree does not meet the preset subtree rule, performing a node rotation operation on the node area corresponding to the target cache subtree.
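The embodiment does not fix a particular balancing scheme; as a hedged illustration, a single left rotation of the kind used by AVL and red-black trees can be sketched as follows. It reshapes the node area while preserving the in-order (second LBA) ordering; all names are illustrative.

```python
class Node:
    def __init__(self, lba):
        self.lba = lba
        self.left = self.right = None

def rotate_left(node):
    """Node rotation: lift the right child above `node`,
    keeping the in-order ordering of the node area intact."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot  # new root of the rotated node area
```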
Specifically, in an embodiment, the apparatus further comprises:
the system comprises a flushing module, a flushing module and a flushing module, wherein the flushing module is used for traversing each cache subtree and determining the to-be-flushed cache data with the minimum second logic block address in each cache subtree when the flushing process of the cache data of the RAID is triggered; and brushing the data to be brushed down to the corresponding target disk according to the sequence from the small address to the large address of the second logic block.
Specifically, in one embodiment, the flushing module is specifically configured to:
performing an in-order traversal on each cache subtree to obtain an in-order traversal result of each cache subtree;
constructing a cache data sub-queue corresponding to each cache subtree according to the in-order traversal result of each cache subtree;
taking the data to be flushed pointed to by the head pointer of each cache data sub-queue as the data to be flushed with the minimum second logical block address in each cache subtree;
and the data to be flushed in the in-order traversal result are sorted in ascending order of the second logical block address.
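An in-order traversal sketch (illustrative names; a recursive walk over a simplified node structure) showing that the resulting cache data sub-queue is already sorted, with the head of the queue holding the subtree minimum:

```python
from collections import deque

class Node:
    def __init__(self, lba, left=None, right=None):
        self.lba, self.left, self.right = lba, left, right

def inorder_queue(root):
    """In-order traversal of a cache subtree yields its cache data
    sub-queue, sorted by second LBA in ascending order."""
    queue = deque()
    def walk(node):
        if node:
            walk(node.left)
            queue.append(node.lba)
            walk(node.right)
    walk(root)
    return queue  # queue[0] is what the head pointer points to: the minimum
```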
Specifically, in one embodiment, the flushing module is specifically configured to:
comparing the second logical block addresses of the data to be flushed pointed to by the head pointers of the cache data sub-queues to obtain a second logical block address comparison result;
determining a global minimum second logical block address and the target cache data sub-queue corresponding to the global minimum second logical block address according to the second logical block address comparison result;
flushing the data to be flushed corresponding to the global minimum second logical block address;
and controlling the head pointer of the target cache data sub-queue to move backward by one node, and returning to the step of comparing the second logical block addresses of the data to be flushed pointed to by the head pointers of the cache data sub-queues to obtain a second logical block address comparison result.
Specifically, in one embodiment, the flushing module is specifically configured to:
adding the data to be flushed corresponding to the global minimum second logical block address to a flush queue;
and when the flush queue meets a preset flush condition, flushing the data to be flushed in the flush queue.
Specifically, in an embodiment, the flushing module is further configured to:
if the number of pieces of data to be flushed stored in the flush queue reaches a preset number threshold, determining that the flush queue meets the preset flush condition.
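A minimal sketch of the preset flush condition as a count threshold, per the embodiment above; `write_to_disk` and the threshold value are illustrative placeholders, not part of the claimed method:

```python
def maybe_flush(flush_queue, write_to_disk, threshold=32):
    """When the flush queue meets the preset condition (a count
    threshold here), write its items out in order and clear it."""
    if len(flush_queue) >= threshold:
        for item in flush_queue:   # items are already in ascending LBA order
            write_to_disk(item)
        flush_queue.clear()
        return True
    return False
```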
The specific manner in which the respective modules perform the operations in the data processing apparatus in this embodiment has been described in detail in the embodiments concerning the method, and will not be described in detail here.
The data processing device provided by the embodiment of the present application is configured to execute the data processing method provided by the foregoing embodiment, and its implementation manner and principle are the same and will not be described again.
The embodiment of the application provides electronic equipment for executing the data processing method provided by the embodiment.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 10 includes: at least one processor 1001 and memory 1002.
The memory stores computer-executable instructions; at least one processor executes computer-executable instructions stored in the memory, causing the at least one processor to perform the data processing method as provided in the above embodiments.
The electronic device provided by the embodiment of the present application is configured to execute the data processing method provided by the foregoing embodiment, and its implementation manner and principle are the same and will not be described again.
The embodiment of the application provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and when a processor executes the computer executable instructions, the data processing method provided by any embodiment is realized.
The storage medium including the computer executable instructions in the embodiments of the present application may be used to store the computer executable instructions of the data processing method provided in the foregoing embodiments, and the implementation manner and principle of the implementation are the same, and are not repeated.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to perform part of the steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working process of the above-described device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (20)

1. A method of data processing, comprising:
acquiring write operation data of a disk array; the write operation data comprises disk request information and cache data;
determining a target storage block of the cache data according to the disk request information;
determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block;
and writing the cache data into the target cache subtree.
2. The method of claim 1, wherein the disk request information includes a first logical block address, and wherein the determining the target storage block of the cache data based on the disk request information includes:
Determining a target storage space of the cache data according to the first logic block address;
determining an address interval corresponding to each storage block in the target storage space according to the storage block division condition of the target storage space;
and determining a target storage block of the cache data according to the matching condition between the first logic block address and the address interval corresponding to each storage block.
3. The method of claim 2, wherein the disk request information further includes a cache data length, and the determining the target storage block of the cache data according to the matching condition between the first logical block address and the address interval corresponding to each storage block includes:
determining a starting storage block of the cache data according to the matching condition between the first logic block address and the address interval corresponding to each storage block;
judging whether the number of target storage blocks of the cache data is 1 or not according to the first logic block address and the cache data length;
and when the number of the target storage blocks of the cache data is determined to be 1, taking the starting storage block as the target storage block of the cache data.
4. The method of claim 3, wherein the determining whether the number of target memory blocks of the buffered data is 1 according to the first logical block address and the buffered data length comprises:
determining a cache space of the starting storage block according to the first logical block address and the address interval of the starting storage block;
and when the cache space of the starting storage block is not smaller than the cache data length, determining that the number of target storage blocks of the cache data is 1.
5. The method according to claim 4, wherein the method further comprises:
and when the cache space of the starting storage block is smaller than the cache data length, determining that the number of target storage blocks of the cache data is not 1.
6. The method of claim 5, wherein the method further comprises:
splitting the cache data into a plurality of continuous sub-cache data when the number of target storage blocks of the cache data is not 1;
and taking the starting storage block and a plurality of storage blocks continuous with the starting storage block as the target storage blocks corresponding to the sub-cache data.
7. The method of claim 6, wherein splitting the cache data into a number of consecutive sub-cache data comprises:
determining the residual data amount according to the difference between the cache data length and the cache space of the starting storage block;
determining the number of target storage blocks according to the residual data amount and the rated space size of each storage block;
and splitting the cache data into a plurality of continuous sub-cache data according to the number of the target storage blocks.
8. The method of claim 1, wherein prior to determining the target cache sub-tree corresponding to the target storage block based on the correspondence between each cache sub-tree and storage block, the method further comprises:
obtaining storage block division information;
and according to a preset buffer subtree allocation standard, determining the corresponding relation between each buffer subtree and each storage block according to the arrangement relation between each storage block represented by the storage block division information.
9. The method of claim 8, wherein the predetermined cache subtree allocation criteria includes that a plurality of memory blocks corresponding to any one of the cache subtrees are not adjacent.
10. The method of claim 1, wherein the writing the cached data to the target cache subtree comprises:
according to a preset subtree construction standard, determining a target child node corresponding to the cache data in the target cache subtree according to the current composition information of the target cache subtree and a second logic block address corresponding to the cache data;
and writing the cached data into the target child node.
11. The method of claim 10, wherein determining the target child node corresponding to the cached data in the target cache subtree according to the current composition information of the target cache subtree and the second logical block address corresponding to the cached data comprises:
determining a target path of the cache data in the target cache subtree according to a second logic block address corresponding to the history cache data currently stored by each node represented by the current composition information of the target cache subtree and a second logic block address corresponding to the cache data;
and determining a target child node corresponding to the cached data in the target cached subtree according to the target path.
12. The method of claim 10, wherein after writing the cached data to the target child node, the method further comprises:
checking whether the target cache subtree meets a preset subtree rule;
and if the target cache subtree does not meet the preset subtree rule, performing node rotation operation on the node area corresponding to the target subtree.
13. The method according to claim 1, wherein the method further comprises:
when the cache data flushing process of the disk array is triggered, traversing each cache subtree, and determining the cache data to be flushed with the minimum second logical block address in each cache subtree;
and flushing the data to be flushed to the corresponding target disks in ascending order of the second logical block address.
14. The method of claim 13, wherein traversing each of the cache sub-trees to determine the cache data to be flushed having the smallest second logical block address in each of the cache sub-trees comprises:
performing an in-order traversal on each cache subtree to obtain an in-order traversal result of each cache subtree;
constructing a cache data sub-queue corresponding to each cache subtree according to the in-order traversal result of each cache subtree;
taking the data to be flushed pointed to by the head pointer of each cache data sub-queue as the data to be flushed with the minimum second logical block address in each cache subtree;
and the data to be flushed in the in-order traversal result are sorted in ascending order of the second logical block address.
15. The method of claim 14, wherein the flushing the data to be flushed to the corresponding target disks in ascending order of the second logical block address comprises:
comparing the sizes of the second logic block addresses of the to-be-flushed cache data pointed by the head pointers of the cache data sub-queues to obtain a second logic block address comparison result;
determining a global minimum second logic block address and a target cache data sub-queue corresponding to the global minimum second logic block address according to the second logic block address comparison result;
flushing the data to be flushed corresponding to the global minimum second logical block address;
and controlling the head pointer of the target cache data sub-queue to move backward by one node, and returning to the step of comparing the second logical block addresses of the cache data to be flushed pointed to by the head pointers of the cache data sub-queues to obtain a second logical block address comparison result.
16. The method of claim 15, wherein the flushing the data to be flushed corresponding to the global minimum second logical block address comprises:
adding the data to be flushed corresponding to the global minimum second logic block address to a flushing queue;
and when the flush queue meets a preset flush condition, flushing the data to be flushed in the flush queue.
17. The method of claim 16, wherein the method further comprises:
and if the number of pieces of data to be flushed stored in the flush queue reaches a preset number threshold, determining that the flush queue meets the preset flush condition.
18. A data processing apparatus, comprising:
the acquisition module is used for acquiring write operation data of the disk array; the write operation data comprises disk request information and cache data;
the determining module is used for determining a target storage block of the cache data according to the disk request information;
the corresponding module is used for determining a target cache subtree corresponding to the target storage block according to the corresponding relation between each cache subtree and the storage block;
and the processing module is used for writing the cache data into the target cache subtree.
19. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing computer-executable instructions stored in the memory causes the at least one processor to perform the method of any one of claims 1 to 17.
20. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor implement the method of any one of claims 1 to 17.
CN202311134632.2A 2023-09-05 2023-09-05 Data processing method and device, electronic equipment and storage medium Active CN116893786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311134632.2A CN116893786B (en) 2023-09-05 2023-09-05 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116893786A true CN116893786A (en) 2023-10-17
CN116893786B CN116893786B (en) 2024-01-09

Family

ID=88313753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311134632.2A Active CN116893786B (en) 2023-09-05 2023-09-05 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116893786B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102045258A (en) * 2010-12-22 2011-05-04 北京星网锐捷网络技术有限公司 Data caching management method and device
CN103336669A (en) * 2013-05-21 2013-10-02 华中科技大学 I/O scheduling method based on internal parallelism of solid state disk and scheduler
CN103995855A (en) * 2014-05-14 2014-08-20 华为技术有限公司 Method and device for storing data
CN107844436A (en) * 2017-11-02 2018-03-27 郑州云海信息技术有限公司 The organization and management method of dirty data, system and storage system in a kind of caching
CN108572865A (en) * 2018-04-04 2018-09-25 国家计算机网络与信息安全管理中心 A kind of task queue treating method and apparatus
CN111625198A (en) * 2020-05-28 2020-09-04 深圳佰维存储科技股份有限公司 Metadata caching method and metadata caching device
CN112463333A (en) * 2020-12-03 2021-03-09 北京浪潮数据技术有限公司 Data access method, device and medium based on multithreading concurrency
CN113625973A (en) * 2021-08-30 2021-11-09 深圳市得一微电子有限责任公司 Data writing method and device, electronic equipment and computer readable storage medium
CN115904255A (en) * 2023-01-19 2023-04-04 苏州浪潮智能科技有限公司 Data request method, device, equipment and storage medium
CN116431099A (en) * 2023-06-13 2023-07-14 摩尔线程智能科技(北京)有限责任公司 Data processing method, multi-input-output queue circuit and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117406933A (en) * 2023-12-12 2024-01-16 深圳大普微电子股份有限公司 Solid state disk data processing method and device, electronic equipment and storage medium
CN117406933B (en) * 2023-12-12 2024-03-29 深圳大普微电子股份有限公司 Solid state disk data processing method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant