WO2024027140A1 - Data processing method, device, equipment, system and readable storage medium - Google Patents

Data processing method, device, equipment, system and readable storage medium

Info

Publication number
WO2024027140A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
cabinet
strips
data block
block
Prior art date
Application number
PCT/CN2023/077992
Other languages
English (en)
French (fr)
Inventor
吴睿振
王凛
陈静静
张永兴
张旭
王小伟
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024027140A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Definitions

  • This application relates to the field of computer technology, and in particular to a data processing method, device, equipment, system and non-volatile readable storage medium.
  • Data is read from disk to memory, or written from memory to disk, on a stripe-by-stripe basis. That is, one stripe of data is read from the disk into memory at a time, or one stripe of data is written from memory to the disk at a time.
  • A stripe includes multiple data blocks, so the data of a stripe is transferred between disk and memory with the data block as the smallest data unit.
  • For example, a stripe includes 4 data blocks, C1, C2, C3 and C4, and the time required to transfer these 4 data blocks between disk and memory is 2 time units, 3 time units, 1 time unit and 4 time units, respectively.
  • A stripe is considered complete only when all of its data blocks have been transferred. Therefore, if the above 4 data blocks are transferred at the same time, it takes 4 time units to complete the transfer of the stripe.
  • The actual transfer time of a stripe thus depends on the data block with the longest transfer time. A stripe must wait for its slowest data block during actual transfer, which lengthens the transfer time of the stripe and reduces the efficiency of read and write operations.
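  • The relationship between block transfer times and stripe transfer time can be illustrated with a short Python sketch. The function names and the "idle time" measure are illustrative assumptions, not terms from the application; the block times are those of the C1 to C4 example.

```python
# Transfer time per data block, in time units (values from the example above).
block_times = {"C1": 2, "C2": 3, "C3": 1, "C4": 4}


def stripe_transfer_time(times):
    """A stripe completes only when its slowest block completes,
    assuming all blocks of the stripe are transferred in parallel."""
    return max(times.values())


def idle_time(times):
    """Total time the faster blocks spend waiting for the slowest block
    (an illustrative measure, not defined in the application)."""
    slowest = max(times.values())
    return sum(slowest - t for t in times.values())


print(stripe_transfer_time(block_times))  # 4
print(idle_time(block_times))             # 6  (2 + 1 + 3 + 0)
```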
  • The purpose of this application is to provide a data processing method, device, equipment, system and non-volatile readable storage medium that reduce the waiting time during stripe processing.
  • The specific solution is as follows:
  • This application provides a data processing method, including:
  • Before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes:
  • For each stripe group, performing the step of determining N intermediate processing results corresponding to N stripes in the temporary file exchange area of the cabinet.
  • Dividing every N stripes into one stripe group to obtain multiple stripe groups includes:
  • Every N stripes are divided into one stripe group to obtain multiple stripe groups.
  • The stripe processing time of any stripe is the sum of the block processing times of all data blocks included in the stripe.
  • Operating on the corresponding disks in the cabinet according to each data block group includes:
  • Caching the data in the corresponding disks in the cabinet to the temporary file exchange area according to each data block group.
  • The method further includes:
  • Processing the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain new processing results.
  • After obtaining the new processing results, the method further includes:
  • The switching device is connected to each cabinet in the current storage node;
  • Operating on the corresponding disks in the cabinet according to each data block group includes:
  • Before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes:
  • The block processing time of any data block is the transmission clock count corresponding to the disk to which the data block belongs.
  • The block processing time of any data block is the processing time of a unit write operation of the disk to which the data block belongs.
  • Before sorting each data block in the N stripes according to block processing time, the method further includes:
  • this application provides a data processing device, including:
  • the determination module is used to determine N intermediate processing results corresponding to N stripes in the temporary file exchange area of the cabinet; 2 ≤ N ≤ preset threshold X;
  • the data block sorting module is used to sort each data block in the N stripes according to block processing time to obtain a block sequence;
  • the data block reorganization module is used to divide the block sequence into N data block groups with an equal number of data blocks;
  • the disk operation module is used to operate the corresponding disk in the cabinet according to each data block group.
  • it also includes:
  • the stripe group generation module is used to, before the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the cabinet, divide every N stripes into one stripe group for all stripes corresponding to the current operation in the cabinet, obtaining multiple stripe groups;
  • the execution module is used to execute the steps in the determination module, the data block sorting module, the data block reorganization module and the disk operation module respectively for each stripe group.
  • the stripe group generation module is specifically used to:
  • N strips are divided into one strip group to obtain multiple strip groups.
  • the strip processing time of any strip is: the sum of the block processing time of all data blocks included in the strip.
  • the disk operation module is specifically used to:
  • the data in the corresponding disk in the cabinet is cached to the temporary file swap area according to each data block group.
  • it also includes:
  • the data processing module is used to process N intermediate processing results and newly cached data in the temporary file exchange area to obtain new processing results.
  • the data processing module is also used to:
  • the disk operation module is specifically used to:
  • it also includes:
  • the receiving module is used to receive the N intermediate processing results sent by the switching device before determining the N intermediate processing results corresponding to the N strips in the temporary file exchange area of the cabinet; the switching device is connected to each cabinet in the current storage node.
  • the block processing time of any data block is the transmission clock count corresponding to the disk to which the data block belongs.
  • the block processing time of any data block is: the processing time of the unit write operation of the disk to which the data block belongs.
  • it also includes:
  • the filling module is used to make the number of data blocks in the N strips equal if the number of data blocks in the N strips is not equal, and then enter the data block sorting module.
  • this application provides an electronic device, including:
  • a memory used to store a computer program;
  • a processor used to execute the computer program to implement the data processing method disclosed above.
  • this application provides a data processing system, including: multiple storage nodes, each storage node including a plurality of the above electronic devices.
  • the present application provides a non-volatile readable storage medium for storing a computer program, wherein, when the computer program is executed by a processor, the data processing method disclosed above is implemented.
  • This application provides a data processing method, including: determining N intermediate processing results corresponding to N stripes in the temporary file exchange area of the cabinet, where 2 ≤ N ≤ preset threshold X; sorting each data block in the N stripes according to block processing time to obtain a block sequence; dividing the block sequence into N data block groups with an equal number of data blocks; and operating on the corresponding disks in the cabinet according to each data block group.
  • This application can process N stripes in the temporary file exchange area at the same time. Specifically, each data block in the N stripes is sorted according to block processing time to obtain a block sequence, and the block sequence is then divided into N data block groups with an equal number of data blocks. Data blocks with short block processing times are thereby grouped together, and data blocks with long block processing times are grouped together. When the corresponding disks in the cabinet are operated on according to each data block group, data blocks in the same data block group are less likely to wait for each other, which reduces the waiting time during stripe processing.
  • For example, stripe 1 includes 4 data blocks, C1, C2, C3 and C4, whose processing times are 2 time units, 3 time units, 1 time unit and 4 time units, respectively.
  • C4, which takes the longest, requires 4 time units. Therefore, according to the existing technology, it takes 4 time units to complete the processing of stripe 1.
  • Another stripe, stripe 2, includes 4 data blocks: C5, C6, C7 and C8.
  • The processing times of these 4 data blocks are 2 time units, 1 time unit, 1 time unit and 1 time unit, respectively.
  • C5, which takes the longest, requires 2 time units. Therefore, according to the existing technology, it takes 2 time units to complete the processing of stripe 2.
  • Sorting the 8 data blocks by processing time from longest to shortest gives the block sequence C4, C2, C1, C5, C3, C6, C7, C8 (data blocks with equal processing times have no specific order; for example, C1 can be arranged before or after C5). Accordingly, C4, C2, C1 and C5 are reorganized into one data block group. The most time-consuming data block in this group is C4, which requires 4 time units, so it takes 4 time units to process this data block group.
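  • The sort-and-regroup step in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name regroup_blocks is hypothetical, and the longest-first sort order is taken from the ordering shown in the example.

```python
def regroup_blocks(stripes, times):
    """Sort all blocks of N stripes by block processing time, then cut the
    resulting block sequence into N data block groups of equal size.

    stripes: list of N lists of block names (all of the same length)
    times:   dict mapping block name -> block processing time
    """
    n = len(stripes)
    if len({len(s) for s in stripes}) != 1:
        raise ValueError("all stripes must hold the same number of blocks")
    blocks = [b for stripe in stripes for b in stripe]
    # Longest first, as in the example. sorted() is stable, so blocks with
    # equal times keep their original relative order (C1 stays before C5).
    sequence = sorted(blocks, key=lambda b: times[b], reverse=True)
    size = len(sequence) // n
    return [sequence[i * size:(i + 1) * size] for i in range(n)]


times = {"C1": 2, "C2": 3, "C3": 1, "C4": 4,
         "C5": 2, "C6": 1, "C7": 1, "C8": 1}
stripe1 = ["C1", "C2", "C3", "C4"]
stripe2 = ["C5", "C6", "C7", "C8"]

groups = regroup_blocks([stripe1, stripe2], times)
print(groups)
# [['C4', 'C2', 'C1', 'C5'], ['C3', 'C6', 'C7', 'C8']]
print([max(times[b] for b in g) for g in groups])
# [4, 1]
```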
  • the data processing device, equipment, system and non-volatile readable storage medium provided by this application also have the above technical effects.
  • Figure 1 is a flow chart of a data processing method disclosed in this application.
  • Figure 2 is a schematic diagram of the connection of cabinets in a node disclosed in this application.
  • Figure 3 is a schematic diagram, disclosed in this application, comparing a prior-art scheme with this application;
  • Figure 4 is a schematic diagram, disclosed in this application, comparing the experimental results of a prior-art scheme and this application;
  • Figure 5 is a schematic diagram, disclosed in this application, comparing the experimental results of another prior-art scheme and this application;
  • Figure 6 is a schematic diagram of a data processing device disclosed in this application.
  • Figure 7 is a schematic diagram of an electronic device disclosed in this application.
  • this application provides a data processing solution that can reduce the probability that data blocks in the same data block group wait for each other, thereby reducing the waiting time during stripe processing.
  • some embodiments of the present application disclose a data processing method, which is applied to any cabinet in the storage node, including:
  • a storage node includes at least one cabinet, and one cabinet may include at least one temporary file swap area and a corresponding controller.
  • the controller is used to control the data reading and writing work of the local cabinet.
  • A temporary file exchange area is a memory medium, such as DDR (Double Data Rate synchronous dynamic random access memory). Stripes can be understood with reference to the following example: assume that there are 3 disks in a cabinet, disk 1, disk 2 and disk 3, and that each disk includes 5 data blocks.
  • Data block 1 in disk 1, data block 1 in disk 2 and data block 1 in disk 3 can form a stripe; correspondingly, data block 2 in disk 1, data block 2 in disk 2 and data block 2 in disk 3 form another stripe. By analogy, 5 stripes can be obtained. Of course, stripes can also be formed across cabinets. The unit used to implement the verification service across different disks is thus called a stripe.
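  • The 3-disk, 5-block example can be written out as a small Python sketch. The block labels and the function name build_stripes are hypothetical and serve only to show how block i of every disk forms stripe i within one cabinet.

```python
def build_stripes(disks):
    """Stripe i consists of data block i taken from every disk."""
    blocks_per_disk = len(disks[0])
    if any(len(d) != blocks_per_disk for d in disks):
        raise ValueError("all disks must hold the same number of blocks")
    return [[disk[i] for disk in disks] for i in range(blocks_per_disk)]


# Three disks with five data blocks each; the labels are hypothetical.
disks = [[f"disk{d}-block{b}" for b in range(1, 6)] for d in range(1, 4)]

stripes = build_stripes(disks)
print(len(stripes))  # 5
print(stripes[0])    # ['disk1-block1', 'disk2-block1', 'disk3-block1']
```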
  • The temporary file exchange area allows N stripes to be processed at the same time.
  • The more stripes the temporary file exchange area allows to be processed at the same time, the higher the concurrency and the more efficient the processing.
  • However, data processing must also take into account factors such as available memory space and protocol concurrency limits. The number of stripes that may be processed simultaneously in the temporary file exchange area therefore needs to be set to an appropriate value after weighing these factors.
  • The N intermediate processing results can be data obtained by the current cabinet from other cabinets or devices, or data read by the current cabinet from its own disks. Therefore, in a specific implementation, before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes: receiving the N intermediate processing results sent by the switching device, where the switching device is connected to each cabinet in the current storage node. The cabinets in a storage node are thus connected through a switching device, such as a switch.
  • The connection relationship between devices in a storage node is shown in Figure 2.
  • The cabinets in a storage node are connected through a switch.
  • A cabinet corresponds to 4 disks, 1 memory area and a controller.
  • For example, stripe 1 includes 4 data blocks, C1, C2, C3 and C4, whose processing times are 2 time units, 3 time units, 1 time unit and 4 time units, respectively.
  • Another stripe, stripe 2, includes 4 data blocks: C5, C6, C7 and C8.
  • The processing times of these 4 data blocks are 2 time units, 1 time unit, 1 time unit and 1 time unit, respectively.
  • the first data block group [C4, C2, C1, C5] and the second data block group [C3, C6, C7, C8] can be obtained. It can be seen that the number of data blocks in these two data block groups is equal, and the number of data block groups is 2, which is equal to the number of stripes processed simultaneously in the temporary file exchange area. Therefore, the first data block group and the second data block group can be considered as: new stripes obtained by reassembling each data block in the two stripes. Of course, the first data block group and the second data block group are not stripes in the true sense.
  • the first data block group obtained by reassembling each data block in the two stripes requires 4 time units to be processed, while the second data block group requires 1 time unit to be processed. Therefore, in some embodiments, the strip processing time can be shortened and the read and write performance can be improved.
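  • The effect of the regrouping on this example can be checked numerically. In the sketch below, the two layouts are written out by hand from the text; the "waiting" figure (time that faster blocks spend idle until the slowest block of their group finishes) is an illustrative measure chosen here, not a quantity defined in the application.

```python
times = {"C1": 2, "C2": 3, "C3": 1, "C4": 4,
         "C5": 2, "C6": 1, "C7": 1, "C8": 1}

original = [["C1", "C2", "C3", "C4"], ["C5", "C6", "C7", "C8"]]
regrouped = [["C4", "C2", "C1", "C5"], ["C3", "C6", "C7", "C8"]]


def group_time(group):
    """A group finishes when its slowest block finishes."""
    return max(times[b] for b in group)


def waiting(group):
    """Sum of the time each block idles until the group finishes."""
    slowest = group_time(group)
    return sum(slowest - times[b] for b in group)


for label, layout in (("original", original), ("regrouped", regrouped)):
    per_group = [group_time(g) for g in layout]
    total_wait = sum(waiting(g) for g in layout)
    print(label, per_group, total_wait)
# original [4, 2] 9
# regrouped [4, 1] 5
```

  • In this example the slower group still takes 4 time units, while the second group drops from 2 time units to 1 and the total idle time of the blocks falls from 9 to 5 time units.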
  • The corresponding disks in the cabinet are operated on according to each data block group; that is, the corresponding data blocks on the disks are read or written according to each data block group.
  • For example, for the first data block group, look up the locations of C4, C2, C1 and C5 on the disks of the cabinet, and then read C4, C2, C1 and C5 from the disks into the temporary file exchange area.
  • N stripes can be processed simultaneously in the temporary file exchange area.
  • Each data block in the N stripes is sorted according to block processing time to obtain a block sequence, and the block sequence is then divided into N data block groups with the same number of data blocks. Data blocks with short block processing times are thereby grouped together, and data blocks with long block processing times are grouped together, so that when the corresponding disks in the cabinet are operated on according to each data block group, data blocks in the same group are less likely to wait for each other.
  • In a specific implementation, before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes: for all stripes corresponding to the current operation in the cabinet, dividing every N stripes into one stripe group to obtain multiple stripe groups; and, for each stripe group, performing the steps of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, sorting each data block in the N stripes according to block processing time to obtain a block sequence, dividing the block sequence into N data block groups with an equal number of data blocks, and operating on the corresponding disks in the cabinet according to each data block group, until all intermediate processing results corresponding to the current operation in the cabinet have been processed.
  • The temporary file exchange area stores the intermediate processing results of all stripes corresponding to the current operation.
  • This application uses the value N to specify that the temporary file exchange area processes the N intermediate processing results corresponding to N stripes at the same time. Assume that the current operation involves S stripes in total; then S/N rounds of processing are needed, that is, there are S/N stripe groups.
  • Dividing every N stripes into one stripe group to obtain multiple stripe groups includes: sorting all stripes to be processed by the current operation according to stripe processing time to obtain a stripe sequence; and, in the stripe sequence, dividing every N stripes into one stripe group to obtain multiple stripe groups. Because the stripe groups are cut from a stripe sequence sorted by stripe processing time, each stripe near the front of the sequence has a shorter processing time than the stripes that follow it.
  • The stripe group at the front of the stripe sequence is processed according to this application first, so that the stripes with shorter processing times are processed first.
  • For example, this application is executed on the stripe group containing stripes A and B before it is executed on stripe group [E, F], so that stripes A and B, which take less time, are processed first. That is, stripe groups are selected one at a time in the order in which they appear in the stripe sequence, and this application is executed on each until every stripe group has been processed.
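  • The stripe-grouping step can be sketched in Python as follows. The block times assigned to stripes A to F are hypothetical values chosen for illustration, and make_stripe_groups is a hypothetical helper name; only the rule "sort by the sum of block times, then take every N stripes" comes from the text.

```python
def make_stripe_groups(stripe_block_times, n):
    """Sort stripes by stripe processing time and cut the sequence into
    stripe groups of n stripes each (the last group may be shorter if the
    stripe count is not a multiple of n).

    stripe_block_times: dict mapping stripe name -> list of block times
    """
    # Stripe processing time = sum of the block processing times.
    stripe_time = {name: sum(ts) for name, ts in stripe_block_times.items()}
    sequence = sorted(stripe_time, key=stripe_time.get)  # shortest first
    return [sequence[i:i + n] for i in range(0, len(sequence), n)]


# Hypothetical block times, deliberately listed out of order.
stripes = {
    "E": [4, 3, 2, 1],   # stripe time 10
    "A": [1, 1, 1, 1],   # stripe time 4
    "F": [4, 4, 3, 2],   # stripe time 13
    "C": [2, 2, 1, 1],   # stripe time 6
    "B": [2, 1, 1, 1],   # stripe time 5
    "D": [3, 2, 1, 1],   # stripe time 7
}

groups = make_stripe_groups(stripes, n=2)
print(groups)  # [['A', 'B'], ['C', 'D'], ['E', 'F']]
```

  • Processing the groups in list order handles the fastest stripes first, which matches the ordering described above.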
  • this application performs read or write operations on the corresponding data blocks in the disk according to each data block group. Therefore, operations on the corresponding disks in the cabinet are performed according to each data block group, including: for each stripe group, the data in the corresponding disk in the cabinet is cached to the temporary file exchange area according to each data block group. That is: read the corresponding data blocks from the disk to the memory according to each data block group.
  • At this point the disk data required for the current user operation has been stored in the temporary file exchange area, so all N intermediate processing results and the newly cached data can be processed in the temporary file exchange area to obtain new processing results.
  • The new processing results obtained by processing all N intermediate processing results and the newly cached data can be kept in the local cabinet or sent to other cabinets for further processing. Therefore, in a specific implementation, after obtaining the new processing results, the method further includes: sending the new processing results to other cabinets in the current storage node through the switching device, the switching device being connected to each cabinet in the current storage node; or writing the new processing results to the corresponding disks in the cabinet.
  • If the new processing results obtained by processing all N intermediate processing results and the newly cached data are kept in the local cabinet, they are still processed according to the solution provided by this application. That is: determine all the stripes corresponding to the new processing results in the current cabinet and divide every N stripes into one stripe group to obtain multiple stripe groups; then, for each stripe group, perform the steps of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, sorting each data block in the N stripes according to block processing time to obtain a block sequence, dividing the block sequence into N data block groups with an equal number of data blocks, and operating on the corresponding disks in the cabinet according to each data block group, until everything corresponding to the new processing results in the current cabinet has been processed.
  • the data from the corresponding disk in the cabinet is cached according to each data block group.
  • the N intermediate processing results corresponding to the stripe group and the newly cached data can be processed in the temporary file exchange area to obtain the processing results.
  • the processing results can be placed in the local cabinet or sent to other cabinets for further processing. Therefore, after obtaining the processing result, the processing result can also be sent to other cabinets in the current storage node through the switching device; or the processing result can be written to the corresponding disk in the current cabinet.
  • In a specific implementation, operating on the corresponding disks in the cabinet according to each data block group includes: writing the N intermediate processing results to the corresponding disks in the cabinet according to each data block group. That is, the N intermediate processing results in the temporary file exchange area are used to modify the corresponding data blocks on the disks.
  • the block processing time of any data block is calculated by the transmission clock count corresponding to the disk to which the data block belongs.
  • The controller of a cabinet is equipped with a clock counter that records, for each disk in the cabinet, the estimated time until the disk becomes idle. For example, a task is being executed on disk 1 in a certain cabinet, and the task is expected to take 10 seconds to complete. The clock counter of the cabinet controller therefore records the transmission clock count of disk 1 as 10 seconds.
  • the block processing time of any data block is: the processing time of the unit write operation of the disk to which the data block belongs.
  • the unit write operation is: the time it takes to perform a write operation.
  • the block processing time can be the time required for the disk to perform a write operation, or the time required for the disk to perform a read operation, depending on the current operation. That is: if the current operation is to read data from the disk, then the block processing time is the time required for the disk to perform a read operation. And if the current operation is to write data to the disk, then the block processing time is the time required for the disk to perform a write operation.
  • the block processing time can also be determined using other means. For example: the average time it takes for a disk to perform multiple write operations or read operations.
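  • The averaging approach mentioned above could look like the following Python sketch. The sample durations, their unit, the nested-dictionary layout and the function name block_processing_time are all assumptions made for illustration; the application does not specify how the measurements are stored.

```python
def block_processing_time(disk_samples, disk, operation):
    """Estimate the block processing time of a disk as the average of its
    recorded durations for the current operation ("read" or "write")."""
    samples = disk_samples[disk][operation]
    if not samples:
        raise ValueError(f"no {operation} samples recorded for {disk}")
    return sum(samples) / len(samples)


# Hypothetical measured durations in milliseconds.
disk_samples = {
    "disk1": {"read": [4, 6, 5], "write": [9, 11, 10]},
    "disk2": {"read": [2, 2, 2], "write": [5, 7, 6]},
}

# The operation type selects which history is used, as described above.
print(block_processing_time(disk_samples, "disk1", "read"))   # 5.0
print(block_processing_time(disk_samples, "disk2", "write"))  # 6.0
```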
  • In a specific implementation, before sorting the data blocks in the N stripes according to block processing time, the method further includes: if the N stripes do not contain equal numbers of data blocks, making the number of data blocks in the N stripes equal, and then performing the steps of sorting the data blocks in the N stripes according to block processing time to obtain the block sequence, dividing the block sequence into N data block groups with an equal number of data blocks, and operating on the corresponding disks in the cabinet according to each data block group. That is, if different stripes contain different numbers of data blocks, the number of data blocks in each stripe is equalized before the subsequent steps are performed. Invalid data blocks can be filled into the smaller stripes to equalize the number of data blocks in each stripe.
  • For example, stripe 1 includes 2 data blocks, stripe 2 includes 2 data blocks and stripe 3 includes 3 data blocks. One invalid data block is then filled into each of stripe 1 and stripe 2, so that stripe 1 and stripe 2 also include 3 data blocks.
  • An invalid data block stores either no data or meaningless data consisting of all 0s.
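  • The padding step can be sketched in Python as follows. Representing an invalid data block as None is an assumption made for this illustration, as are the block labels and the function name pad_stripes.

```python
INVALID = None  # assumed placeholder for an invalid (empty) data block


def pad_stripes(stripes):
    """Append invalid blocks to the shorter stripes so that every stripe
    holds as many blocks as the longest one. Returns new lists."""
    width = max(len(s) for s in stripes)
    return [s + [INVALID] * (width - len(s)) for s in stripes]


# Stripe 1 and stripe 2 hold 2 blocks, stripe 3 holds 3 (labels hypothetical).
stripes = [["a1", "a2"], ["b1", "b2"], ["c1", "c2", "c3"]]

padded = pad_stripes(stripes)
print(padded)
# [['a1', 'a2', None], ['b1', 'b2', None], ['c1', 'c2', 'c3']]
```

  • How an invalid block is timed during sorting is not stated in the text; a caller of this sketch would need to pick a value, for example a processing time of 0.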
  • N determines the number of stripes that can be processed in parallel at the same time in the temporary file exchange area.
  • In this example, N is denoted bn and has a value of 4. After the 16 stripes are grouped according to bn, the stripe group that takes the least time is selected first and the solution provided by this application is executed on it, and so on until every stripe group has been processed.
  • A stripe consists of 32 data blocks.
  • The stripe time consumption of the 16 stripes is calculated first, the 16 stripes are sorted according to the results, and every 4 stripes, taken from smallest to largest, are selected as one stripe group.
  • Within a stripe group, the blocks are sorted by time consumption, and every 32 blocks, taken from smallest to largest, are selected as a group for processing until all blocks in the stripe group have been processed. When every 32 blocks are selected as a group, the original positions of these blocks need to be recorded so that the corresponding blocks can be read from or written to the corresponding disks.
  • The counter time in the cabinet controller is used as the block time consumption.
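  • The whole procedure of this example (16 stripes, 32 blocks per stripe, bn = 4) can be sketched end to end in Python. The block times below are deterministic made-up values rather than real counter readings, and plan_operation is a hypothetical name; each block is tracked by its original (stripe index, block index) position so that it can still be located on disk after regrouping.

```python
def plan_operation(times, n):
    """times[s][b] is the time consumption of block b in stripe s.

    Returns a list of stripe groups. Each stripe group is a list of block
    groups, and each block group is a list of original (stripe, block)
    positions, ordered from the least to the most time-consuming block.
    """
    blocks_per_stripe = len(times[0])
    # 1. Sort stripes by stripe time (sum of block times), smallest first.
    order = sorted(range(len(times)), key=lambda s: sum(times[s]))
    plan = []
    # 2. Every n stripes of the sorted sequence form one stripe group.
    for start in range(0, len(order), n):
        members = order[start:start + n]
        # 3. Sort the blocks of the group by time, keeping their positions.
        positions = [(s, b) for s in members for b in range(blocks_per_stripe)]
        positions.sort(key=lambda p: times[p[0]][p[1]])
        # 4. Every blocks_per_stripe blocks form one block group.
        plan.append([positions[i:i + blocks_per_stripe]
                     for i in range(0, len(positions), blocks_per_stripe)])
    return plan


# Hypothetical timings: 16 stripes of 32 blocks each.
times = [[(7 * s + 3 * b) % 11 + 1 for b in range(32)] for s in range(16)]

plan = plan_operation(times, n=4)
print(len(plan), len(plan[0]), len(plan[0][0]))  # 4 4 32
```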
  • this application can reduce the probability that data blocks in the same data block group wait for each other, thereby reducing the waiting time during stripe processing. Transmission speed can be improved in any distributed storage scenario.
  • a data processing device including:
  • the determination module 601 is used to determine N intermediate processing results corresponding to N stripes in the temporary file exchange area of the cabinet; 2 ≤ N ≤ preset threshold X;
  • the data block sorting module 602 is used to sort each data block in the N stripes according to block processing time to obtain a block sequence;
  • the data block reorganization module 603 is used to divide the block sequence into N data block groups with an equal number of data blocks;
  • the disk operation module 604 is used to operate the corresponding disk in the cabinet according to each data block group.
  • the stripe group generation module is used to, before the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the cabinet, divide every N stripes into one stripe group for all stripes corresponding to the current operation in the cabinet, obtaining multiple stripe groups;
  • the execution module is used to execute the steps in the determination module, the data block sorting module, the data block reorganization module and the disk operation module respectively for each stripe group.
  • the stripe group generation module is specifically used to:
  • in the strip sequence, every N strips are divided into one strip group to obtain multiple strip groups.
  • the strip processing time of any strip is the sum of the block processing times of all data blocks included in the strip.
  • the disk operation module is specifically used to:
  • the data in the corresponding disk in the cabinet is cached to the temporary file swap area according to each data block group.
  • the data processing module is used to process N intermediate processing results and newly cached data in the temporary file exchange area to obtain new processing results.
  • the data processing module is also used to:
  • the disk operation module is specifically used to:
  • the receiving module is used to receive the N intermediate processing results sent by the switching device before determining the N intermediate processing results corresponding to the N strips in the temporary file exchange area of the cabinet; the switching device is connected to each cabinet in the current storage node.
  • the block processing time of any data block is calculated by the transmission clock count corresponding to the disk to which the data block belongs.
  • the block processing time of any data block is: the processing time of the unit write operation of the disk to which the data block belongs.
  • it also includes:
  • the filling module is used to make the number of data blocks in the N strips equal if the number of data blocks in the N strips is not equal, and then enter the data block sorting module.
  • this embodiment provides a data processing device that can reduce the probability that data blocks in the same data block group wait for each other, thereby reducing the waiting time during strip processing.
  • An electronic device provided by some embodiments of the present application is introduced below.
  • An electronic device described below and a data processing method and device described above may be referred to each other.
  • an electronic device including:
  • Memory 701 used to store computer programs
  • the processor 702 is used to execute computer programs to implement the methods disclosed in any of the above embodiments.
  • when the processor in the electronic device executes the computer program, the following steps can be implemented: determine N intermediate processing results corresponding to N strips in the temporary file exchange area of the cabinet, where 2 ≤ N ≤ preset threshold X; sort the data blocks in the N strips by block processing time to obtain a block sequence; divide the block sequence into N data block groups with an equal number of data blocks; and operate the corresponding disks in the cabinet according to each data block group.
  • when the processor in the electronic device executes the computer program, the following steps can be implemented: sort all the strips to be processed in the current operation by strip processing time to obtain a strip sequence; in the strip sequence, divide every N strips into one strip group to obtain multiple strip groups.
  • when the processor in the electronic device executes the computer program, the following steps can be implemented: after the new processing result is obtained, send it through the switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or write the new processing result to the corresponding disk in the cabinet.
  • when the processor in the electronic device executes the computer program, the following steps can be implemented: before determining the N intermediate processing results corresponding to the N strips in the temporary file exchange area of the cabinet, receive the N intermediate processing results sent by the switching device, where the switching device is connected to each cabinet in the current storage node.
  • some embodiments of the present application provide an electronic device that can reduce the probability that data blocks in the same data block group wait for each other, thereby reducing the waiting time during stripe processing.
  • the server may specifically include: at least one processor, at least one memory, a power supply, a communication interface, an input-output interface, and a communication bus.
  • the memory is used to store computer programs, and the computer programs are loaded and executed by the processor to implement the corresponding methods disclosed in any of the foregoing embodiments.
  • the power supply provides the operating voltage for each hardware device on the server; the communication interface creates a data transmission channel between the server and external devices, and it may follow any communication protocol applicable to the technical solution of this application, which is not specifically limited here; the input and output interface obtains external input data or outputs data to the outside, and its interface type can be selected according to the application's needs, which is likewise not specifically limited here.
  • memory can be a read-only memory, random access memory, magnetic disk or optical disk, etc.
  • the resources stored on it include operating system, computer program and data, etc.
  • the storage method can be short-term storage or permanent storage.
  • the operating system is used to manage and control various hardware devices and computer programs on the server to implement the processor's calculation and processing of data in the memory. It can be Windows Server, Netware, Unix, Linux, etc.
  • the computer program may further include computer programs that can be used to complete other specific tasks.
  • the data can also include application developer information and other data.
  • the terminal may include but is not limited to smartphones, tablets, laptops, or desktop computers.
  • the terminal in some embodiments of the present application includes: a processor and a memory.
  • the processor may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
  • the processor can be implemented using at least one hardware form among DSP (Digital Signal Processing, digital signal processing), FPGA (Field-Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
  • the processor can also include a main processor and a co-processor.
  • the main processor is a processor used to process data in the wake-up state, also called the CPU (Central Processing Unit); the co-processor is a low-power processor used to process data in standby mode.
  • the processor may be integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing content that needs to be displayed on the display screen.
  • the processor may also include an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.
  • Memory may include one or more computer non-volatile readable storage media, which may be non-transitory.
  • Memory may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices, flash memory storage devices.
  • the memory is at least used to store the following computer program. After the computer program is loaded and executed by the processor, the relevant steps in the method executed by the terminal side disclosed in any of the foregoing embodiments can be implemented.
  • the resources stored in the memory may also include operating systems and data, and the storage method may be short-term storage or permanent storage. Among them, the operating system can include Windows, Unix, Linux, etc. Data may include but is not limited to image data, model parameters, etc.
  • the terminal may also include a display screen, an input and output interface, a communication interface, a sensor, a power supply, and a communication bus.
  • Some embodiments of the present application disclose a data processing system, including: multiple storage nodes, each storage node including multiple electronic devices according to the above embodiments.
  • the data processing system may be a distributed storage system, and each electronic device is a cabinet in any storage node in the distributed storage system.
  • a cabinet includes multiple disks and a temporary file swap area (such as DDR).
  • the total number of strips corresponding to the current operation is determined in the temporary file exchange area of the current cabinet, and every N strips are then divided into one strip group to obtain multiple strip groups; for any strip group, the N intermediate processing results corresponding to the N strips are determined in the temporary file exchange area of the current cabinet; the data blocks in the N strips are sorted by block processing time to obtain the block sequence; the block sequence is divided into N data block groups with equal numbers of data blocks; the data of the N strips is cached to the temporary file exchange area according to each data block group; the N intermediate processing results and the newly cached data of the N strips are then processed in the temporary file exchange area to obtain the processing result, which is transmitted to other cabinets through the switching device. It can be seen that when working across cabinets, the waiting time for processing a strip is shortened.
  • some embodiments of the present application provide a data processing system in which each node can reduce the waiting time during strip processing.
  • a non-volatile readable storage medium provided by some embodiments of the present application is introduced below.
  • the non-volatile readable storage medium described below and the data processing method, device and equipment described above may be referred to each other.
  • a non-volatile readable storage medium used to store a computer program, wherein when the computer program is executed by a processor, the data processing method disclosed in the foregoing embodiments is implemented.
  • when the computer program is executed by a processor, the following steps can be implemented: sort all the strips to be processed in the current operation by strip processing time to obtain the strip sequence; in the strip sequence, divide every N strips into one strip group to obtain multiple strip groups.
  • when the computer program is executed by a processor, the following steps can be implemented: process the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain new processing results.
  • when the computer program is executed by a processor, the following steps can be implemented: after the new processing result is obtained, send it through the switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or write the new processing result to the corresponding disk in the cabinet.
  • software modules may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of non-volatile readable storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

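This application discloses a data processing method, apparatus, device, system, and non-volatile readable storage medium in the field of computer technology. The application can process N stripes simultaneously in the temporary file exchange area of a cabinet. Specifically, the data blocks in the N stripes are sorted by block processing time to obtain a block sequence, and the block sequence is then divided into N data block groups containing equal numbers of data blocks. Data blocks with short block processing times are thereby regrouped together, and data blocks with long block processing times are regrouped together. When the corresponding disks in the cabinet are operated on according to the data block groups, the probability that data blocks in the same data block group wait for one another is reduced, which reduces the waiting time during stripe processing. Correspondingly, the data processing apparatus, device, system, and non-volatile readable storage medium provided by this application have the same technical effect.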

Description

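Data processing method, apparatus, device, system, and readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210919507.1, filed with the China Patent Office on August 2, 2022 and entitled "Data processing method, apparatus, device, system, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a data processing method, apparatus, device, system, and non-volatile readable storage medium.
Background
At present, data is read from disk into memory, or written from memory to disk, one stripe at a time. That is, each read brings one stripe of data from disk into memory, and each write sends one stripe of data from memory to disk. A stripe contains multiple data blocks, so the data block is the smallest unit of data when a stripe is transferred between disk and memory.
Suppose a stripe includes 4 data blocks, C1, C2, C3, and C4, and the times required to transfer these blocks between disk and memory are 2, 3, 1, and 4 time units respectively. In general, a stripe is considered transferred only when all of its data blocks have been transferred, so if the 4 data blocks start transferring at the same time, the stripe takes 4 time units to complete. The actual transfer time of a stripe therefore depends on its slowest data block: every stripe has to wait for the block that takes longest, which lengthens the stripe transfer time and reduces the efficiency of read and write operations.
Summary
In view of this, the purpose of this application is to provide a data processing method, apparatus, device, system, and non-volatile readable storage medium that reduce the waiting time during stripe processing. The specific solutions are as follows: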
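In a first aspect, this application provides a data processing method, including:
determining, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes, where 2 ≤ N ≤ preset threshold X;
sorting the data blocks in the N stripes by block processing time to obtain a block sequence;
dividing the block sequence into N data block groups containing equal numbers of data blocks;
operating on the corresponding disks in the cabinet according to the data block groups.
In some embodiments, before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes:
for all stripes corresponding to the current operation in the cabinet, dividing every N stripes into one stripe group to obtain multiple stripe groups;
for each stripe group, performing the step of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet.
In some embodiments, dividing every N stripes into one stripe group, for all stripes to be processed by the current operation, to obtain multiple stripe groups includes:
sorting all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence;
in the stripe sequence, dividing every N stripes into one stripe group to obtain multiple stripe groups.
In some embodiments, the stripe processing time of any stripe is the sum of the block processing times of all data blocks in that stripe.
In some embodiments, operating on the corresponding disks in the cabinet according to the data block groups includes:
caching the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups.
In some embodiments, the method further includes:
processing the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain a new processing result.
In some embodiments, after the new processing result is obtained, the method further includes:
sending the new processing result through a switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or
writing the new processing result to the corresponding disks in the cabinet.
In some embodiments, operating on the corresponding disks in the cabinet according to the data block groups includes:
writing the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups.
In some embodiments, before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes:
receiving the N intermediate processing results sent by a switching device, where the switching device is connected to each cabinet in the current storage node.
In some embodiments, if a read operation is performed on a disk, the block processing time of any data block is the transmission clock count of the disk to which the data block belongs.
In some embodiments, if a write operation is performed on a disk, the block processing time of any data block is the processing time of a unit write operation of the disk to which the data block belongs.
In some embodiments, before sorting the data blocks in the N stripes by block processing time, the method further includes:
if the N stripes contain unequal numbers of data blocks, making the numbers of data blocks in the N stripes equal, and then performing the steps of sorting the data blocks in the N stripes by block processing time to obtain a block sequence, dividing the block sequence into N data block groups containing equal numbers of data blocks, and operating on the corresponding disks in the cabinet according to the data block groups.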
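In a second aspect, this application provides a data processing apparatus, including:
a determination module, configured to determine, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes, where 2 ≤ N ≤ preset threshold X;
a data block sorting module, configured to sort the data blocks in the N stripes by block processing time to obtain a block sequence;
a data block regrouping module, configured to divide the block sequence into N data block groups containing equal numbers of data blocks;
a disk operation module, configured to operate on the corresponding disks in the cabinet according to the data block groups.
In some embodiments, the apparatus further includes:
a stripe group generation module, configured to, before the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the cabinet, divide all stripes corresponding to the current operation in the cabinet into stripe groups of N stripes each to obtain multiple stripe groups;
an execution module, configured to perform, for each stripe group, the steps of the determination module, the data block sorting module, the data block regrouping module, and the disk operation module.
In some embodiments, the stripe group generation module is specifically configured to:
sort all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence;
in the stripe sequence, divide every N stripes into one stripe group to obtain multiple stripe groups.
In some embodiments, the stripe processing time of any stripe is the sum of the block processing times of all data blocks in that stripe.
In some embodiments, the disk operation module is specifically configured to:
cache the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups.
In some embodiments, the apparatus further includes:
a data processing module, configured to process the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain a new processing result.
In some embodiments, the data processing module is further configured to:
after the new processing result is obtained, send the new processing result through a switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or write the new processing result to the corresponding disks in the cabinet.
In some embodiments, the disk operation module is specifically configured to:
write the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups.
In some embodiments, the apparatus further includes:
a receiving module, configured to receive the N intermediate processing results sent by a switching device before the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the cabinet, where the switching device is connected to each cabinet in the current storage node.
In some embodiments, if a read operation is performed on a disk, the block processing time of any data block is the transmission clock count of the disk to which the data block belongs.
In some embodiments, if a write operation is performed on a disk, the block processing time of any data block is the processing time of a unit write operation of the disk to which the data block belongs.
In some embodiments, the apparatus further includes:
a padding module, configured to, if the N stripes contain unequal numbers of data blocks, make the numbers of data blocks in the N stripes equal and then hand over to the data block sorting module.
In a third aspect, this application provides an electronic device, including:
a memory, configured to store a computer program;
a processor, configured to execute the computer program to implement the data processing method disclosed above.
In a fourth aspect, this application provides a data processing system, including multiple storage nodes, each storage node including multiple electronic devices as described above.
In a fifth aspect, this application provides a non-volatile readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the data processing method disclosed above.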
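As the above solution shows, this application provides a data processing method, including: determining, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes, where 2 ≤ N ≤ preset threshold X; sorting the data blocks in the N stripes by block processing time to obtain a block sequence; dividing the block sequence into N data block groups containing equal numbers of data blocks; and operating on the corresponding disks in the cabinet according to the data block groups.
It can be seen that this application can process N stripes simultaneously in the temporary file exchange area. Specifically, the data blocks in the N stripes are sorted by block processing time to obtain a block sequence, and the block sequence is then divided into N data block groups containing equal numbers of data blocks, so that data blocks with short block processing times are regrouped together and data blocks with long block processing times are regrouped together. When the corresponding disks in the cabinet are operated on according to the data block groups, the probability that data blocks in the same data block group wait for one another is reduced, which reduces the waiting time during stripe processing.
The technical effect of this application is illustrated with an example below. Suppose stripe 1 includes 4 data blocks, C1, C2, C3, and C4, whose processing times are 2, 3, 1, and 4 time units respectively. The slowest block, C4, needs 4 time units, so under the prior art the processing of stripe 1 takes 4 time units. Suppose another stripe, stripe 2, includes 4 data blocks, C5, C6, C7, and C8, whose processing times are 2, 1, 1, and 1 time units respectively. The slowest block, C5, needs 2 time units, so under the prior art the processing of stripe 2 takes 2 time units. The total processing time of stripe 1 and stripe 2 is therefore 4 + 2 = 6 time units. If stripe 1 and stripe 2 are instead processed together in the temporary file exchange area according to this application, sorting C1 to C8 gives the block sequence [C4, C2, C1, C5, C3, C6, C7, C8] (data blocks with equal processing times have no particular order; for example, C1 may be placed before or after C5). Accordingly, C4, C2, C1, and C5 are regrouped into one data block group. The slowest block in this group is C4, which needs 4 time units, so processing this group takes 4 time units. C3, C6, C7, and C8 are regrouped into another data block group, in which every block takes 1 time unit, so processing this group takes 1 time unit. Processing the two data block groups therefore takes 4 + 1 = 5 time units in total, 1 time unit less than the 6 time units of the prior art. It can be seen that this application reduces the waiting time during stripe processing and thereby improves the efficiency of read and write operations.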
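The arithmetic of this example can be checked with a minimal Python sketch. The function names `total_wait` and `regroup` are illustrative and do not come from the application; the sketch assumes that a group finishes when its slowest block finishes and that groups are handled one after another, as in the example.

```python
from typing import Dict, List


def total_wait(groups: List[List[str]], cost: Dict[str, int]) -> int:
    """Sum, over all groups, of the slowest block's processing time."""
    return sum(max(cost[b] for b in group) for group in groups)


def regroup(stripes: List[List[str]], cost: Dict[str, int]) -> List[List[str]]:
    """Sort every block of the N stripes by descending processing time,
    then cut the block sequence into N groups of equal size."""
    blocks = [b for stripe in stripes for b in stripe]
    # sorted() is stable, so blocks with equal times keep their original order.
    blocks = sorted(blocks, key=lambda b: cost[b], reverse=True)
    size = len(blocks) // len(stripes)
    return [blocks[i:i + size] for i in range(0, len(blocks), size)]


cost = {"C1": 2, "C2": 3, "C3": 1, "C4": 4, "C5": 2, "C6": 1, "C7": 1, "C8": 1}
stripes = [["C1", "C2", "C3", "C4"], ["C5", "C6", "C7", "C8"]]
groups = regroup(stripes, cost)

print(total_wait(stripes, cost))  # 6 (prior art: 4 + 2)
print(groups)  # [['C4', 'C2', 'C1', 'C5'], ['C3', 'C6', 'C7', 'C8']]
print(total_wait(groups, cost))   # 5 (regrouped: 4 + 1)
```

The descending sort reproduces the block sequence given in the text; an ascending sort would produce the same groups in the opposite order and the same total.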
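Correspondingly, the data processing apparatus, device, system, and non-volatile readable storage medium provided by this application have the same technical effect.
Brief Description of the Drawings
To explain the technical solutions in some embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flowchart of a data processing method disclosed in this application;
Figure 2 is a schematic diagram of the connections between the cabinets in a node disclosed in this application;
Figure 3 is a schematic comparison of the prior art and this application disclosed in this application;
Figure 4 is a schematic comparison of the experimental results of the prior art and this application disclosed in this application;
Figure 5 is another schematic comparison of the experimental results of the prior art and this application disclosed in this application;
Figure 6 is a schematic diagram of a data processing apparatus disclosed in this application;
Figure 7 is a schematic diagram of an electronic device disclosed in this application.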
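Detailed Description
The technical solutions in some embodiments of this application are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
At present, a stripe must wait for its slowest data block during transfer, which lengthens the stripe transfer time and reduces the efficiency of read and write operations. For this reason, this application provides a data processing solution that reduces the probability that data blocks in the same data block group wait for one another, thereby reducing the waiting time during stripe processing.
Referring to Figure 1, some embodiments of this application disclose a data processing method, applied to any cabinet in a storage node, including:
S101. Determine, in the temporary file exchange area of the cabinet, N intermediate processing results corresponding to N stripes.
In some embodiments, a storage node includes at least one cabinet, and a cabinet may include at least one temporary file exchange area and a corresponding controller. The controller controls the data reads and writes of the local cabinet. A temporary file exchange area is a memory medium, such as DDR (Double Data Rate synchronous dynamic random access memory). A stripe can be understood through the following example: suppose a cabinet has 3 disks, disk 1, disk 2, and disk 3, and each disk contains 5 data blocks. Then data block 1 of disk 1, data block 1 of disk 2, and data block 1 of disk 3 form one stripe; likewise, data block 2 of disk 1, data block 2 of disk 2, and data block 2 of disk 3 form another stripe; and so on, giving 5 stripes. Stripes can of course also be formed across cabinets. A stripe is thus the unit that spans different disks and is used to implement the check service.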
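The stripe layout in this example can be sketched in Python as follows. The helper `build_stripes` and the block labels are hypothetical, chosen only to mirror the 3-disk, 5-block example; the application does not prescribe any data structure.

```python
from typing import List


def build_stripes(disks: List[List[str]]) -> List[List[str]]:
    """Stripe k is made of block k taken from every disk."""
    return [list(row) for row in zip(*disks)]


# Three disks with five data blocks each, as in the example above.
disks = [[f"disk{d}-block{b}" for b in range(1, 6)] for d in range(1, 4)]
stripes = build_stripes(disks)

print(len(stripes))  # 5
print(stripes[0])    # ['disk1-block1', 'disk2-block1', 'disk3-block1']
```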
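Because the temporary file exchange area of the cabinet holds N intermediate processing results corresponding to N stripes, the area allows N stripes to be processed simultaneously. The more stripes it can process simultaneously, the greater the concurrency and the higher the processing efficiency. Data processing must also take into account factors such as the available memory space and the concurrency limits of the protocol, so the number of stripes the temporary file exchange area may process simultaneously should be set to a suitable value after weighing these factors. In one example, 2 ≤ N ≤ preset threshold X, where X is the maximum number of stripes that can be used after considering the maximum available memory space, the protocol's concurrency limits, and similar factors. When N = 1, this application achieves the same effect as the prior art. N is a natural number, that is, N = 1, 2, 3, 4, 5, ..., X; when N ≥ 2, the efficiency and performance of the solution are better than those of the prior art.
It should be noted that the N intermediate processing results may be data that the current cabinet obtained from other cabinets or devices, or data that the current cabinet read from its own disks. Therefore, in a specific implementation, before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes: receiving the N intermediate processing results sent by a switching device, where the switching device is connected to each cabinet in the current storage node. The cabinets in a storage node are thus connected through a switching device, such as a switch.
In one example, the connections between the devices in a storage node are shown in Figure 2. As shown in Figure 2, the cabinets in a storage node are connected through a switch. Each cabinet has 4 disks, 1 memory area, and 1 controller.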
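S102. Sort the data blocks in the N stripes by block processing time to obtain a block sequence.
S103. Divide the block sequence into N data block groups containing equal numbers of data blocks.
Referring to Figure 3, suppose stripe 1 includes 4 data blocks, C1, C2, C3, and C4, whose processing times are 2, 3, 1, and 4 time units respectively. Suppose another stripe, stripe 2, includes 4 data blocks, C5, C6, C7, and C8, whose processing times are 2, 1, 1, and 1 time units respectively. Under the prior art, the total processing time of stripe 1 and stripe 2 is 4 + 2 = 6 time units. If, according to some embodiments of this application, stripe 1 and stripe 2 are processed together in the temporary file exchange area, sorting C1 to C8 gives the block sequence [C4, C2, C1, C5, C3, C6, C7, C8], from which the first data block group [C4, C2, C1, C5] and the second data block group [C3, C6, C7, C8] are obtained. The two data block groups contain equal numbers of data blocks, and the number of data block groups is 2, equal to the number of stripes processed together in the temporary file exchange area. The first and second data block groups can therefore be regarded as new stripes obtained by regrouping the data blocks of the 2 stripes, although they are not stripes in the true sense.
S104. Operate on the corresponding disks in the cabinet according to the data block groups.
Referring to Figure 3, the first data block group obtained by regrouping the data blocks of the 2 stripes is processed in 4 time units, and the second data block group is processed in 1 time unit. Some embodiments can therefore shorten the stripe processing time and improve read and write performance.
It should be noted that operating on the corresponding disks in the cabinet according to the data block groups means performing read or write operations on the corresponding data blocks on the disks according to each data block group. For example, according to the first data block group, the locations of C4, C2, C1, and C5 are looked up on the disks of the cabinet, and C4, C2, C1, and C5 are then read from the disks into the temporary file exchange area. As another example, according to the first data block group, the locations of C4, C2, C1, and C5 are looked up on the disks of the cabinet, and the intermediate processing results corresponding to C4, C2, C1, and C5 in the temporary file exchange area are then written to the locations of C4, C2, C1, and C5 on the disks.
It can be seen that some embodiments can process N stripes simultaneously in the temporary file exchange area. Specifically, the data blocks in the N stripes are sorted by block processing time to obtain a block sequence, and the block sequence is then divided into N data block groups containing equal numbers of data blocks, so that data blocks with short block processing times are regrouped together and data blocks with long block processing times are regrouped together. When the corresponding disks in the cabinet are operated on according to the data block groups, the probability that data blocks in the same data block group wait for one another is reduced, which reduces the waiting time during stripe processing.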
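Based on the above embodiments, it should be noted that before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, the method further includes: for all stripes corresponding to the current operation in the cabinet, dividing every N stripes into one stripe group to obtain multiple stripe groups; and, for each stripe group, performing the steps of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, sorting the data blocks in the N stripes by block processing time to obtain a block sequence, dividing the block sequence into N data block groups containing equal numbers of data blocks, and operating on the corresponding disks in the cabinet according to the data block groups, until all intermediate processing results corresponding to the current operation in the cabinet have been processed. The temporary file exchange area thus stores the intermediate processing results of all stripes corresponding to the current operation, and the value N sets how many of them are handled at once: the temporary file exchange area processes N intermediate processing results, corresponding to N stripes, at a time. If the current operation corresponds to S stripes in total, processing is performed S/N times; that is, there are S/N stripe groups.
In a specific implementation, dividing every N stripes into one stripe group, for all stripes to be processed by the current operation, to obtain multiple stripe groups includes: sorting all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence; and, in the stripe sequence, dividing every N stripes into one stripe group to obtain multiple stripe groups. When the stripes are sorted by stripe processing time and stripe groups are cut from the stripe sequence, the stripes at the front of the sequence take less time to process than those at the back, so processing the stripe groups at the front of the sequence first lets the stripes with shorter processing times be handled first. For example, if the stripe sequence is [A, B, C, D, E, F] and N = 2, dividing the sequence gives the first stripe group [A, B], the second stripe group [C, D], and the third stripe group [E, F]. This application is then executed for the first stripe group [A, B], and afterwards for the second stripe group [C, D] and the third stripe group [E, F], so that stripes A and B, which take less time, are processed first. In other words, stripe groups are selected one at a time in the order in which they appear in the stripe sequence, and this application is executed for each until all stripe groups have been processed.
In a specific implementation, the stripe processing time of any stripe is the sum of the block processing times of all data blocks in that stripe. Suppose stripe 1 includes 4 data blocks, C1, C2, C3, and C4, whose processing times are 2, 3, 1, and 4 time units respectively; the stripe processing time of stripe 1 is then 2 + 3 + 1 + 4 = 10 time units.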
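The stripe grouping described above can be sketched in Python as follows. The names `stripe_processing_time` and `make_stripe_groups` and the block times assigned to stripes A to F are assumptions made for illustration. The sketch also assumes the stripe count is a multiple of N; the text does not say how a remainder would be handled, and here the last group would simply be shorter.

```python
from typing import Dict, List


def stripe_processing_time(block_times: List[int]) -> int:
    """Stripe processing time = sum of the block processing times."""
    return sum(block_times)


def make_stripe_groups(stripes: Dict[str, List[int]], n: int) -> List[List[str]]:
    """Sort stripes by ascending stripe processing time, then take N at a time."""
    ordered = sorted(stripes, key=lambda name: stripe_processing_time(stripes[name]))
    return [ordered[i:i + n] for i in range(0, len(ordered), n)]


# Hypothetical block times; stripe D reuses the 2, 3, 1, 4 example from the text.
stripes = {
    "D": [2, 3, 1, 4],
    "A": [1, 1, 1, 1],
    "F": [5, 4, 4, 3],
    "C": [2, 2, 1, 1],
    "B": [2, 1, 1, 1],
    "E": [4, 4, 2, 2],
}

print(stripe_processing_time(stripes["D"]))  # 10
print(make_stripe_groups(stripes, 2))        # [['A', 'B'], ['C', 'D'], ['E', 'F']]
```

The groups are then handled front to back, so the stripes with the shortest processing times are processed first.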
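Based on the above embodiments, it should be noted that this application performs read or write operations on the corresponding data blocks on the disks according to each data block group. Operating on the corresponding disks in the cabinet according to the data block groups therefore includes: for each stripe group, caching the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups. That is, the corresponding data blocks on the disks are read into memory according to each data block group.
After the data has been cached for every stripe group, all the disk data required by the current user operation is stored in the temporary file exchange area, so all N intermediate processing results and the newly cached data can be processed in the temporary file exchange area to obtain a new processing result.
The new processing result obtained by processing all N intermediate processing results and the newly cached data can be written to disk in the local cabinet or sent to other cabinets for further processing. Therefore, in a specific implementation, after the new processing result is obtained, the method further includes: sending the new processing result through a switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or writing the new processing result to the corresponding disks in the cabinet.
When the new processing result is written to disk in the local cabinet, the solution provided by this application is still followed. That is: determine all stripes in the current cabinet that correspond to the new processing result, and divide every N stripes into one stripe group to obtain multiple stripe groups; for each stripe group, perform the steps of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, sorting the data blocks in the N stripes by block processing time to obtain a block sequence, dividing the block sequence into N data block groups containing equal numbers of data blocks, and operating on the corresponding disks in the cabinet according to the data block groups, until all new processing results corresponding to the current cabinet have been processed.
In one example, for any stripe group, after the data in the corresponding disks in the cabinet has been cached to the temporary file exchange area according to the data block groups, the N intermediate processing results corresponding to that stripe group and the newly cached data can be processed in the temporary file exchange area to obtain a processing result. This processing result can be written to disk in the local cabinet or sent to other cabinets for further processing. After it is obtained, the processing result can therefore be sent through the switching device to other cabinets in the current storage node, or written to the corresponding disks in the current cabinet.
When this processing result is written to the corresponding disks in the current cabinet, the solution provided by this application is still followed. That is: determine the N stripes corresponding to the processing result in the temporary file exchange area of the cabinet; sort the data blocks in the N stripes by block processing time to obtain a block sequence; divide the block sequence into N data block groups containing equal numbers of data blocks; and write the current processing result to the corresponding disks in the current cabinet according to the data block groups.
It can be seen that, according to this application, one option is to wait until all stripes of one operation have been processed before processing the next operation, in which case the stripe grouping step must be repeated. Another option is, after N intermediate results have been processed, to perform the next operation directly on the processing result of those N intermediate results, in which case stripe grouping does not need to be repeated.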
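In one example, if write operations are performed on the corresponding data blocks on the disks according to each data block group, then operating on the corresponding disks in the cabinet according to the data block groups includes: writing the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups. That is, the N intermediate processing results in the temporary file exchange area are used to modify the corresponding data blocks on the disks.
In a specific implementation, if a read operation is performed on a disk, the block processing time of any data block is the transmission clock count of the disk to which the data block belongs. The controller of a cabinet has a clock counter that records the estimated idle time of each disk in the cabinet, that is, how long until the disk is expected to be free. For example, if a task is running on disk 1 of a cabinet and is expected to need another 10 seconds to finish, the clock counter of that cabinet's controller records a transmission clock count of 10 seconds for disk 1.
In a specific implementation, if a write operation is performed on a disk, the block processing time of any data block is the processing time of a unit write operation of the disk to which the data block belongs. The processing time of a unit write operation is the time needed to perform one write operation.
The block processing time can thus be the time a disk needs to perform one write operation or the time it needs to perform one read operation, depending on the current operation. If the current operation reads data from the disk, the block processing time is the time the disk needs to perform one read operation; if the current operation writes data to the disk, the block processing time is the time the disk needs to perform one write operation. The block processing time can also be determined in other ways, for example as the average time the disk needs to perform multiple write or read operations.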
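A minimal Python sketch of this selection rule follows. The `DiskState` type, its field names, and `block_processing_time` are hypothetical; the application describes only which quantity is used for reads and which for writes, not how a controller exposes them.

```python
from dataclasses import dataclass


@dataclass
class DiskState:
    # Assumed fields, named here for illustration only.
    transmission_clock_count: float  # counter value kept by the cabinet controller
    unit_write_time: float           # time one write operation takes on this disk


def block_processing_time(operation: str, disk: DiskState) -> float:
    """Return the block processing time of a block that lives on `disk`."""
    if operation == "read":
        return disk.transmission_clock_count
    if operation == "write":
        return disk.unit_write_time
    raise ValueError(f"unsupported operation: {operation!r}")


# Disk 1 is busy for another 10 seconds, as in the example above;
# the 2.0-second unit write time is an invented value.
disk1 = DiskState(transmission_clock_count=10.0, unit_write_time=2.0)
print(block_processing_time("read", disk1))   # 10.0
print(block_processing_time("write", disk1))  # 2.0
```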
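Based on the above embodiments, it should be noted that before sorting the data blocks in the N stripes by block processing time, the method further includes: if the N stripes contain unequal numbers of data blocks, making the numbers of data blocks in the N stripes equal, and then performing the steps of sorting the data blocks in the N stripes by block processing time to obtain a block sequence, dividing the block sequence into N data block groups containing equal numbers of data blocks, and operating on the corresponding disks in the cabinet according to the data block groups. In other words, if different stripes contain different numbers of data blocks, the numbers are first made equal and the subsequent steps are then performed. The numbers can be made equal by padding the smaller stripes with invalid data blocks. For example, when N = 3, stripe 1 includes 2 data blocks, stripe 2 includes 2 data blocks, and stripe 3 includes 3 data blocks; one invalid data block is then added to each of stripe 1 and stripe 2 so that they also include 3 data blocks. An invalid data block contains no data or stores meaningless all-zero data.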
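The padding step can be sketched in Python as follows. `INVALID_BLOCK` and `pad_stripes` are illustrative names, and representing an invalid block as `None` is an assumption; the text only says that an invalid block holds no data or all zeros.

```python
from typing import List, Optional

# Placeholder standing in for an invalid (empty or all-zero) data block.
INVALID_BLOCK: Optional[str] = None


def pad_stripes(stripes: List[List[Optional[str]]]) -> List[List[Optional[str]]]:
    """Pad shorter stripes with invalid blocks so all stripes have equal length."""
    width = max(len(stripe) for stripe in stripes)
    return [list(stripe) + [INVALID_BLOCK] * (width - len(stripe)) for stripe in stripes]


# N = 3: stripes 1 and 2 have 2 blocks, stripe 3 has 3 blocks.
stripes = [["s1-b1", "s1-b2"], ["s2-b1", "s2-b2"], ["s3-b1", "s3-b2", "s3-b3"]]
padded = pad_stripes(stripes)

print([len(stripe) for stripe in padded])  # [3, 3, 3]
print(padded[0])                           # ['s1-b1', 's1-b2', None]
```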
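Based on the above embodiments, it should be noted that the value of N determines how many stripes the temporary file exchange area can process in parallel.
In one example, suppose an operation needs to process 16 stripes and N, denoted bn, is 4. After the 16 stripes are grouped according to bn, the stripe group with the smallest elapsed time is selected first to execute the solution provided by this application, and so on until every stripe group has been processed. Each stripe includes 32 data blocks.
The specific pseudocode is as follows:
As the above code illustrates, the stripe elapsed time of each of the 16 stripes is computed first, and the 16 stripes are sorted by the results. Then, from smallest to largest, every 4 stripes are selected as one stripe group. The 32×4 blocks in each stripe group are sorted by block elapsed time, and then, from smallest to largest, every 32 blocks are selected as one group for processing until all blocks in the stripe group have been processed. When every 32 blocks are selected as a group, the original positions of these blocks must be recorded so that the corresponding blocks can be read or written on the corresponding disks. The code uses the counter time in the cabinet controller as the block elapsed time.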
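The procedure described in this paragraph can be sketched in Python as follows. This is a reconstruction from the prose, not the application's own pseudocode. The names `process_operation` and `handle_group` are hypothetical, and the block elapsed times are random stand-ins for the controller's counter values.

```python
import random
from typing import Callable, List, Tuple

BN = 4                   # stripes processed together (N in the text)
BLOCKS_PER_STRIPE = 32   # data blocks per stripe in the example

Block = Tuple[int, int, int]  # (stripe index, block index, elapsed time)


def process_operation(stripes: List[List[int]],
                      handle_group: Callable[[List[Block]], None]) -> None:
    """stripes[s][b] is the elapsed time of block b in stripe s."""
    # 1. Stripe elapsed time = sum of block times; sort stripes ascending.
    order = sorted(range(len(stripes)), key=lambda s: sum(stripes[s]))
    # 2. Every BN stripes, smallest first, form one stripe group.
    for g in range(0, len(order), BN):
        group = order[g:g + BN]
        # 3. Collect the 32 x BN blocks with their original positions,
        #    then sort them by ascending block elapsed time.
        blocks = [(s, b, t) for s in group for b, t in enumerate(stripes[s])]
        blocks.sort(key=lambda blk: blk[2])
        # 4. Every 32 blocks, smallest first, form one data block group.
        for i in range(0, len(blocks), BLOCKS_PER_STRIPE):
            handle_group(blocks[i:i + BLOCKS_PER_STRIPE])


rng = random.Random(0)
stripes = [[rng.randint(1, 5) for _ in range(BLOCKS_PER_STRIPE)] for _ in range(16)]
seen: List[List[Block]] = []
process_operation(stripes, seen.append)
print(len(seen), len(seen[0]))  # 16 32
```

Each tuple keeps the stripe and block index, so a real `handle_group` could locate the block on its original disk before reading or writing it.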
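The experimental results of this application and the prior art are compared below. Suppose the elapsed time is normally 1 time unit and, under real-world influences, may vary to 2 to 5 time units. Suppose the probabilities of 1, 2, 3, 4, and 5 time units occurring in practice are 50%, 30%, 10%, 7%, and 3% respectively.
Suppose N is 2 or 4. With total stripe counts of 4, 8, and 16, 10,000 simulation runs were performed and the average time to finish processing the stripes was taken; the resulting comparison is shown in Figure 4. As Figure 4 shows, the prior art takes comparatively long to complete in every case, whereas under this application, the larger N is, the shorter the completion time. Under the same conditions, if 1 to 5 time units each occur with probability 20%, the resulting comparison is shown in Figure 5. Figure 5 shows the same effect as Figure 4.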
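A simulation along these lines can be sketched in Python as follows. The experiment's exact setup is not given in the text, so several things here are assumptions: 4 blocks per stripe (`width`), total time measured as the sum of each group's slowest block, and the function names. The printed averages are therefore not expected to match Figures 4 and 5 numerically.

```python
import random
from typing import List, Sequence, Tuple

TIMES = [1, 2, 3, 4, 5]
WEIGHTS = [0.50, 0.30, 0.10, 0.07, 0.03]  # probabilities given in the text


def conventional(stripes: List[List[int]]) -> int:
    """Prior art: each stripe waits for its own slowest block."""
    return sum(max(stripe) for stripe in stripes)


def regrouped(stripes: List[List[int]], n: int) -> int:
    """Sort stripes by total time, take n at a time, regroup their blocks."""
    width = len(stripes[0])
    ordered = sorted(stripes, key=sum)
    total = 0
    for g in range(0, len(ordered), n):
        blocks = sorted(t for stripe in ordered[g:g + n] for t in stripe)
        total += sum(max(blocks[i:i + width]) for i in range(0, len(blocks), width))
    return total


def simulate(total_stripes: int, n: int, width: int = 4, trials: int = 10000,
             weights: Sequence[float] = WEIGHTS, seed: int = 0) -> Tuple[float, float]:
    """Return (average conventional time, average regrouped time)."""
    rng = random.Random(seed)
    conv = new = 0
    for _ in range(trials):
        stripes = [rng.choices(TIMES, weights=weights, k=width)
                   for _ in range(total_stripes)]
        conv += conventional(stripes)
        new += regrouped(stripes, n)
    return conv / trials, new / trials


for total in (4, 8, 16):
    for n in (2, 4):
        conv_avg, new_avg = simulate(total, n)
        print(f"stripes={total:2d} N={n}: prior art {conv_avg:.2f}, regrouped {new_avg:.2f}")
```

Passing `weights=[0.2] * 5` gives the uniform case described for Figure 5. Within a stripe group, cutting the sorted block sequence into equal groups yields the smallest possible sum of group maxima, so under this model the regrouped total never exceeds the conventional one.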
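A comparison of Figure 4 and Figure 5 shows that this application brings an improvement at every total stripe count, and that the improvement grows as N increases.
It can be seen that this application reduces the probability that data blocks in the same data block group wait for one another, thereby reducing the waiting time during stripe processing. Transmission speed can be improved in any distributed storage scenario.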
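A data processing apparatus provided by some embodiments of this application is introduced below. The data processing apparatus described below and the data processing method described above may be referred to each other.
Referring to Figure 6, some embodiments of this application disclose a data processing apparatus, including:
a determination module 601, configured to determine, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes, where 2 ≤ N ≤ preset threshold X;
a data block sorting module 602, configured to sort the data blocks in the N stripes by block processing time to obtain a block sequence;
a data block regrouping module 603, configured to divide the block sequence into N data block groups containing equal numbers of data blocks;
a disk operation module 604, configured to operate on the corresponding disks in the cabinet according to the data block groups.
In a specific implementation, the apparatus further includes:
a stripe group generation module, configured to, before the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the cabinet, divide all stripes corresponding to the current operation in the cabinet into stripe groups of N stripes each to obtain multiple stripe groups;
an execution module, configured to perform, for each stripe group, the steps of the determination module, the data block sorting module, the data block regrouping module, and the disk operation module.
In a specific implementation, the stripe group generation module is specifically configured to:
sort all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence;
in the stripe sequence, divide every N stripes into one stripe group to obtain multiple stripe groups.
In a specific implementation, the stripe processing time of any stripe is the sum of the block processing times of all data blocks in that stripe.
In a specific implementation, the disk operation module is specifically configured to:
cache the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups.
In a specific implementation, the apparatus further includes:
a data processing module, configured to process the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain a new processing result.
In a specific implementation, the data processing module is further configured to:
after the new processing result is obtained, send the new processing result through a switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or write the new processing result to the corresponding disks in the cabinet.
In a specific implementation, the disk operation module is specifically configured to:
write the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups.
In a specific implementation, the apparatus further includes:
a receiving module, configured to receive the N intermediate processing results sent by a switching device before the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the cabinet, where the switching device is connected to each cabinet in the current storage node.
In a specific implementation, if a read operation is performed on a disk, the block processing time of any data block is the transmission clock count of the disk to which the data block belongs.
In a specific implementation, if a write operation is performed on a disk, the block processing time of any data block is the processing time of a unit write operation of the disk to which the data block belongs.
In some embodiments, the apparatus further includes:
a padding module, configured to, if the N stripes contain unequal numbers of data blocks, make the numbers of data blocks in the N stripes equal and then hand over to the data block sorting module.
For the more specific working process of each module and unit in this embodiment, refer to the corresponding content disclosed in the foregoing embodiments; it is not repeated here.
It can be seen that this embodiment provides a data processing apparatus that reduces the probability that data blocks in the same data block group wait for one another, thereby reducing the waiting time during stripe processing.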
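An electronic device provided by some embodiments of this application is introduced below. The electronic device described below and the data processing method and apparatus described above may be referred to each other.
Referring to Figure 7, some embodiments of this application disclose an electronic device, including:
a memory 701, configured to store a computer program;
a processor 702, configured to execute the computer program to implement the method disclosed in any of the above embodiments.
In a specific implementation, when the processor in the electronic device executes the computer program, the following steps can be implemented: determining, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes, where 2 ≤ N ≤ preset threshold X; sorting the data blocks in the N stripes by block processing time to obtain a block sequence; dividing the block sequence into N data block groups containing equal numbers of data blocks; and operating on the corresponding disks in the cabinet according to the data block groups.
In a specific implementation, when the processor in the electronic device executes the computer program, the following steps can be implemented: before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, for all stripes corresponding to the current operation in the cabinet, dividing every N stripes into one stripe group to obtain multiple stripe groups; and, for each stripe group, performing the step of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet.
In a specific implementation, when the processor in the electronic device executes the computer program, the following steps can be implemented: sorting all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence; and, in the stripe sequence, dividing every N stripes into one stripe group to obtain multiple stripe groups.
In a specific implementation, when the processor in the electronic device executes the computer program, the following step can be implemented: caching the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups.
In a specific implementation, when the processor in the electronic device executes the computer program, the following step can be implemented: processing the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain a new processing result.
In a specific implementation, when the processor in the electronic device executes the computer program, the following steps can be implemented: after the new processing result is obtained, sending the new processing result through a switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or writing the new processing result to the corresponding disks in the cabinet.
In a specific implementation, when the processor in the electronic device executes the computer program, the following step can be implemented: writing the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups.
In a specific implementation, when the processor in the electronic device executes the computer program, the following step can be implemented: before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, receiving the N intermediate processing results sent by a switching device, where the switching device is connected to each cabinet in the current storage node.
It can be seen that some embodiments of this application provide an electronic device that reduces the probability that data blocks in the same data block group wait for one another, thereby reducing the waiting time during stripe processing.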
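When the electronic device is a server, the server may specifically include at least one processor, at least one memory, a power supply, a communication interface, an input/output interface, and a communication bus. The memory stores a computer program, which is loaded and executed by the processor to implement the corresponding method disclosed in any of the foregoing embodiments. The power supply provides the operating voltage for each hardware device on the server. The communication interface creates a data transmission channel between the server and external devices; it may follow any communication protocol applicable to the technical solution of this application, which is not specifically limited here. The input/output interface obtains external input data or outputs data to the outside; its interface type can be selected according to the application's needs and is not specifically limited here. The memory, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system, computer programs, and data, and the storage may be temporary or permanent. The operating system manages and controls the hardware devices and computer programs on the server so that the processor can compute on and process the data in the memory; it can be Windows Server, Netware, Unix, Linux, or the like. Besides the computer program that implements the method disclosed in any of the foregoing embodiments, the computer programs may further include programs that perform other specific tasks. Besides image data, text data, model parameters, and similar data, the data may also include information such as the application developer's details.
When the electronic device is a terminal, the terminal may include but is not limited to a smartphone, a tablet, a laptop, or a desktop computer. Typically, the terminal in some embodiments of this application includes a processor and a memory. The processor may include one or more processing cores, such as a 4-core or 8-core processor. The processor can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor may also include a main processor and a co-processor. The main processor, also called the CPU (Central Processing Unit), processes data in the wake-up state; the co-processor is a low-power processor that processes data in the standby state. In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning. The memory may include one or more computer non-volatile readable storage media, which may be non-transitory. The memory may also include high-speed random access memory as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments of this application, the memory is used at least to store a computer program that, once loaded and executed by the processor, implements the relevant steps of the terminal-side method disclosed in any of the foregoing embodiments. The resources stored in the memory may also include an operating system and data, and the storage may be temporary or permanent. The operating system may include Windows, Unix, Linux, and so on. The data may include but is not limited to image data and model parameters. In some embodiments, the terminal may also include a display screen, an input/output interface, a communication interface, sensors, a power supply, and a communication bus.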
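A data processing system provided by some embodiments of this application is introduced below. The data processing system described below and the data processing method, apparatus, and device described above may be referred to each other.
Some embodiments of this application disclose a data processing system, including multiple storage nodes, each storage node including multiple electronic devices according to the above embodiments. The data processing system may be a distributed storage system, in which case each electronic device is a cabinet in a storage node of the distributed storage system. A cabinet includes multiple disks and a temporary file exchange area (such as DDR). Within a storage node, the cabinets communicate through a switching device.
In a distributed storage scenario, the cabinets in a node are connected through a network and controlled by an upper-layer host, as shown in Figure 2. When working across cabinets, data processing loses considerable time to the transmission protocol, the HOST control flow, the working state, and similar factors; this application can reduce the time needed to move data across cabinets. For example: the total number of stripes corresponding to the current operation is determined in the temporary file exchange area of the current cabinet, and every N stripes are then divided into one stripe group to obtain multiple stripe groups. For any stripe group, the N intermediate processing results corresponding to the N stripes are determined in the temporary file exchange area of the current cabinet; the data blocks in the N stripes are sorted by block processing time to obtain a block sequence; the block sequence is divided into N data block groups containing equal numbers of data blocks; the data of the N stripes is cached to the temporary file exchange area according to the data block groups; the N intermediate processing results and the newly cached data of the N stripes are then processed in the temporary file exchange area to obtain a processing result; and the processing result is transmitted to other cabinets through the switching device. It can be seen that when working across cabinets, the waiting time for processing a stripe is shortened.
It can be seen that some embodiments of this application provide a data processing system in which every node can reduce the waiting time during stripe processing.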
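A non-volatile readable storage medium provided by some embodiments of this application is introduced below. The non-volatile readable storage medium described below and the data processing method, apparatus, and device described above may be referred to each other.
A non-volatile readable storage medium is used to store a computer program, where the computer program, when executed by a processor, implements the data processing method disclosed in the foregoing embodiments.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: determining, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes, where 2 ≤ N ≤ preset threshold X; sorting the data blocks in the N stripes by block processing time to obtain a block sequence; dividing the block sequence into N data block groups containing equal numbers of data blocks; and operating on the corresponding disks in the cabinet according to the data block groups.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, for all stripes corresponding to the current operation in the cabinet, dividing every N stripes into one stripe group to obtain multiple stripe groups; and, for each stripe group, performing the step of determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: sorting all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence; and, in the stripe sequence, dividing every N stripes into one stripe group to obtain multiple stripe groups.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following step can be implemented: for each stripe group, caching the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following step can be implemented: processing the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain a new processing result.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following steps can be implemented: after the new processing result is obtained, sending the new processing result through a switching device to other cabinets in the current storage node, where the switching device is connected to each cabinet in the current storage node; or writing the new processing result to the corresponding disks in the cabinet.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following step can be implemented: writing the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups.
In a specific implementation, when the computer program in the non-volatile readable storage medium is executed by a processor, the following step can be implemented: before determining the N intermediate processing results corresponding to the N stripes in the temporary file exchange area of the cabinet, receiving the N intermediate processing results sent by a switching device, where the switching device is connected to each cabinet in the current storage node.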
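The terms "first", "second", "third", "fourth", and the like (if present) in this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, or device.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with one another, but only insofar as a person of ordinary skill in the art can implement the combination; where a combination of technical solutions is contradictory or cannot be implemented, that combination should be considered not to exist and falls outside the scope of protection claimed by this application.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile readable storage medium known in the technical field.
Specific examples are used herein to explain the principles and implementations of this application. The description of the above embodiments is only intended to help in understanding the method of this application and its core idea. A person of ordinary skill in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting this application.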

Claims (20)

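  1. A data processing method, characterized by including:
    determining, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes; 2 ≤ N ≤ preset threshold X;
    sorting the data blocks in the N stripes by block processing time to obtain a block sequence;
    dividing the block sequence into N data block groups containing equal numbers of data blocks;
    operating on the corresponding disks in the cabinet according to the data block groups.
  2. The method according to claim 1, characterized in that, before the determining, in the temporary file exchange area of a cabinet, of N intermediate processing results corresponding to N stripes, the method further includes:
    for all stripes corresponding to the current operation in the cabinet, dividing every N stripes into one stripe group to obtain multiple stripe groups;
    for each stripe group, performing the step of determining, in the temporary file exchange area of the cabinet, N intermediate processing results corresponding to N stripes.
  3. The method according to claim 2, characterized in that the dividing of every N stripes into one stripe group, for all stripes to be processed by the current operation, to obtain multiple stripe groups includes:
    sorting all stripes to be processed by the current operation by stripe processing time to obtain a stripe sequence;
    in the stripe sequence, dividing every N stripes into one stripe group to obtain multiple stripe groups.
  4. The method according to claim 3, characterized in that the method further includes:
    storing, in the temporary file exchange area, the intermediate processing results of all stripes corresponding to the current operation.
  5. The method according to claim 3, characterized in that the stripe processing time of any stripe is the sum of the block processing times of all data blocks included in that stripe.
  6. The method according to claim 1, characterized in that the operating on the corresponding disks in the cabinet according to the data block groups includes:
    caching the data in the corresponding disks in the cabinet to the temporary file exchange area according to the data block groups.
  7. The method according to claim 6, characterized in that the method further includes:
    performing read or write operations on the corresponding data blocks on the disks according to each of the data blocks.
  8. The method according to claim 6, characterized by further including:
    processing the N intermediate processing results and the newly cached data in the temporary file exchange area to obtain a new processing result.
  9. The method according to claim 8, characterized in that, after the new processing result is obtained, the method further includes:
    sending the new processing result through a switching device to other cabinets in the current storage node; the switching device is connected to each cabinet in the current storage node;
    writing the new processing result to the corresponding disks in the cabinet.
  10. The method according to claim 1, characterized in that the operating on the corresponding disks in the cabinet according to the data block groups includes:
    writing the N intermediate processing results to the corresponding disks in the cabinet according to the data block groups.
  11. The method according to any one of claims 1 to 10, characterized in that, before the determining, in the temporary file exchange area of a cabinet, of N intermediate processing results corresponding to N stripes, the method further includes:
    receiving the N intermediate processing results sent by a switching device; the switching device is connected to each cabinet in the current storage node.
  12. The method according to any one of claims 1 to 10, characterized in that, if a read operation is performed on a disk, the block processing time of any data block is the transmission clock count of the disk to which the data block belongs.
  13. The method according to claim 12, characterized in that a clock counter is provided at the controller of the cabinet, and the method further includes:
    recording, by the clock counter, the estimated idle time of each disk in the cabinet.
  14. The method according to any one of claims 1 to 10, characterized in that, if a write operation is performed on a disk, the block processing time of any data block is the processing time of a unit write operation of the disk to which the data block belongs.
  15. The method according to claim 14, characterized in that the method further includes:
    obtaining the processing time of the unit write operation based on the time needed to perform one write operation.
  16. The method according to any one of claims 1 to 10, characterized in that, before the sorting of the data blocks in the N stripes by block processing time, the method further includes:
    if the N stripes contain unequal numbers of data blocks, making the numbers of data blocks in the N stripes equal, and then performing the step of sorting the data blocks in the N stripes by block processing time.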
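  17. A data processing apparatus, characterized by including:
    a determination module, configured to determine, in the temporary file exchange area of a cabinet, N intermediate processing results corresponding to N stripes; 2 ≤ N ≤ preset threshold X;
    a data block sorting module, configured to sort the data blocks in the N stripes by block processing time to obtain a block sequence;
    a data block regrouping module, configured to divide the block sequence into N data block groups containing equal numbers of data blocks;
    a disk operation module, configured to operate on the corresponding disks in the cabinet according to the data block groups.
  18. An electronic device, characterized by including:
    a memory, configured to store a computer program;
    a processor, configured to execute the computer program to implement the method according to any one of claims 1 to 16.
  19. A data processing system, characterized by including: multiple storage nodes, each storage node including multiple electronic devices according to claim 18.
  20. A non-volatile readable storage medium, characterized in that it is used to store a computer program, where the computer program, when executed by a processor, implements the method according to any one of claims 1 to 16.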
PCT/CN2023/077992 2022-08-02 2023-02-23 一种数据处理方法、装置、设备、系统及可读存储介质 WO2024027140A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210919507.1A CN114995770B (zh) 2022-08-02 2022-08-02 一种数据处理方法、装置、设备、系统及可读存储介质
CN202210919507.1 2022-08-02

Publications (1)

Publication Number Publication Date
WO2024027140A1 true WO2024027140A1 (zh) 2024-02-08

Family

ID=83021996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077992 WO2024027140A1 (zh) 2022-08-02 2023-02-23 一种数据处理方法、装置、设备、系统及可读存储介质

Country Status (2)

Country Link
CN (1) CN114995770B (zh)
WO (1) WO2024027140A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114995770B (zh) * 2022-08-02 2022-12-27 苏州浪潮智能科技有限公司 一种数据处理方法、装置、设备、系统及可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224291A (zh) * 2015-09-29 2016-01-06 北京奇艺世纪科技有限公司 一种数据处理方法及装置
US20210311653A1 (en) * 2020-04-07 2021-10-07 Vmware, Inc. Issuing Efficient Writes to Erasure Coded Objects in a Distributed Storage System with Two Tiers of Storage
CN114816837A (zh) * 2022-06-28 2022-07-29 苏州浪潮智能科技有限公司 一种纠删码融合方法、系统、电子设备及存储介质
CN114995770A (zh) * 2022-08-02 2022-09-02 苏州浪潮智能科技有限公司 一种数据处理方法、装置、设备、系统及可读存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376100A (zh) * 2018-11-05 2019-02-22 浪潮电子信息产业股份有限公司 一种缓存写入方法、装置、设备及可读存储介质
CN113176858B (zh) * 2021-05-07 2022-12-13 锐捷网络股份有限公司 数据处理方法、存储系统及存储设备

Also Published As

Publication number Publication date
CN114995770A (zh) 2022-09-02
CN114995770B (zh) 2022-12-27

Similar Documents

Publication Publication Date Title
CN112035381B (zh) 一种存储系统及存储数据处理方法
CN104407933A (zh) 一种数据的备份方法及装置
CN106325758B (zh) 一种队列存储空间管理方法及装置
CN111813713B (zh) 数据加速运算处理方法、装置及计算机可读存储介质
WO2021223468A1 (zh) 一种基于ssd的日志数据保存方法、装置、设备和介质
EP2927779A1 (en) Disk writing method for disk arrays and disk writing device for disk arrays
WO2021103596A1 (zh) 一种数据迁移方法、装置和计算机可读存储介质
CN111124270B (zh) 缓存管理的方法、设备和计算机程序产品
WO2024027140A1 (zh) 一种数据处理方法、装置、设备、系统及可读存储介质
CN115576505B (zh) 一种数据存储方法、装置、设备及可读存储介质
CN105373484A (zh) 一种网络通信芯片中内存分配、存储和管理的方法
WO2024055571A1 (zh) 一种namespace设置方法、装置及可读存储介质
CN115129621B (zh) 一种内存管理方法、设备、介质及内存管理模块
US9069621B2 (en) Submitting operations to a shared resource based on busy-to-success ratios
CN106201918B (zh) 一种基于大数据量和大规模缓存快速释放的方法和系统
WO2024055529A1 (zh) 放置组成员选择方法、装置、设备及可读存储介质
CN116467235B (zh) 一种基于dma的数据处理方法、装置、电子设备及介质
CN104424142A (zh) 一种多核处理器系统中访问共享资源的方法与装置
CN115860080A (zh) 计算核、加速器、计算方法、装置、设备、介质及系统
WO2021057759A1 (zh) 内存迁移方法、装置及计算设备
CN114610231A (zh) 大位宽数据总线分段存储的控制方法、系统、设备及介质
WO2022001133A1 (zh) 一种提升软拷贝读性能的方法、系统、终端及存储介质
CN116601616A (zh) 一种数据处理装置、方法及相关设备
CN114745438B (zh) 多数据中心的缓存数据处理方法、装置、系统和电子设备
CN110413562A (zh) 一种具有自适应功能的同步系统和方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23848855

Country of ref document: EP

Kind code of ref document: A1