CN109558070B - Scalable storage system architecture - Google Patents


Info

Publication number
CN109558070B
Authority
CN
China
Prior art keywords
stripe
operation unit
request
basic operation
composite
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710893444.6A
Other languages
Chinese (zh)
Other versions
CN109558070A (en
Inventor
吴忠杰
易正利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Memblaze Technology Co Ltd
Original Assignee
Beijing Memblaze Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Memblaze Technology Co Ltd
Priority to CN201710893444.6A
Publication of CN109558070A
Application granted
Publication of CN109558070B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062Securing storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0631Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662Virtualisation aspects
    • G06F3/0667Virtualisation aspects at data level, e.g. file, record or object virtualisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Abstract

A scalable storage system architecture is provided. A method for processing IO requests in a storage system with a scalable architecture comprises: in response to receiving an IO request, acquiring a composite stripe operation unit corresponding to the IO request, wherein the composite stripe operation unit comprises one or more stripe operation units, each stripe operation unit comprising one or more basic operation units; the composite stripe operation unit allocating resources for the one or more stripe operation units it comprises; the composite stripe operation unit sequentially executing the one or more stripe operation units; and indicating that processing of the IO request is complete.

Description

Scalable storage system architecture
Technical Field
The present application relates to storage systems, and more particularly to IO processing architecture for a storage system with scalable functionality and performance.
Background
Existing RAID (Redundant Array of Independent Disks) technology aggregates a plurality of physical drives (e.g., disks) into disk groups, divides the disk groups into stripes, and protects the data on each stripe by redundancy techniques. In prior-art RAID systems, when one or more of the drives fail, a spare drive is activated and data reconstruction is performed to maintain the data protection capability and performance of the RAID system. After the failed drive is replaced, the data on the spare drive needs to be copied back to the new drive. RAID systems can provide various levels of data protection; for example, a RAID5 system can recover from the failure of one drive, while a RAID6 system can recover from the failure of two drives.
RAID techniques lengthen IO paths and increase computational overhead. To fully exploit the performance of multiple SSDs, multi-core, multi-CPU technology is generally adopted so that the CPUs process IO requests as concurrently as possible, achieving both data protection and high performance.
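As an illustration of the single-drive protection mentioned above, the following minimal sketch uses byte-wise XOR parity, the scheme commonly associated with RAID5. The function names and the tiny two-byte blocks are hypothetical and are not taken from the present application.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity block for one stripe (hypothetical helper)."""
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Three data blocks of one stripe; the second one is "lost" and rebuilt.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(data)
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity yields the missing block; recovering from two failed drives, as in RAID6, requires a second, independent code.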
Disclosure of Invention
In the present application, a storage resource pool architecture is adopted: all storage resources provided by the drives are pooled, and the resources are then dynamically allocated to storage objects by an allocator. When a drive fails, reconstruction is performed for the storage objects. Data migration is initiated when the use of drive resources in the storage system is unbalanced.
The storage system needs to manage a huge amount of storage resources and to be scalable so that it can quickly adapt to growth in storage capacity and computing resources. To cope with large-scale concurrent IO, it is also desirable to separate the data of an IO from the processing procedure of the IO, so as to provide a storage system with better scalability.
According to a first aspect of the present application, there is provided a first method of processing an IO request, comprising: in response to receiving an IO request, acquiring a composite stripe operation unit corresponding to the IO request, wherein the composite stripe operation unit comprises one or more stripe basic operation units, each stripe basic operation unit comprising one or more basic operation units; the composite stripe operation unit allocating resources for the one or more stripe basic operation units it comprises; the composite stripe operation unit sequentially executing the one or more stripe basic operation units; and indicating that processing of the IO request is complete.
According to the first method of processing an IO request of the first aspect of the present application, there is provided a second method of processing an IO request according to the first aspect of the present application, wherein a corresponding composite stripe operation unit is acquired for each stripe accessed by the IO request; and wherein each composite stripe operation unit accesses only the region of a single stripe that is accessed by the IO request.
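The flow of the first method can be sketched as follows. This is only an illustrative model under stated assumptions: the class names, the dictionary used as "resources", and the lookup function are all hypothetical and do not come from the present application.

```python
class StripeBasicOp:
    """Hypothetical stripe basic operation unit: an ordered list of basic operations."""
    def __init__(self, name, basic_ops):
        self.name = name
        self.basic_ops = basic_ops   # callables that receive the allocated resources
        self.resources = None

    def run(self):
        for op in self.basic_ops:
            op(self.resources)

class CompositeStripeOp:
    """Hypothetical composite unit: owns and sequences its stripe basic operation units."""
    def __init__(self, stripe_ops):
        self.stripe_ops = stripe_ops

    def allocate(self):
        # Assumption: "resources" are modeled as a per-unit buffer.
        for unit in self.stripe_ops:
            unit.resources = {"buffer": bytearray(4096)}

    def execute(self):
        # Stripe basic operation units run one after another, in the stored order.
        for unit in self.stripe_ops:
            unit.run()

def process_io(request, lookup):
    composite = lookup(request)   # acquire the composite unit for this request
    composite.allocate()          # the composite unit allocates resources
    composite.execute()           # sequential execution
    return "complete"             # indicate completion

trace = []

def step(label):
    return lambda resources: trace.append(label)

def lookup(request):
    return CompositeStripeOp([
        StripeBasicOp("read_stripe", [step("read:alloc"), step("read:device"), step("read:done")]),
        StripeBasicOp("rebuild_stripe", [step("rebuild:decode"), step("rebuild:write")]),
    ])

status = process_io({"type": "rebuild"}, lookup)
```

Keeping the sequencing in the composite unit means a new kind of IO request can be expressed as a new combination of existing stripe basic operation units.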
According to the first or second method for processing IO requests in the first aspect of the application, a third method for processing IO requests in the first aspect of the application is provided, wherein a composite stripe operation unit corresponding to the type of IO request is acquired according to the type of IO request.
According to one of the first to third methods of processing an IO request of the first aspect of the present application, there is provided a fourth method of processing an IO request according to the first aspect of the present application, further comprising: in response to receiving the IO request, acquiring an available IO processing unit, the acquired IO processing unit acquiring the composite stripe operation unit corresponding to the IO request.
According to a fourth method for processing an IO request in the first aspect of the present application, a fifth method for processing an IO request in the first aspect of the present application is provided, wherein if the IO request is a write request, an available IO processing unit is acquired for the IO request; and if the IO request is a read request, directly processing the IO request.
According to a fourth or fifth method of processing an IO request of the first aspect of the present application, there is provided a sixth method of processing an IO request according to the first aspect of the present application, wherein the IO processing unit assigned to the first IO request is dedicated to processing the first IO request, and the IO processing unit is not assigned to other IO requests until the processing of the first IO request is completed.
According to one of the fourth to sixth methods of processing an IO request of the first aspect of the present application, there is provided a seventh method of processing an IO request according to the first aspect of the present application, further comprising: the IO processing unit identifying, according to the address range accessed by the IO request, one or more stripes that provide the address range, and acquiring a corresponding composite stripe operation unit for each stripe.
According to the seventh method of processing an IO request of the first aspect of the present application, there is provided an eighth method of processing an IO request according to the first aspect of the present application, further comprising: in response to an IO request that writes a first stripe, the IO processing unit requesting a lock on the first stripe, and acquiring the composite stripe operation unit corresponding to the write stripe operation only after the lock request succeeds.
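A minimal sketch of identifying the stripes touched by an IO request is given below. It assumes a linear address space cut into fixed-size stripes; the function name, the stripe size and the tuple layout are hypothetical and are not specified by the present application.

```python
def split_by_stripe(offset, length, stripe_size):
    """Return (stripe_index, offset_in_stripe, length_in_stripe) for each touched stripe."""
    if length <= 0 or stripe_size <= 0:
        raise ValueError("length and stripe_size must be positive")
    end = offset + length
    pos = offset
    parts = []
    while pos < end:
        index = pos // stripe_size
        in_offset = pos % stripe_size
        # Never cross a stripe boundary within one part.
        count = min(stripe_size - in_offset, end - pos)
        parts.append((index, in_offset, count))
        pos += count
    return parts

# A 100-byte request starting at byte 96, with 64-byte stripes (toy numbers).
parts = split_by_stripe(offset=96, length=100, stripe_size=64)
```

Each returned part stays within a single stripe, so one composite stripe operation unit could be acquired per part.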
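The lock-before-write rule of the eighth method might look like the following sketch. The lock table, its non-blocking `try_lock` behavior, and the `get_composite` callback are assumptions made for illustration; the application does not state how the lock is implemented.

```python
import threading

class StripeLockTable:
    """Hypothetical table of per-stripe write locks."""
    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_lock(self, stripe_id):
        with self._guard:
            if stripe_id in self._held:
                return False
            self._held.add(stripe_id)
            return True

    def unlock(self, stripe_id):
        with self._guard:
            self._held.discard(stripe_id)

def begin_write(locks, stripe_id, get_composite):
    """Acquire the composite unit for a write only after the stripe lock is obtained."""
    if not locks.try_lock(stripe_id):
        return None   # assumption: the caller retries or queues the request
    return get_composite("write")

locks = StripeLockTable()
first = begin_write(locks, 7, lambda kind: f"composite:{kind}")
second = begin_write(locks, 7, lambda kind: f"composite:{kind}")   # stripe 7 is still locked
locks.unlock(7)
third = begin_write(locks, 7, lambda kind: f"composite:{kind}")
```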
According to one of the first to eighth methods of processing an IO request of the first aspect of the present application, there is provided a ninth method of processing an IO request according to the first aspect of the present application, wherein the one or more stripe basic operation units include a first stripe basic operation unit and a second stripe basic operation unit; the first stripe basic operation unit is used for processing a read stripe operation; when allocating resources for the first stripe basic operation unit, the composite stripe operation unit designates the stripe to be accessed and allocates storage space for storing the read data; and the composite stripe operation unit controls the first stripe basic operation unit to be executed first, and the second stripe basic operation unit to be executed in response to completion of the first stripe basic operation unit.
According to the ninth method of processing an IO request of the first aspect of the present application, there is provided a tenth method of processing an IO request according to the first aspect of the present application, wherein the second stripe basic operation unit is used for reconstructing the stripe; and when allocating resources for the second stripe basic operation unit, the composite stripe operation unit designates the storage space holding the data used to reconstruct the stripe and the stripe to which the data is to be written.
According to the ninth method of processing an IO request of the first aspect of the present application, there is provided an eleventh method of processing an IO request according to the first aspect of the present application, wherein the second stripe basic operation unit is used for verifying the stripe; and when allocating resources for the second stripe basic operation unit, the composite stripe operation unit designates the storage space holding the data used to verify the stripe.
According to one of the first to eighth methods of processing an IO request of the first aspect of the present application, there is provided a twelfth method of processing an IO request according to the first aspect of the present application, further comprising: the composite stripe operation unit selecting a first stripe basic operation unit of the one or more stripe basic operation units according to the execution status of the IO request, allocating resources for the first stripe basic operation unit, and executing the first stripe basic operation unit.
According to one of the first to twelfth methods of processing an IO request of the first aspect of the present application, there is provided a thirteenth method of processing an IO request according to the first aspect of the present application, further comprising: in response to adding support for processing a first type of IO request, registering a first composite stripe operation unit composed of one or more stripe basic operation units, wherein the first composite stripe operation unit records the execution order of the one or more stripe basic operation units it comprises, and the processing of the first type of IO request is completed by executing the one or more stripe basic operation units of the first composite stripe operation unit in the specified order.
According to the thirteenth method of processing an IO request of the first aspect of the present application, there is provided a fourteenth method of processing an IO request according to the first aspect of the present application, further comprising: in response to receiving an IO request of the first type, acquiring the first composite stripe operation unit; the first composite stripe operation unit allocating resources for the stripe basic operation units it comprises; the first composite stripe operation unit executing the stripe basic operation units it comprises in the specified order; and indicating that processing of the IO request of the first type is complete.
According to a second aspect of the present application, there is provided a computer comprising a processor and a memory, the memory storing a program comprising instructions which, when loaded into and executed on the processor, cause the processor to perform one of the methods of processing IO requests according to the first aspect of the present application.
According to a third aspect of the present application, there is provided a system for processing IO requests, comprising: a composite stripe operation unit acquisition module for acquiring, in response to receiving an IO request, a composite stripe operation unit corresponding to the IO request, wherein the composite stripe operation unit comprises one or more stripe basic operation units, each stripe basic operation unit comprising one or more basic operation units; a resource allocation module for causing the composite stripe operation unit to allocate resources for the one or more stripe basic operation units it comprises; a stripe basic operation unit execution module for causing the composite stripe operation unit to sequentially execute the one or more stripe basic operation units; and an indication module for indicating that processing of the IO request is complete.
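Registering a composite unit for a new IO type and later executing it in the recorded order could be sketched as below. The registry dictionary, the type name "rebuild" and the stand-in stripe operations are hypothetical, chosen only to illustrate the registration idea.

```python
# Hypothetical registry: IO request type -> ordered names of stripe basic operation units.
REGISTRY = {}

def register_composite(io_type, stripe_op_names):
    """Record the execution order for a newly supported type of IO request."""
    REGISTRY[io_type] = tuple(stripe_op_names)

def handle(io_type, stripe_ops):
    """Run the registered stripe basic operation units in their recorded order."""
    order = REGISTRY[io_type]
    return [stripe_ops[name]() for name in order]

# Stand-ins for existing stripe basic operation units.
stripe_ops = {
    "read_stripe": lambda: "read ok",
    "rebuild_stripe": lambda: "rebuild ok",
}

# Adding support for a "rebuild" request only requires a registration.
register_composite("rebuild", ["read_stripe", "rebuild_stripe"])
results = handle("rebuild", stripe_ops)
```

In this model, supporting a new request type adds an entry to the registry and leaves the existing stripe basic operation units unchanged.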
According to a fourth aspect of the present application, there is provided a first method of accessing a stripe, comprising: acquiring a first stripe basic operation unit composed of a plurality of basic operation units; acquiring a first basic operation unit according to the first stripe basic operation unit, wherein the first basic operation unit allocates resources for accessing the stripe; acquiring a plurality of second basic operation units, each of which accesses one of the storage devices providing storage space for the stripe; and acquiring a third basic operation unit, wherein the third basic operation unit indicates that the stripe access is complete and releases the allocated resources.
According to the first method of accessing a stripe of the fourth aspect of the present application, there is provided a second method of accessing a stripe according to the fourth aspect of the present application, wherein the first basic operation unit records the plurality of second basic operation units that should be executed after its execution is completed, and the plurality of second basic operation units are acquired according to the first basic operation unit.
According to the first or second method of accessing a stripe of the fourth aspect of the present application, there is provided a third method of accessing a stripe according to the fourth aspect of the present application, wherein each second basic operation unit records the third basic operation unit that should be executed after its execution is completed; and the third basic operation unit records that its execution depends on completion of all of the plurality of second basic operation units.
According to one of the foregoing methods of accessing a stripe of the fourth aspect of the present application, there is provided a fourth method of accessing a stripe according to the fourth aspect of the present application, wherein the plurality of second basic operation units access in parallel the storage devices providing storage space for the stripe, each accessing one of the storage devices.
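The successor and dependency records described in the first three methods of accessing a stripe can be modeled with a small dependency-counting sketch. The `BasicOp` class, the thread pool and the three toy "device" operations are assumptions for illustration only and are not the application's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BasicOp:
    """Hypothetical basic operation unit with recorded successors and a dependency count."""
    def __init__(self, name, action, depends_on=0):
        self.name = name
        self.action = action
        self.successors = []        # units to consider once this unit completes
        self.pending = depends_on   # number of predecessors not yet completed
        self.lock = threading.Lock()

def run_op(op, pool):
    op.action()
    for nxt in op.successors:
        with nxt.lock:
            nxt.pending -= 1
            ready = nxt.pending == 0
        if ready:
            pool.submit(run_op, nxt, pool)

events = []
done = threading.Event()

# First unit allocates; second units access devices; third unit completes and releases.
first = BasicOp("alloc", lambda: events.append("alloc"))
third = BasicOp("release", lambda: (events.append("release"), done.set()), depends_on=3)
for i in range(3):
    second = BasicOp(f"dev{i}", lambda i=i: events.append(f"dev{i}"), depends_on=1)
    second.successors.append(third)
    first.successors.append(second)

with ThreadPoolExecutor(max_workers=4) as pool:
    run_op(first, pool)
    finished = done.wait(timeout=5)
```

The three device operations may finish in any order, but the releasing unit runs only after the last of them has decremented its dependency count to zero.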
According to a method of accessing a stripe of the fourth aspect of the present application, there is provided a fifth method of accessing a stripe according to the fourth aspect of the present application, further comprising: acquiring a fourth basic operation unit according to the first basic operation unit, wherein the fourth basic operation unit encodes the data to be written into the stripe to generate check data; wherein the first basic operation unit records the fourth basic operation unit that should be executed after its execution is completed, and the fourth basic operation unit records the plurality of second basic operation units that should be executed after its execution is completed.
According to a method of accessing a stripe of the fourth aspect of the present application, there is provided a sixth method of accessing a stripe according to the fourth aspect of the present application, further comprising: acquiring a fifth basic operation unit according to the second basic operation unit, wherein the fifth basic operation unit decodes the data read from the stripe to generate the data of the failed storage space of the stripe; wherein the second basic operation unit records the fifth basic operation unit that should be executed after its execution is completed, and the fifth basic operation unit records that it is executed after all of the plurality of second basic operation units have completed.
According to one of the first to sixth methods of accessing a stripe of the fourth aspect of the present application, there is provided a seventh method of accessing a stripe according to the fourth aspect of the present application, further comprising: the first basic operation unit acquiring the dependency relationships among the plurality of basic operation units and generating the acquisition and execution order of the plurality of basic operation units.
According to a method of accessing a stripe of the fourth aspect of the present application, there is provided an eighth method of accessing a stripe according to the fourth aspect of the present application, further comprising: acquiring a sixth basic operation unit, wherein the sixth basic operation unit encodes the data to be written into the stripe to generate check data; wherein the first basic operation unit records the plurality of second basic operation units and the fourth basic operation unit that should be executed after its execution is completed.
According to the eighth method of accessing a stripe of the fourth aspect of the present application, there is provided a ninth method of accessing a stripe according to the fourth aspect of the present application, further comprising: acquiring a seventh basic operation unit, wherein the seventh basic operation unit writes the check data into one of the storage devices providing storage space for the stripe; wherein the fourth basic operation unit records the seventh basic operation unit that should be executed after its execution is completed.
According to the ninth method of accessing a stripe of the fourth aspect of the present application, there is provided a tenth method of accessing a stripe according to the fourth aspect of the present application, wherein the seventh basic operation unit records the third basic operation unit that should be executed after its execution is completed.
According to the ninth or tenth method of accessing a stripe of the fourth aspect of the present application, there is provided an eleventh method of accessing a stripe according to the fourth aspect of the present application, wherein the third basic operation unit records that its execution depends on completion of both the plurality of second basic operation units and the seventh basic operation unit.
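One way to picture the write path described above (allocation, data writes in parallel with check-data encoding, then the check-data write, then completion) is to derive an execution order from the recorded dependencies. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later); the node names are hypothetical labels for the basic operation units, and batching into "waves" is a simplification chosen here rather than something the application specifies.

```python
from graphlib import TopologicalSorter

# Each key depends on the units in its set (hypothetical labels).
deps = {
    "write_data_0": {"alloc"},            # second basic operation units
    "write_data_1": {"alloc"},
    "encode_parity": {"alloc"},           # generates the check data
    "write_parity": {"encode_parity"},    # writes the check data to a device
    "complete": {"write_data_0", "write_data_1", "write_parity"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())   # units whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)                  # mark the whole wave as completed
```

Units in the same wave have no dependencies on one another and could run in parallel; a real scheduler would typically start `write_parity` as soon as encoding finishes rather than waiting for the whole wave.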
According to one of the first to eleventh methods of accessing a stripe of the fourth aspect of the present application, there is provided a twelfth method of accessing a stripe according to the fourth aspect of the present application, further comprising: acquiring a second stripe basic operation unit; acquiring an eighth basic operation unit according to the second stripe basic operation unit, wherein the eighth basic operation unit allocates resources for accessing the stripe; acquiring a ninth basic operation unit, wherein the ninth basic operation unit accesses one of the storage devices providing storage space for the stripe; and acquiring a tenth basic operation unit, wherein the tenth basic operation unit indicates that the stripe access is complete and releases the resources allocated by the eighth basic operation unit.
According to the twelfth method of accessing a stripe of the fourth aspect of the present application, there is provided a thirteenth method of accessing a stripe according to the fourth aspect of the present application, wherein the eighth basic operation unit records an eleventh basic operation unit that should be executed after its execution is completed, and the eleventh basic operation unit is acquired according to the eighth basic operation unit; the eleventh basic operation unit codes data from the stripe to generate data for reconstructing the stripe; and the ninth basic operation unit is acquired according to the eleventh basic operation unit, wherein the eleventh basic operation unit records the ninth basic operation unit that should be executed after its execution is completed.
According to a fifth aspect of the present application, there is provided a first method of extending stripe operations, comprising: in response to adding a first type of stripe operation, registering a first stripe basic operation unit composed of a plurality of basic operation units, wherein the first stripe basic operation unit records that a first basic operation unit is executed first, the first basic operation unit records one or more second basic operation units that should be executed after its execution is completed, and a third basic operation unit records that its execution depends on completion of the one or more second basic operation units.
According to the first method of extending stripe operations of the fifth aspect of the present application, there is provided a second method of extending stripe operations according to the fifth aspect of the present application, further comprising: in response to accessing a stripe with the first type of stripe operation, acquiring the first stripe basic operation unit; executing the first stripe basic operation unit to acquire the first basic operation unit, wherein the first basic operation unit allocates resources for accessing the stripe; acquiring a plurality of second basic operation units, each of which accesses one of the storage devices providing storage space for the stripe; and acquiring the third basic operation unit, wherein the third basic operation unit indicates that the stripe access is complete and releases the allocated resources.
According to a sixth aspect of the present application, there is provided a computer comprising a processor and a memory, the memory storing a program comprising instructions which, when loaded into and executed on the processor, cause the processor to perform one of the methods provided according to the fourth or fifth aspect of the present application.
According to a seventh aspect of the present application, there is provided a first system for accessing a stripe, comprising: a stripe basic operation unit acquisition module for acquiring a first stripe basic operation unit composed of a plurality of basic operation units; a first basic operation unit execution module for acquiring a first basic operation unit according to the first stripe basic operation unit, wherein the first basic operation unit allocates resources for accessing the stripe; a second basic operation unit execution module for acquiring a plurality of second basic operation units, each of which accesses one of the storage devices providing storage space for the stripe; and a third basic operation unit execution module for acquiring a third basic operation unit, wherein the third basic operation unit indicates that the stripe access is complete and releases the allocated resources.
According to a third aspect of the present application there is provided a computer program comprising computer program code which, when loaded into and executed on a computer system, causes the computer system to perform the method of storage object based data reconstruction provided according to the first aspect of the present application.
According to a sixth aspect of the present application there is provided a program comprising program code which, when loaded into a storage system and executed thereon, causes the storage system to perform a method of object-based data reconstruction according to the first aspect of the present application.
Drawings
The application, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
FIG. 1 illustrates an architecture of a storage system according to an embodiment of the application;
FIG. 2 illustrates a structure of a storage object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a drive failing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an IO processing architecture in accordance with an embodiment of the present application;
FIGS. 5A-5E are schematic diagrams of a "stripe basic operation unit" according to an embodiment of the present application;
FIG. 6 illustrates a "compound stripe operation unit" for processing a reconstruction request in accordance with an embodiment of the present application;
FIG. 7A is a flow chart of processing IO requests in accordance with an embodiment of the present application; and
FIG. 7B is a flow chart of processing IO requests in accordance with yet another embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, and are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
FIG. 1 illustrates an architecture of a storage system according to an embodiment of the present application. A storage system according to the present application includes a computer or server (collectively referred to as a host) and a plurality of storage devices (e.g., drives) coupled to the host. Preferably, the drive is a Solid State Drive (SSD). Optionally, a disk drive may also be included in embodiments in accordance with the present application.
The storage resources provided by the respective drives are maintained by a storage resource pool. The storage resource pool records the data blocks or data chunks (Chunk, also called "chunk" or "large block" below) in each drive. By way of example, a chunk is a plurality of data blocks, of a predetermined size, that are contiguous in the logical space or physical space of a drive. The size of a chunk may be, for example, hundreds of kilobytes (KB) or megabytes (MB). Optionally, the storage resource pool records the data blocks or chunks in the respective drives that have not yet been allocated to a storage object, which are also referred to as free data blocks or free chunks. A storage resource pool is a virtualization technique that virtualizes the storage resources of physical drives into data blocks or chunks for access or use by upper layers. A storage system may contain multiple storage resource pools, whereas the example of FIG. 1 shows only a single storage resource pool.
A storage object is created at the resource allocation layer for the storage object layer, and the storage object comprises a plurality of chunks. The allocator allocates chunks from the storage resource pool to create storage objects. A storage object provided according to an embodiment of the application represents a part of the storage space of the storage system. The storage object is a storage unit with a RAID function, and the storage object structure will be described in detail later with reference to FIG. 2. A plurality of storage objects is provided in the storage object layer. A storage object may be created and destroyed. When a storage object is created, the allocator obtains the required number of chunks from the storage resource pool, and these chunks constitute the storage object. A chunk may belong to only one storage object at a time. Chunks that have been allocated to a storage object are not allocated to other storage objects. When a storage object is destroyed, the chunks that make up the storage object are released back into the storage resource pool and may be reassigned to other storage objects.
The storage system includes a plurality of virtual storage disks. The virtual storage disk provides an access interface for the application program and provides services to the outside. The virtual storage disk is composed of a number of storage objects. Multiple virtual storage disks of different attributes may be created as needed for use by the application. Virtual storage disks provide a logical address space.
FIG. 2 illustrates a storage object according to an embodiment of the application. The storage object includes a plurality of data blocks or chunks. In the example of FIG. 2, the storage object includes chunk 220, chunk 222, chunk 224, and chunk 226. The chunks constituting the storage object come from different drives. Each drive may provide at most one chunk to one storage object. Referring to FIG. 2, chunk 220 is from drive 210, chunk 222 is from drive 212, chunk 224 is from drive 214, and chunk 226 is from drive 216. Thus, when a single drive fails, only one or a few chunks of a storage object become inaccessible. The data of the storage object can be reconstructed from the other chunks of the storage object to meet the data reliability requirement.
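The allocation rules described above (one chunk per drive per storage object, chunks returned to the pool on destruction) can be sketched as follows. This is a minimal illustration, not the implementation of the application: the class names `Chunk` and `StoragePool`, the method names, and the "prefer the drive with the most free chunks" policy are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    drive_id: int  # drive that provides this chunk
    index: int     # position of the chunk inside that drive


@dataclass
class StoragePool:
    # Free chunks per drive (hypothetical bookkeeping structure).
    free: dict[int, list[Chunk]] = field(default_factory=dict)

    def add_drive(self, drive_id: int, chunk_count: int) -> None:
        self.free[drive_id] = [Chunk(drive_id, i) for i in range(chunk_count)]

    def create_object(self, width: int) -> list[Chunk]:
        """Take one free chunk from each of `width` different drives."""
        candidates = [d for d, chunks in self.free.items() if chunks]
        if len(candidates) < width:
            raise RuntimeError("not enough drives with free chunks")
        # Assumed policy: prefer the drives with the most free chunks.
        candidates.sort(key=lambda d: len(self.free[d]), reverse=True)
        return [self.free[d].pop() for d in candidates[:width]]

    def destroy_object(self, chunks: list[Chunk]) -> None:
        """Release the chunks of a destroyed object back into the pool."""
        for chunk in chunks:
            self.free[chunk.drive_id].append(chunk)


pool = StoragePool()
for drive_id in (210, 212, 214, 216):  # drive numbers borrowed from FIG. 2
    pool.add_drive(drive_id, chunk_count=2)

obj = pool.create_object(width=4)  # one chunk from each of the four drives
```

Because `create_object` removes a chunk from the free list before returning it, a chunk cannot be handed to two storage objects at the same time.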
Data protection and high-performance access are provided for the storage object through RAID technology. Referring to FIG. 2, the storage object includes a plurality of RAID stripes (stripe 230, stripe 232, ..., stripe 238), each consisting of storage space from different chunks. The storage space that different chunks contribute to the same stripe may have the same or different address ranges. The stripe is the minimum write unit of the storage object, so that performance is improved by writing data to multiple drives in parallel. Read operations on the storage object are not limited in size. Data protection based on RAID technology is implemented within the stripe. Of the 4 segments of storage space, one from each of the 4 chunks, that make up stripe 230, 3 segments are used to store user data, while the remaining segment is used to store parity data, so that data protection at, for example, the RAID 5 level is provided on stripe 230. Each segment of storage space that makes up a stripe is referred to as a "strip".
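As a brief illustration of the 3+1 layout described above, the sketch below computes single parity by byte-wise XOR, the usual RAID 5 parity function, and rebuilds one lost segment from the survivors. The source does not specify the coding scheme, so XOR is an assumption here, and the function name `xor_segments` and the sample bytes are purely illustrative.

```python
from functools import reduce


def xor_segments(segments: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized segments of a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))


# Three user-data segments of one stripe (one per drive), plus their parity.
user_segments = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
parity = xor_segments(user_segments)

# Suppose the drive holding the second segment fails: XOR of the two
# surviving user-data segments and the parity segment restores it.
rebuilt = xor_segments([user_segments[0], user_segments[2], parity])
```

The same function serves both encoding and recovery, because XOR is its own inverse.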
Optionally, metadata is also stored in each chunk. In the example of fig. 2, the same metadata is stored on each of the chunks 220-226, so that the reliability of the metadata is ensured, and even if a part of the chunks belonging to the same storage object fails, the metadata can still be obtained from other chunks. The metadata is used for recording information such as a storage object to which the large block belongs, a data protection level (RAID level) of the storage object to which the large block belongs, and the erasing times of the large block. Still alternatively, as the storage object is reconstructed, the chunks that newly join the storage object have different metadata than other chunks of the storage object to describe the updated chunks.
FIG. 3 is a schematic diagram of an IO processing model in accordance with an embodiment of the present application. As shown in FIG. 3, the application accesses the virtual storage disk. Several IO processing units are created for the storage resource pool according to a specified configuration, and they can run fully concurrently on multiple CPUs. IO requests from the virtual storage disk are distributed to specific IO processing units according to the parameters of the IO requests. By way of example, IO requests are distributed to different IO processing units depending on the type of IO request (read request or write request) and/or the logical address accessed by the IO request.
The IO processing unit is responsible for processing IO requests, such as read and write requests of applications and/or data reconstruction requests inside the storage system. The main operations of the IO processing unit include: mutual exclusion and synchronization between IO requests, RAID encoding and decoding, distributing IO requests to the underlying solid state drives, and handling the results returned by the solid state drives once they have processed the IO requests. The IO processing unit may be a thread, a process, or another piece of code executing on the CPU. In the example of FIG. 3, each IO processing unit is bound to one CPU or CPU core. That CPU or CPU core is dedicated to executing the IO processing unit bound to it, so that the additional cost caused by thread switching is reduced.
As yet another example, some IO requests are not processed by an IO processing unit but are instead processed in the same context as the process/thread that issued the IO request, thereby avoiding the overhead of context switching and reducing the processing delay of those IO requests.
FIG. 4 is a schematic diagram of an IO processing architecture in accordance with an embodiment of the present application. The IO request is translated into operations that access stripes. The stripe serving the IO request is obtained according to the logical address accessed by the IO request. By way of example, since the size of IO requests is limited, a write request accesses at most one stripe, while a read request may access one or two stripes.
According to an embodiment of the application, IO requests (420, 422) are processed by "composite stripe operation units", each IO request being processed by one or two "composite stripe operation units" depending on the number of stripes accessed. Each "composite stripe operation unit" processes one type of IO request, e.g., a read request, a write request, a read verify request, or a reconstruction request. The read request reads data from the stripe, the write request writes data to the stripe, the read verify request reads user data from the stripe and verifies the consistency of the user data with its parity data, and the reconstruction request reconstructs the stripe (e.g., recovering the data of a failed segment of the stripe and writing it to newly allocated storage space). Each "composite stripe operation unit" acts on only a single stripe, and for an IO request that accesses two stripes, two "composite stripe operation units" are generated for processing.
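The translation from a logical address range to the stripes that serve it can be sketched as below. The sketch assumes, for illustration only, that consecutive logical addresses map to consecutive stripes and that every stripe holds a fixed amount of user data (`stripe_data_size`); the source does not prescribe a particular mapping, and the function name is hypothetical.

```python
def split_by_stripe(offset: int, length: int, stripe_data_size: int) -> list[tuple[int, int, int]]:
    """Return (stripe_index, offset_in_stripe, length) for every stripe touched."""
    parts = []
    end = offset + length
    while offset < end:
        stripe_index, in_stripe = divmod(offset, stripe_data_size)
        # Stop at the end of the request or the end of the stripe, whichever is first.
        part_len = min(end - offset, stripe_data_size - in_stripe)
        parts.append((stripe_index, in_stripe, part_len))
        offset += part_len
    return parts


# A read that straddles a stripe boundary is served by two stripes.
two_stripes = split_by_stripe(offset=10, length=4, stripe_data_size=12)
# A stripe-aligned write of exactly one stripe touches a single stripe.
one_stripe = split_by_stripe(offset=0, length=12, stripe_data_size=12)
```

Each returned tuple would then be handled by its own "composite stripe operation unit".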
By way of example, a "composite stripe operation unit" is a state machine, or a code segment implemented as a state machine, that controls the execution and the execution order of the one or more "stripe basic operation units" it includes. For example, a "composite stripe operation unit" initializes a "stripe basic operation unit" in a specified order (e.g., allocates memory for the "stripe basic operation unit", provides the addresses and data of the accessed stripe, etc.), executes the "stripe basic operation unit", identifies the completion and the result of the execution of the "stripe basic operation unit", and then executes the next "stripe basic operation unit" of the "composite stripe operation unit" in sequence to complete the IO request.
For each specified type of IO request, a corresponding "composite stripe operation unit" is provided, which has specified "stripe basic operation units" and a specified execution order for them. "Composite stripe operation units" corresponding to multiple types of IO requests exist in the storage system at the same time. In response to an IO request, the corresponding "composite stripe operation unit" is acquired and instantiated according to the type of the IO request, and the processing of one or more "stripe basic operation units" is completed in sequence under the control of the "composite stripe operation unit", thereby completing the processing of the IO request. The "composite stripe operation unit" describes the processing path of an IO request and, optionally, also the handling to be applied when an exception occurs during processing of the IO request.
According to an embodiment of the present application, a plurality of types of "composite stripe operation units" are provided in a storage system. For a new type of IO request, such as a de-allocation (Trim) operation provided by a new storage device, the new type of IO request is processed by adding a new "composite stripe operation unit" that controls the processing of one or more "stripe basic operation units".
The newly added "composite stripe operation unit" is optionally added to the storage system while the storage system is running. The storage system provides a registration interface for "composite stripe operation units". To add a "composite stripe operation unit", the "composite stripe operation unit" is described, including the one or more "stripe basic operation units" it comprises and, optionally, the execution conditions of those "stripe basic operation units". Executing these "stripe basic operation units" in the specified order (optionally also according to the execution conditions) completes the function of the added "composite stripe operation unit". The added "composite stripe operation unit" is registered with the storage system through the registration interface.
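A registration interface of the kind described above might look like the following sketch. All names here (`register_composite`, `handle`, the `"trim"` key, `mark_deallocated`) are hypothetical, and a "stripe basic operation unit" is modelled simply as a callable acting on a shared context dictionary.

```python
from typing import Callable

# Assumption: a "stripe basic operation unit" is a callable on a context dict.
StripeOp = Callable[[dict], None]

_registry: dict[str, list[StripeOp]] = {}


def register_composite(io_type: str, stripe_ops: list[StripeOp]) -> None:
    """Register the ordered stripe operations that serve one IO request type."""
    if io_type in _registry:
        raise ValueError(f"composite unit for {io_type!r} already registered")
    _registry[io_type] = list(stripe_ops)


def handle(io_type: str, context: dict) -> dict:
    """Look up the composite unit by request type and run its units in order."""
    for op in _registry[io_type]:
        op(context)
    return context


def mark_deallocated(context: dict) -> None:
    # Hypothetical stripe operation for a Trim (de-allocation) request.
    context["deallocated"] = True


# A new request type is supported at run time without touching `handle`.
register_composite("trim", [mark_deallocated])
result = handle("trim", {"stripe": 7})
```

The dispatcher stays unchanged when a new request type is added, which is the extensibility property the paragraph describes.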
Referring to FIG. 4, the "composite stripe operation unit" 420 includes a "stripe basic operation unit" 432, a "stripe basic operation unit" 434, and a "stripe basic operation unit" 436, and the functions of the "composite stripe operation unit" 420 are implemented by sequentially executing the "stripe basic operation unit" 432, the "stripe basic operation unit" 434, and the "stripe basic operation unit" 436. Referring also to FIG. 6, an example of a "composite stripe operation unit" for processing a reconstruction request is illustrated. The "composite stripe operation unit" for processing the reconstruction request includes a "stripe read" "stripe basic operation unit" 610 and a "stripe reconstruct" "stripe basic operation unit" 620, and the reconstruction request is processed by sequentially executing the "stripe basic operation unit" 610 and the "stripe basic operation unit" 620. When executed, the "composite stripe operation unit" for processing the reconstruction request first initializes the "stripe basic operation unit" 610 and the "stripe basic operation unit" 620, which includes, for example, allocating the memory they need for execution, specifying the stripe to be accessed, indicating the logical address segment of the stripe that is faulty, and providing an address to hold the read data. The "stripe basic operation unit" 610 and the "stripe basic operation unit" 620 are then executed in sequence. Data is read from the specified stripe by executing the "stripe basic operation unit" 610. In response to the execution of the "stripe basic operation unit" 610 being completed, the "composite stripe operation unit" executes the "stripe basic operation unit" 620, which calculates the data of the faulty logical address segment from the read data and writes the calculated data into the address space (e.g., a chunk) used for reconstruction. In response to the execution of the "stripe basic operation unit" 620 being completed, the "composite stripe operation unit" indicates that processing of the reconstruction request is complete, and the memory allocated for the "stripe basic operation unit" 610 and the "stripe basic operation unit" 620 is reclaimed.
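The reconstruction flow of FIG. 6 can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the class and state names are invented, the stripe is held in memory as a list of byte strings with `None` marking the failed segment, and XOR single parity is assumed as the recovery code.

```python
from enum import Enum, auto
from functools import reduce


class State(Enum):
    INIT = auto()         # initialize both stripe basic operation units
    READ = auto()         # role of "stripe basic operation unit" 610
    RECONSTRUCT = auto()  # role of "stripe basic operation unit" 620
    DONE = auto()


class ReconstructComposite:
    def __init__(self, segments, failed_index):
        self.segments = segments          # bytes per segment; None where failed
        self.failed_index = failed_index
        self.state = State.INIT
        self.buffers = None               # memory allocated for the two units
        self.rebuilt = None
        self.trace = []                   # order in which states were executed

    def step(self):
        self.trace.append(self.state.name)
        if self.state is State.INIT:
            self.buffers = []
            self.state = State.READ
        elif self.state is State.READ:
            # Read every surviving segment of the stripe.
            self.buffers = [s for i, s in enumerate(self.segments)
                            if i != self.failed_index]
            self.state = State.RECONSTRUCT
        elif self.state is State.RECONSTRUCT:
            # Recompute the failed segment from the data that was read.
            self.rebuilt = bytes(reduce(lambda a, b: a ^ b, column)
                                 for column in zip(*self.buffers))
            self.buffers = None           # reclaim the allocated memory
            self.state = State.DONE

    def run(self):
        while self.state is not State.DONE:
            self.step()
        return self.rebuilt


unit = ReconstructComposite([b"\x01\x02", None, b"\xa0\x0b", b"\xb1\x29"],
                            failed_index=1)
recovered = unit.run()
```

In a real system the rebuilt segment would be written to newly allocated storage space; here it is simply returned.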
In yet another example, a "composite stripe operation unit" includes a plurality of selectable "stripe basic operation units", each having an execution condition. For example, a "composite stripe operation unit" corresponding to a read request executes different "stripe basic operation units" depending on whether the data to be read is located on one storage device of the stripe, on two storage devices of the stripe, or on a failed storage device (or failed storage area) of the stripe.
Referring back to FIG. 4, the "stripe basic operation unit" handles operations on a single stripe, e.g., read stripe operations, write stripe operations, verify stripe operations, etc. A "stripe basic operation unit" includes one or more "basic operation units" and schedules them to complete an operation on a single stripe. The "basic operation units" may execute in parallel or may depend on one another, and the "stripe basic operation unit" controls their parallel execution or execution order. For example, if the storage space of one stripe is provided by 4 storage devices, the "stripe basic operation unit" processing the read stripe operation includes 4 "basic operation units" for reading the storage devices, and these 4 "basic operation units" can be executed in parallel. The "stripe basic operation unit" that handles the read stripe operation also detects when all 4 "basic operation units" have finished executing, and only then executes the subsequent "basic operation units". By way of example, a "stripe basic operation unit" is implemented as a code segment that identifies the execution order of the "basic operation units" it includes and schedules their execution.
The "basic operation unit" processes operations on at most one storage device. For example, it issues a read command to the storage device and obtains the result read from the storage device. In another example, a "basic operation unit" does not access any storage device, but instead verifies or decodes data read from the storage devices by other "basic operation units" (to recover failed data). A "basic operation unit" may issue one or more read/write commands to the storage device. There are several types of "basic operation units", each corresponding to a kind of operation. For example, a "basic operation unit" for initialization allocates memory for one or more "basic operation units" of a "stripe basic operation unit", identifies the dependency relationships of a plurality of "basic operation units", determines the order in which they are scheduled, and the like; a "basic operation unit" for processing a read command issues the read command to the storage device and receives the processing result of the read command; a "basic operation unit" for processing a write command issues the write command to the storage device and receives the processing result of the write command; a "basic operation unit" for encoding data encodes the user data to be written to the stripe to generate parity data; a "basic operation unit" for decoding data performs error correction decoding on data read from the stripe to recover failed data; and a "basic operation unit" for ending the operation releases the memory allocated for one or more "basic operation units" of the "stripe basic operation unit". Further, a "basic operation unit" also describes its dependency relationships with the other "basic operation units" belonging to the same "stripe basic operation unit".
For example, a "basic operation unit" records which one or more "basic operation units" can be executed next after its own processing is completed, and/or which "basic operation unit(s)" must be executed to completion before it starts executing. By way of example, a "basic operation unit" is implemented as a code segment.
Through various combinations of the "basic operation units", various "stripe basic operation units" are provided according to an embodiment of the present application. The "stripe basic operation units" are in turn combined into a plurality of "composite stripe operation units". The various "composite stripe operation units" provide multiple functions, thereby making the functions of the storage system extensible.
FIGS. 5A to 5E are schematic diagrams of "stripe basic operation units" according to an embodiment of the present application. FIG. 5A illustrates a "stripe basic operation unit" for a read stripe operation. By way of example, the stripe employs a RAID 5 configuration, comprising 3 strips of user data and 1 strip of parity data (represented by "P"), where a strip is the segment of the stripe located on one storage device. A request to read data from two strips of the stripe is handled by the "stripe basic operation unit" for the read stripe operation shown in FIG. 5A. This "stripe basic operation unit" includes 4 "basic operation units" (indicated by 510, 512, 514, and 516, respectively). The "basic operation unit" 510 is scheduled first and performs initialization operations such as allocating memory and designating the addresses to access. Then the "basic operation units" 512 and 514 each access a strip located on a different storage device to read data. The "basic operation units" 512 and 514 may be executed in parallel. In response to the completion of the execution of both "basic operation units" 512 and 514, the "basic operation unit" 516 is executed to end the operation.
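The read flow of FIG. 5A (initialize, read two segments in parallel, wait for both, then finish) can be sketched with a thread pool. The storage devices are simulated by in-memory byte strings, and the function names `read_one_device` and `read_stripe` are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def read_one_device(device: bytes, offset: int, length: int) -> bytes:
    """Basic operation unit: read from exactly one (simulated) storage device."""
    return device[offset:offset + length]


def read_stripe(devices: list[bytes], device_ids: list[int],
                offset: int, length: int) -> bytes:
    buffers: dict[int, bytes] = {}          # role of 510: initialization
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        # Roles of 512 and 514: one read per device, issued in parallel.
        futures = {i: pool.submit(read_one_device, devices[i], offset, length)
                   for i in device_ids}
        for i, future in futures.items():
            buffers[i] = future.result()    # wait until every read has finished
    data = b"".join(buffers[i] for i in device_ids)
    buffers.clear()                         # role of 516: release resources
    return data


# Three user-data devices and one parity device, as in the RAID 5 example.
devices = [b"AAAA", b"BBBB", b"CCCC", b"PPPP"]
data = read_stripe(devices, device_ids=[0, 1], offset=1, length=2)
```

The final step runs only after both reads have returned, mirroring the dependency of unit 516 on units 512 and 514.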
FIG. 5B illustrates a "stripe basic operation unit" for reading a failed strip of a stripe (a strip being the segment of the stripe located on one storage device). As an example, when one strip of a stripe is known to be faulty, the data of the failed strip is obtained by the "stripe basic operation unit" shown in FIG. 5B. This "stripe basic operation unit" includes 6 "basic operation units" (indicated by 520, 522, 524, 526, 528, and 529, respectively). The "basic operation unit" 520 is scheduled first for initialization. Next, the "basic operation units" 522, 524, and 526 each access a strip located on a different storage device to read data. The "basic operation units" 522 and 524 read user data of the stripe, and the "basic operation unit" 526 reads the parity data of the stripe. The "basic operation units" 522, 524, and 526 may be executed in parallel or sequentially. In one example, the "basic operation unit" 520 for initialization determines the execution order or scheduling policy of the "basic operation units" 522, 524, 526, 528, and 529. In another example, the "stripe basic operation unit" identifies the dependencies of the "basic operation units" it contains and schedules them accordingly. In response to completion of execution of the "basic operation units" 522, 524, and 526, the "basic operation unit" 528 is executed to perform data decoding and recover the data of the failed strip (or a portion thereof). Finally, the "basic operation unit" 529 is executed to end the processing of the "stripe basic operation unit" for reading the failed strip of the stripe.
FIG. 5C illustrates a "stripe basic operation unit" for writing a complete stripe. By way of example, the stripe employs a RAID 5 configuration, comprising 3 strips of user data and 1 strip of parity data (indicated by P). The "stripe basic operation unit" for writing a complete stripe includes 7 "basic operation units" (indicated by 530, 532, 534, 535, 536, 537, and 539, respectively). The "basic operation unit" 530 is scheduled first for initialization. Next, the "basic operation unit" 532 is executed to encode the user data to be written to the stripe and generate the parity data for the stripe. Next, the "basic operation units" 534, 535, 536, and 537 are executed, each accessing a strip located on a different storage device to write data. The "basic operation units" 534, 535, and 536 write user data to the stripe, and the "basic operation unit" 537 writes the parity data (generated by the "basic operation unit" 532) to the stripe. The "basic operation units" 534, 535, 536, and 537 may be executed in parallel or sequentially. In response to completion of execution of the "basic operation units" 534, 535, 536, and 537, the "basic operation unit" 539 is finally executed to end the processing of the "stripe basic operation unit" for writing the complete stripe.
FIG. 5D illustrates another "stripe basic operation unit" for writing a complete stripe. This "stripe basic operation unit" includes 7 "basic operation units" (indicated by 540, 542, 544, 545, 546, 547, and 549, respectively). The "basic operation unit" 540 is scheduled first for initialization. Next, the "basic operation units" 542, 544, 545, and 546 are executed in parallel. The "basic operation units" 544, 545, and 546 write data to different storage devices, respectively, and the "basic operation unit" 542 encodes the data to be written to the storage devices to generate parity data. After the execution of the "basic operation unit" 542 is completed, the "basic operation unit" 547 is executed to write the parity data to the storage device. In response to completion of execution of the "basic operation units" 544, 545, 546, and 547, the "basic operation unit" 549 is finally executed to end the processing of the "stripe basic operation unit" for writing the complete stripe. Since the "basic operation units" 544, 545, and 546 do not depend on the "basic operation unit" 542, the "stripe basic operation unit" shown in FIG. 5D schedules the "basic operation units" 544, 545, and 546 earlier and thereby achieves a lower processing delay.
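The difference between the two full-stripe write variants can be made concrete by writing down the dependencies of FIG. 5C and FIG. 5D and letting a scheduler group the "basic operation units" into successive batches of units that are ready to run. The sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later); the dictionaries are a transcription of the dependencies as described in the text, and the helper name `ready_batches` is an assumption.

```python
from graphlib import TopologicalSorter


def ready_batches(depends_on: dict[int, list[int]]) -> list[list[int]]:
    """Group units into batches; a unit is ready once all its dependencies are done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# FIG. 5C: every write waits for the encode unit 532.
fig_5c = {
    530: [],
    532: [530],
    534: [532], 535: [532], 536: [532], 537: [532],
    539: [534, 535, 536, 537],
}

# FIG. 5D: only the parity write 547 waits for the encode unit 542.
fig_5d = {
    540: [],
    542: [540], 544: [540], 545: [540], 546: [540],
    547: [542],
    549: [544, 545, 546, 547],
}

batches_5c = ready_batches(fig_5c)
batches_5d = ready_batches(fig_5d)
```

In the FIG. 5D graph the user-data writes 544, 545, and 546 become ready in the second batch, alongside the encode unit, whereas in FIG. 5C the corresponding writes become ready only in the third batch.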
In yet another example of a "stripe basic operation unit" for writing a complete stripe, the storage system generates parity data as a byproduct during writing of data to the storage device, omitting, for example, the "basic operation unit" 542, and the "basic operation unit" 547 is executed after the processing of all of the "basic operation units" 544, 545, and 546 is completed. Finally, a "basic operation unit" 549 is executed to end the processing of the "stripe basic operation unit" for writing the complete stripe.
Thus, according to embodiments of the present application, different "stripe basic operation units" are formed by combining the "basic operation units" to meet the diversified demands of the storage system.
FIG. 5E illustrates a "stripe basic operation unit" for writing data to a stripe during stripe reconstruction. This "stripe basic operation unit" includes 4 "basic operation units" (indicated by 550, 552, 554, and 556, respectively). The "basic operation unit" 550 is scheduled first for initialization. Next, the "basic operation unit" 552 is executed to encode data and generate the data to be written to the stripe. Next, the "basic operation unit" 554 is executed to write the data into the newly allocated storage space of the stripe. Finally, the "basic operation unit" 556 is executed to end the processing of the "stripe basic operation unit" for writing data to the stripe during stripe reconstruction.
The "stripe basic operation unit" shown in FIG. 5E for writing data to a stripe during stripe reconstruction may serve as the "stripe basic operation unit" 620 of FIG. 6. The "stripe basic operation unit" 610 of FIG. 6 may be the "stripe basic operation unit" obtained by removing the "basic operation unit" 528 from the "stripe basic operation unit" for reading the failed part of a stripe shown in FIG. 5B.
According to embodiments of the present application, various types of "stripe basic operation units" are provided in a storage system or added to the storage system while it is running. The storage system provides a registration interface for "stripe basic operation units". To add a kind of "stripe basic operation unit", the "stripe basic operation unit" and the one or more "basic operation units" it includes are described. Executing these "basic operation units" in the specified order completes the function of the added "stripe basic operation unit". The "stripe basic operation unit" also records the "basic operation unit" that is executed first when it is called, and each "basic operation unit" records the next "basic operation unit" that should be executed after its own execution is completed and/or the one or more "basic operation units" on which its execution depends. The added "stripe basic operation unit" is registered with the storage system through the registration interface.
FIG. 7A is a flow chart of processing IO requests in accordance with an embodiment of the present application. In response to receiving the IO request, a corresponding "composite stripe operation unit" is obtained (710) depending on the type of IO request. By way of example, an application program accesses a virtual storage disk through an API (application programming interface) provided by the operating system (see also FIG. 1). In response, in a code segment that processes the API call, a type of IO request is identified, e.g., whether the IO request is a read request or a write request. The stripe accessed by the IO request is also identified. A corresponding "composite stripe operation unit" is obtained for each stripe accessed. For example, a logical address range accessed by a read request is provided by two stripes, with a corresponding "composite stripe operation unit" being obtained for each stripe's read request. The two "compound stripe operation units" acquired may be of the same type or of different types. For example, if the address range accessed by one stripe is normal and there is a failure in the address range accessed by another stripe, a different "compound stripe operation unit" is obtained to read data from the stripe in a different manner. For another example, a logical address range for write request access is provided by 1 stripe, obtaining a corresponding "compound stripe operation unit".
Under control of the acquired individual "composite stripe operation units", the "stripe base operation units" comprised by the "composite stripe operation units" are initialized in execution order (720). For example, memory is allocated for "stripe basic operation units", addresses of accessed stripes are specified, data to be written is provided, and the like. The "composite stripe operation unit" also indicates in order that the "stripe base operation unit" accesses the stripe (730). The "basic operation unit" of the "stripe basic operation unit" issues a read command/write command to the storage device that provides the stripe. And receiving a command execution result of the storage device. In response to the "stripe base operation unit" included in the "composite stripe operation unit" completing the access to the stripe in the order indicated by the "composite stripe operation unit", the processing of the "composite stripe operation unit" is completed. And in response to the completion of processing of one or more "composite stripe operation units" corresponding to the IO request, the IO request is processed.
Optionally, the code segment that handles the API call does not switch contexts, thereby obtaining memory allocated for the "stripe basic operation unit" from the current context, accessing the address of the stripe, the storage space to write data, and/or the storage space to store data to be read. By reducing the number of context switches, system load and IO request processing delay are reduced.
According to an embodiment of the application, multiple IO requests are processed concurrently. The composite stripe operation unit, the composite stripe operation unit and/or the basic operation unit can be created into a plurality of instances through initialization, and the composite stripe operation unit, the composite stripe operation unit and/or the basic operation unit are created for each IO request so as to process the IO request. Optionally, applications of multiple processes/threads each access the virtual storage disk through an API. In the context of each process/thread, the acquire "composite stripe operation unit" processes IO requests. In yet another example, a code segment that processes an API call caches IO requests, and for each IO request, obtains a corresponding "composite stripe operation unit", and under control of the "composite stripe operation unit", each "basic operation unit" of the "stripe basic operation unit" issues IO commands to a plurality of storage devices in an asynchronous manner and receives IO command processing results provided by the storage devices. Optionally, each of the plurality of CPUs or the plurality of CPU cores obtains the buffered IO request and obtains the corresponding "composite stripe operation unit" to process the IO request. Therefore, each IO request generation and processing process can be processed by a single CPU, a plurality of CPUs process a plurality of IO requests simultaneously, and the parallelism of processing IO requests is further increased along with the increase of the number of the CPUs or CPU cores, so that the expandability of the performance is provided for the storage system.
FIG. 7B is a flow chart of processing IO requests in accordance with yet another embodiment of the present application. In response to receiving an IO request (750), an available IO processing unit is acquired (see also FIG. 3), and the IO request is processed by the IO processing unit assigned to it (760). Optionally, while the IO request is being processed, the IO processing unit assigned to it is dedicated to that IO request and is not scheduled for other tasks. Still alternatively, the IO processing unit assigned to an IO request (called io_1) processes that IO request without interruption from the time the IO request is obtained until IO commands have been sent to all storage devices accessed by the IO request. Shortly after the IO commands have been sent, the IO processing unit may be scheduled to process other tasks (or other IO requests), and it resumes processing the IO request (io_1) before the storage devices are expected to complete the IO commands, so that the IO request is processed in time and its processing delay is reduced.
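As a non-limiting sketch of the last alternative, the following uses Python's `asyncio` to model an IO processing unit that sends commands to every storage device accessed by a request and only then yields to other work. The function names, device names and the 0.01 second latency are assumptions made for illustration; `asyncio.sleep` stands in for the time a storage device takes to process a command.

```python
import asyncio


async def send_command(device, command, latency):
    """Hypothetical device call; the sleep stands in for device processing time."""
    await asyncio.sleep(latency)
    return f"{device}:{command}:done"


async def process_request(name, devices):
    # Schedule a command for every device before waiting on any of them.
    tasks = [asyncio.create_task(send_command(d, name, 0.01)) for d in devices]
    # Awaiting yields control, so other requests can be processed meanwhile.
    return await asyncio.gather(*tasks)


async def main():
    # io_1 and io_2 overlap: while io_1 waits on its devices, io_2 is issued.
    return await asyncio.gather(
        process_request("io_1", ["dev0", "dev1"]),
        process_request("io_2", ["dev2"]),
    )
```

Calling `asyncio.run(main())` returns the per-device results for each request in submission order.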
By way of example, an IO processing unit is a thread or process. Still by way of example, for a write request, the IO processing unit is assigned to the write request at step 760 because the storage device has a lower processing latency for the write command. For read requests, the IO processing unit is not assigned, and the read requests are directly processed in the code segment for processing the API call, so that read request processing delay is reduced.
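The following sketch illustrates, under stated assumptions, the example in which only write requests are assigned an IO processing unit while read requests are handled directly in the code segment that processes the API call. `Dispatcher`, `handle_read` and `handle_write` are hypothetical names, and each thread of the pool plays the role of one IO processing unit.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_read(address):
    """Hypothetical read path, run directly in the API call's own context."""
    return ("read", address, "inline")


def handle_write(address, data):
    """Hypothetical write path, run by an assigned IO processing unit."""
    return ("write", address, "io-unit")


class Dispatcher:
    def __init__(self, num_units=2):
        # Each pool thread plays the role of one IO processing unit.
        self._units = ThreadPoolExecutor(max_workers=num_units)

    def submit(self, kind, address, data=None):
        if kind == "read":
            # No IO processing unit is assigned; process the read in place.
            return handle_read(address)
        if kind == "write":
            # Hand the write to an available IO processing unit and wait for it.
            return self._units.submit(handle_write, address, data).result()
        raise ValueError(f"unknown request type: {kind}")

    def close(self):
        self._units.shutdown()
```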
The IO processing unit identifies the stripe accessed by the IO request and locks that stripe (770). In some application scenarios, multiple write requests conflict in their access to a stripe. To resolve the conflict, the conflicting IO requests must be processed serially, which is achieved by locking. If locking fails, the IO processing unit suspends processing of the IO request until the lock is acquired. In response to a successful lock, the IO processing unit obtains the "composite stripe operation unit" corresponding to the IO request (770).
Optionally, in some application scenarios there is no access conflict between multiple IO requests, e.g., because the application has already handled potential access conflicts or because the solid state storage device provides an atomic command (supporting atomic processing of IO commands); in that case step 770 need not be performed. Still alternatively, the stripe accessed by a read request need not be locked, so step 770 is skipped for read requests and performed only for write requests.
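By way of example only, the stripe identification and locking of step 770 may be sketched as follows. The 64 KiB stripe size, the `stripes_for` helper and the `StripeLocks` class are assumptions for illustration and are not taken from the application; acquiring locks in ascending stripe order is likewise a design choice of this sketch, made to avoid deadlock when a request spans two stripes.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

STRIPE_SIZE = 64 * 1024  # assumed stripe size in bytes


def stripes_for(offset, length, stripe_size=STRIPE_SIZE):
    """Return the indices of the stripes covered by [offset, offset + length)."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return list(range(first, last + 1))


class StripeLocks:
    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # one lock per stripe index

    def _lock_for(self, stripe):
        with self._guard:
            return self._locks[stripe]

    @contextmanager
    def hold(self, stripes, kind):
        # Reads skip locking; writes lock their stripes in ascending order.
        taken = []
        if kind == "write":
            for s in sorted(stripes):
                lock = self._lock_for(s)
                lock.acquire()       # blocks until the lock is granted
                taken.append(lock)
        try:
            yield
        finally:
            for lock in reversed(taken):
                lock.release()
```

A write request would wrap its stripe access in `with locks.hold(stripes_for(offset, length), "write"):`, so that conflicting writes to the same stripe are serialized while reads proceed without locking.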
Under control of each acquired "composite stripe operation unit", the "stripe basic operation units" included in that "composite stripe operation unit" are initialized in execution order (780). The "composite stripe operation unit" also instructs the "stripe basic operation units", in sequence, to access the stripe (790). In response to completion of the one or more "composite stripe operation units" corresponding to the IO request, processing of the IO request is complete.
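Steps 780 and 790 may be illustrated by the following minimal sketch, in which a composite unit supplies resources to each of its stripe basic operation units and runs them in the recorded order. The class names, the shared `buffer` dictionary and the read-then-rebuild example are hypothetical; the actions only record what they would do and perform no real device access.

```python
class StripeBasicOp:
    """Hypothetical "stripe basic operation unit": one step on a single stripe."""

    def __init__(self, name, action):
        self.name = name
        self.action = action
        self.stripe = None
        self.buffer = None

    def init(self, stripe, buffer):
        # Resources are supplied by the composite unit, not allocated here.
        self.stripe, self.buffer = stripe, buffer

    def execute(self):
        self.action(self.stripe, self.buffer)


class CompositeStripeOp:
    """Hypothetical "composite stripe operation unit" acting on one stripe."""

    def __init__(self, steps):
        self.steps = steps           # recorded execution order

    def run(self, stripe):
        buffer = {"log": []}         # storage space shared by the steps
        for step in self.steps:
            step.init(stripe, buffer)    # initialize in execution order (780)
            step.execute()               # access the stripe in sequence (790)
        return buffer


def read_stripe(stripe, buf):
    buf["log"].append(f"read stripe {stripe}")


def rebuild_stripe(stripe, buf):
    buf["log"].append(f"rebuild stripe {stripe}")


rebuild = CompositeStripeOp([
    StripeBasicOp("read", read_stripe),
    StripeBasicOp("rebuild", rebuild_stripe),
])
```

Here `rebuild.run(7)` executes the read step before the rebuild step on stripe 7; a new type of IO request could be supported by composing a different list of steps.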
The embodiment of the present application also provides a program comprising program code which, when loaded into and executed in a CPU, causes the CPU to perform one of the methods according to the embodiments of the present application provided above.
The present application also provides a program comprising program code which, when loaded into and executed on a host computer, causes a processor of the host computer to perform one of the methods provided above according to the embodiments of the present application.
It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by various means including computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data control apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data control apparatus create means for implementing the functions specified in the flowchart block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data control apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including computer-readable instructions for implementing the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data control apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart block or blocks.
Accordingly, blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of operations for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based computer systems that perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
Although the present application has been described with reference to examples, which are intended for purposes of illustration only and not to be limiting of the application, variations, additions and/or deletions to the embodiments may be made without departing from the scope of the application.
Many modifications and other embodiments of the applications set forth herein will come to mind to one skilled in the art to which these embodiments pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the applications are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (14)

1. A method of processing an IO request, comprising:
in response to receiving the IO request, acquiring a composite stripe operation unit corresponding to the IO request; wherein the composite stripe operation unit comprises one or more stripe basic operation units, each stripe basic operation unit comprising one or more basic operation units;
the composite stripe operation unit allocates resources for the one or more stripe basic operation units included in the composite stripe operation unit;
the composite stripe operation unit sequentially executes the one or more stripe basic operation units; and
indicating that the IO request processing is completed;
wherein each IO request is processed by one or two composite stripe operation units depending on the number of stripes accessed; each composite stripe operation unit processes one type of IO request, each composite stripe operation unit only acts on a single stripe, and for IO requests accessing two stripes, two composite stripe operation units are generated;
each stripe basic operation unit schedules one or more basic operation units to complete an operation on a single stripe; the one or more basic operation units execute in parallel or have dependencies among them; the stripe basic operation unit controls the parallel execution or the execution order of the one or more basic operation units; and each basic operation unit processes operations on at most one storage device.
2. The method of claim 1, wherein a corresponding composite stripe operation unit is obtained for each stripe accessed by the IO request according to the stripe accessed by the IO request; wherein each composite stripe operation unit accesses only regions of a single stripe that are accessed by the IO request.
3. The method of claim 1, wherein the composite stripe operation unit corresponding to the type of the IO request is obtained according to the type of the IO request.
4. The method of claim 1, further comprising:
in response to receiving the IO request, also acquiring an available IO processing unit, wherein the acquired IO processing unit acquires the composite stripe operation unit corresponding to the IO request.
5. The method of claim 2, wherein
if the IO request is a write request, an available IO processing unit is acquired for the IO request; and
if the IO request is a read request, the IO request is directly processed.
6. The method of claim 5, wherein
the IO processing unit assigned to a first IO request is dedicated to processing the first IO request, and the IO processing unit is not assigned to other IO requests until the processing of the first IO request is completed.
7. The method of claim 4, further comprising:
the IO processing unit identifies, according to the address range accessed by the IO request, one or more stripes providing the address range, and acquires a corresponding composite stripe operation unit for each stripe.
8. The method of claim 7, further comprising:
in response to an IO request to write a first stripe, the IO processing unit requests a lock for the first stripe, and only after the lock request succeeds does the IO processing unit acquire the composite stripe operation unit corresponding to the write stripe operation.
9. The method of one of claims 1-8, wherein the one or more stripe basic operation units comprise a first stripe basic operation unit and a second stripe basic operation unit; the first stripe basic operation unit is used for processing read stripe operation;
when the composite stripe operation unit allocates resources for the first stripe basic operation unit, it designates the stripe to be accessed and allocates storage space for storing the read data; the composite stripe operation unit also controls the first stripe basic operation unit to execute first, and the second stripe basic operation unit to execute in response to completion of the first stripe basic operation unit.
10. The method according to claim 9, wherein:
the second stripe basic operation unit is used for reconstructing the stripe;
and when the composite stripe operation unit allocates resources for the second stripe basic operation unit, it designates the storage space holding the data used to reconstruct the stripe and the stripe to which the data is to be written.
11. The method of one of claims 1-8, further comprising:
registering a first composite stripe operation unit composed of one or more stripe basic operation units in response to adding processing of the first type of IO request, wherein the first composite stripe operation unit records an execution order of the one or more stripe basic operation units included therein, and completes processing of the first type of IO request by executing the one or more stripe basic operation units included in the first composite stripe operation unit in a specified order; and
in response to receiving the IO request of the first type, a first composite stripe operation unit is acquired.
12. The method of one of claims 1-8, further comprising:
the composite stripe operation unit selects a first stripe basic operation unit of the one or more stripe basic operation units according to the execution condition of the IO request, allocates resources for the first stripe basic operation unit, and executes the first stripe basic operation unit.
13. The method of one of claims 1-8, further comprising:
registering a first composite stripe operation unit composed of one or more stripe basic operation units in response to adding processing of the first type of IO request, wherein the first composite stripe operation unit records an execution order of the one or more stripe basic operation units included therein, and completes processing of the first type of IO request by executing the one or more stripe basic operation units included in the first composite stripe operation unit in a specified order; and
in response to receiving the IO request of the first type, the first composite stripe operation unit is acquired; the first composite stripe operation unit allocates resources for the stripe basic operation units it comprises;
the first composite stripe operation unit executes the included stripe basic operation units according to a specified sequence; and indicating that the processing of the IO request of the first type is completed.
14. A computer comprising a processor and a memory, the memory storing a program comprising instructions which, when loaded into and executed on the processor, cause the processor to perform the method according to any of claims 1-13.
CN201710893444.6A 2017-09-27 2017-09-27 Scalable storage system architecture Active CN109558070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710893444.6A CN109558070B (en) 2017-09-27 2017-09-27 Scalable storage system architecture

Publications (2)

Publication Number Publication Date
CN109558070A CN109558070A (en) 2019-04-02
CN109558070B CN109558070B (en) 2023-09-15

Family

ID=65864114

Country Status (1)

Country Link
CN (1) CN109558070B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100192 A302, building B-2, Zhongguancun Dongsheng Science Park, No. 66, xixiaokou Road, Haidian District, Beijing

Applicant after: Beijing yihengchuangyuan Technology Co.,Ltd.

Address before: 100192 room A302, building B-2, Dongsheng Science Park, Zhongguancun, 66 xixiaokou Road, Haidian District, Beijing

Applicant before: BEIJING MEMBLAZE TECHNOLOGY Co.,Ltd.

GR01 Patent grant