WO2019056948A1 - Storage medium management method, apparatus, and readable storage medium - Google Patents

Storage medium management method, apparatus, and readable storage medium

Info

Publication number
WO2019056948A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage medium
free
redundant space
dimension
maintenance
Prior art date
Application number
PCT/CN2018/104288
Other languages
English (en)
French (fr)
Inventor
周建华
周猛
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21213020.7A (EP4036735B1)
Priority to EP18857718.3A (EP3667504B1)
Publication of WO2019056948A1
Priority to US16/824,259 (US11237929B2)
Priority to US17/545,203 (US11714733B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1608 Error detection by comparing the output signals of redundant hardware
    • G06F11/1612 Error detection by comparing the output signals of redundant hardware where the redundant component is persistent storage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1092 Rebuilding, e.g. when physically replacing a failing disk
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F11/1662 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit the resynchronized component or unit being a persistent storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/82 Solving problems relating to consistency

Definitions

  • the present application relates to the field of data storage technologies, and in particular, to a storage medium management method, apparatus, and readable storage medium.
  • a storage system usually includes a chassis, a power supply, a storage medium, and the like.
  • the storage medium is an indispensable part of the storage system as a carrier for storing data.
  • the storage medium may include a hard disk, an optical disk, and the like.
  • the hard disk may be a solid state disk (SSD), a hard disk drive (HDD), or the like.
  • some storage media usually have a certain life cycle. If a failure occurs during the life cycle, the storage medium needs to be managed and maintained. In the related art, it is usually necessary to manually manage and maintain the storage medium.
  • For example, the hard disk serves as a field replaceable unit (FRU) that supports on-site hot-plug replacement. When a hard disk fails, the storage system generally raises an alarm, and a technician can then remove and replace the damaged hard disk in the storage system, thereby managing the storage medium.
  • the present application provides a storage medium management method, apparatus, and readable storage medium for solving the problem of low management efficiency of the prior art.
  • the technical solution is as follows:
  • the first aspect provides a storage medium management method, which is applied to a storage system, where the method includes:
  • the redundant space is either pre-configured as a fixed configuration or configured based on a maintenance-free rate, a dimension-free period, an annual failure rate (AFR) of the storage medium, and a total number of components included in the storage medium. The maintenance-free rate and the dimension-free period are carried by a configuration instruction for the storage medium or obtained by querying a custom register, and the AFR is obtained by a query or carried by the configuration instruction.
  • Because a redundant space for storing recovered data is configured, either as a fixed configuration in advance or based on the maintenance-free rate, the dimension-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the data of a failed component can be transferred to the redundant space for storage even when a component in the storage medium fails. The user does not need to replace the component manually, the redundant space makes maintenance-free management of the storage medium possible within the specified period, and management efficiency improves.
  • before the restored data is stored into the redundant space of the storage medium and the address of the failed component is mapped to the redundant space, the method further includes:
  • configuring a redundant space of the capacity based on the storage medium.
  • The above configuration of the redundant space based on the configuration instruction may be performed before the restored data is stored in the redundant space of the storage medium and the address of the failed component is mapped to the redundant space, so that maintenance-free management based on this redundant space is possible when a component subsequently fails.
  • the configuring, based on the storage medium, of the redundant space of the capacity includes:
  • determining all the divided physical storage units as the redundant space of the capacity.
  • When the redundant space is configured based on the storage medium, it can be allocated across the components according to a certain ratio, which improves configuration flexibility.
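  • As an illustration of allocating the redundant space across components by ratio, the following minimal Python sketch splits a number of redundant physical storage units in proportion to each component's capacity. The proportional rule, the function name `split_redundant_units`, and the rounding scheme are assumptions made for illustration; the application only states that a certain ratio is used.

```python
def split_redundant_units(total_units: int, component_capacities: list[int]) -> list[int]:
    """Share `total_units` redundant units across components by capacity ratio.

    Hypothetical helper: the ratio rule (proportional to capacity) is an assumption.
    """
    total_capacity = sum(component_capacities)
    # Integer share for each component, rounded down.
    shares = [total_units * c // total_capacity for c in component_capacities]
    # Hand out the units lost to rounding, one per component, in order.
    for i in range(total_units - sum(shares)):
        shares[i % len(shares)] += 1
    return shares


if __name__ == "__main__":
    # Four equally sized components share ten redundant units.
    print(split_redundant_units(10, [1, 1, 1, 1]))
```

  • The shares always sum to the requested total, so the configured redundant space has exactly the required capacity regardless of rounding.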
  • the configuration instruction further includes a query indication, where the query indication is used to query at least one of: a maximum dimension-free capability, a dimension-free state, a dimension-free configuration parameter, a dimension-free judgment result, an available capacity, a dimension-free time, and the mean time between failures (MTBF) of the storage medium.
  • the method further includes:
  • the reconfigured RAID policy is determined as the specified RAID policy.
  • The RAID policy is reconfigured so that, when a component of the storage medium fails, the data of the failed disk can be recovered based on the reconfigured RAID policy, which ensures reliable management of the storage medium.
  • before the data stored in the faulty component is recovered based on the specified disk array RAID policy, the method further includes:
  • a second aspect provides a management device for a storage medium, configured in a storage system, where the device includes:
  • a recovery module configured to recover data stored in the faulty component based on a specified disk array RAID policy when detecting that a component in the storage medium is faulty;
  • a storage module configured to store the restored data into a redundant space of the storage medium, and map an address of the failed component to the redundant space to implement management of the storage medium;
  • the redundant space is either pre-configured as a fixed configuration or configured based on a maintenance-free rate, a dimension-free period, an annual failure rate (AFR) of the storage medium, and a total number of components included in the storage medium. The maintenance-free rate and the dimension-free period are carried by a configuration instruction for the storage medium or obtained from a custom register, and the AFR is obtained by a query or carried by the configuration instruction.
  • the device further includes:
  • a receiving module configured to receive a configuration instruction for the storage medium, where the configuration instruction carries the maintenance-free rate and the dimension-free period;
  • a first determining module configured to determine, based on the maintenance-free rate, the dimension-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the capacity of the redundant space required to reach the maintenance-free state within the dimension-free period;
  • a first configuration module configured to configure a redundant space of the capacity based on the storage medium.
  • the first configuration module is used to:
  • determine all the divided physical storage units as the redundant space of the capacity.
  • the configuration instruction further includes a query indication, where the query indication is used to query at least one of: a maximum dimension-free capability, a dimension-free state, a dimension-free configuration parameter, a dimension-free judgment result, an available capacity, a dimension-free time, and the mean time between failures (MTBF) of the storage medium.
  • the device further includes:
  • a second configuration module configured to reconfigure the RAID policy based on the storage medium that remains after the redundant space has been configured;
  • a second determining module configured to determine the reconfigured RAID policy as the specified RAID policy.
  • the device further includes:
  • a query module configured to query whether the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the faulty component;
  • a triggering module configured to: when the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the faulty component, trigger the recovery module to perform the operation of recovering the data stored in the faulty component based on the specified RAID policy;
  • a third configuration module configured to: when the current remaining capacity of the redundant space is smaller than the size of the data stored in the faulty component, determine the difference between the current remaining capacity of the redundant space and the size of the data stored in the faulty component, and configure physical storage units of the difference size, taken from the reserved OP space of the storage system, as redundant space.
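  • The interplay of the query, triggering, third configuration, recovery, and storage modules can be sketched in Python as follows. This is a simplified model under stated assumptions: capacities are counted in abstract units, the class name `StoragePool`, the method `handle_failure`, and the `recover` callback are hypothetical, and raising an error when the OP space cannot cover the shortfall is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    redundant_free: int                          # free units left in the redundant space
    op_free: int                                 # free units left in the reserved OP space
    remap: dict = field(default_factory=dict)    # failed component id -> new location

    def handle_failure(self, component_id, data_units, recover):
        """Recover a failed component's data into the redundant space."""
        # Query step: is the remaining redundant capacity large enough?
        shortfall = data_units - self.redundant_free
        if shortfall > 0:
            # Third configuration step: borrow the difference from the OP space.
            if shortfall > self.op_free:
                # Assumption: the source does not say what happens here.
                raise RuntimeError("OP space cannot cover the shortfall")
            self.op_free -= shortfall
            self.redundant_free += shortfall
        # Recovery step: rebuild the data, e.g. with the specified RAID policy.
        data = recover(component_id)
        # Storage step: consume redundant space and remap the failed address.
        self.redundant_free -= data_units
        self.remap[component_id] = "redundant"
        return data


if __name__ == "__main__":
    pool = StoragePool(redundant_free=2, op_free=5)
    pool.handle_failure("component-7", 4, recover=lambda cid: b"rebuilt")
    print(pool)
```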
  • a management apparatus for a storage medium includes a processor and a memory. The memory is configured to store a program that supports the management apparatus in performing the storage medium management method of the first aspect, and to store the data involved in implementing that method.
  • the processor is configured to execute a program stored in the memory.
  • the management apparatus of the storage medium may further include a communication bus for establishing a connection between the processor and the memory.
  • a computer readable storage medium having stored therein instructions that, when run on a computer, cause the computer to perform the management method of the storage medium described in the first aspect above.
  • a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of managing the storage medium of the first aspect.
  • the technical solution provided by the present application has the following beneficial effect: a redundant space for storing recovered data is configured, either as a fixed configuration in advance or based on the maintenance-free rate, the dimension-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Therefore, even if a component in the storage medium fails, the data of the failed component can be transferred to the redundant space for storage. The user does not need to replace the component manually, maintenance-free management of the storage medium is achieved by using the redundant space, and management efficiency improves.
  • FIG. 1 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • FIG. 2 shows a method for managing a storage medium according to an exemplary embodiment.
  • FIG. 3A shows a management apparatus for a storage medium according to an exemplary embodiment.
  • FIG. 3B shows a management apparatus for a storage medium according to another exemplary embodiment.
  • FIG. 3C shows a management apparatus for a storage medium according to another exemplary embodiment.
  • FIG. 3D shows a management apparatus for a storage medium according to another exemplary embodiment.
  • Redundant space: mainly used to store recovered data. In practice it is also called hot-spare redundancy. When a component in the storage medium fails, the data stored in that component can be recovered and transferred to the redundant space, so that the data continues to be stored there.
  • Maintenance-free rate: the probability that the redundant space is not completely exhausted by failures. For example, if a storage medium includes n components and the component redundancy number is k, the maintenance-free rate is the probability that no more than k of the n components fail. The field generally requires a maintenance-free rate of three nines or five nines. Three nines means that the probability of the storage medium needing maintenance is less than one in a thousand, that is, fewer than one set of equipment needs maintenance within 5 years; five nines means that the probability of the storage medium needing maintenance is less than one in 100,000. For example, a maintenance-free rate of 99.999128% meets the five-nines requirement.
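  • Under the additional assumption that each of the n components fails independently with the same probability p during the period of interest (the symbol p and the binomial model are assumptions for illustration, not taken from the application), the definition "no more than k of the n components fail" can be written as:

```latex
P_{\text{mf}}(n, k, p) \;=\; \Pr[X \le k]
\;=\; \sum_{i=0}^{k} \binom{n}{i}\, p^{i}\, (1-p)^{n-i},
\qquad X \sim \mathrm{Binomial}(n, p)
```

  • In this notation, three nines corresponds to requiring a value of at least 0.999 and five nines to at least 0.99999.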
  • Dimension-free period: the period during which the storage medium requires no maintenance (that is, the maintenance-free period). For example, a dimension-free period of 3 means that the storage medium must not need maintenance for 3 years.
  • Mean time between failures (MTBF): a measure of the reliability of a device, defined as the average working time between two adjacent faults. It is also called the mean fault interval.
  • The MTBF of a disk array is generally not less than 50,000 hours, and the MTBF of a typical SSD is 1.5 million hours or 2 million hours.
  • AFR Annualized Failure Rate
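  • The application does not state how the AFR relates to the MTBF. A commonly used approximation, shown here as an assumption, divides the number of hours in a year by the MTBF. For an MTBF of 2 million hours this gives 0.438%, which coincides with the AFR value used in the worked example in this document. The function name `afr_from_mtbf` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Approximate annualized failure rate as hours per year divided by MTBF.

    This linear approximation is an assumption; it is reasonable only when
    the MTBF is much larger than one year.
    """
    return HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    for mtbf in (1_500_000, 2_000_000):
        print(f"MTBF {mtbf} h -> AFR {afr_from_mtbf(mtbf):.3%}")
```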
  • Redundant array of independent disks (RAID) policy: a policy for recovering data by means of redundancy.
  • For example, RAID 7 and RAID 6 policies are used.
  • Take a RAID 5 policy configured as 22+1 as an example, and assume that an SSD includes 23 disks. One disk stores check (parity) data, and when a disk in the SSD fails, the check data is used to restore the data of the failed disk.
  • the 22+1 RAID5 policy only supports data recovery when one disk fails, and if two or more disks fail, data recovery cannot be performed.
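  • Single-parity recovery of this kind can be sketched with bytewise XOR. The following Python example is a simplified model, not the application's implementation: it keeps all parity in one block, whereas practical RAID 5 layouts usually rotate parity across disks, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for a stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])


if __name__ == "__main__":
    # 22 data blocks plus 1 parity block, mirroring the 22+1 example.
    data = [bytes([i, i + 1]) for i in range(22)]
    parity = make_parity(data)
    survivors = data[:5] + data[6:]          # block 5 is lost
    print(rebuild_missing(survivors, parity) == data[5])
```

  • With one parity block, losing a second block leaves two unknowns per byte position and only one XOR equation, which is why recovery is no longer possible when two or more disks fail.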
  • OP (over-provisioning), also known as over-provisioned space.
  • OP space refers to space that the storage system reserves for its own use rather than for use by the user; that is, the OP space is usually not used to store user data. For example, the OP space can be used for system garbage collection.
  • The present application provides a storage medium management method that sets aside a reasonably sized redundant space for a storage medium in a storage system. By using this redundant space, manual replacement can be avoided when a component in the storage medium fails within the dimension-free period; that is, a maintenance-free effect is achieved within the dimension-free period. This reduces network deployment and operation and maintenance costs, improves the user experience, and improves the efficiency of storage medium management.
  • For the specific implementation, refer to the embodiment shown in FIG. 2 below.
  • the management method of the storage medium provided by the present application may be performed by a storage system, and further, the storage system is configured in a host. It should be noted that, in addition to the storage medium, the storage system usually includes a chassis, a power supply, a fan, a battery backup unit (BBU), an interface card, a control module, and the like.
  • the storage medium may be a hard disk unit, a floppy disk, or the like. Further, the hard disk unit may include one or more hard disks.
  • Among hard disks, the SSD offers high performance and high reliability and is therefore widely used in storage systems. The embodiment shown in FIG. 2 is therefore described in detail by taking an SSD as the storage medium.
  • SSDs are mostly implemented with NAND Flash, a non-volatile random access storage medium whose data does not disappear after power-off. This distinguishes it from traditional volatile random access storage media and volatile memory, so it can be used as external memory.
  • SSDs also come in a variety of physical forms, such as the 2.5-inch HDD form factor, the M.2 printed circuit board (PCB) form factor, a PCB of custom length, and a standalone single-chip ball grid array (BGA) package.
  • FIG. 1 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the above storage system can be implemented by the computer device shown in FIG. 1.
  • the computer device includes at least one processor 101, a communication bus 102, a memory 103, and at least one communication interface 104.
  • the processor 101 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the program of the present application.
  • Communication bus 102 can include a path for communicating information between the components described above.
  • the memory 103 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like); magnetic disk storage media or other magnetic storage devices; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • Memory 103 may be present independently and coupled to processor 101 via communication bus 102.
  • the memory 103 can also be integrated with the processor 101.
  • the communication interface 104 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • the processor 101 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 1.
  • a computer device can include multiple processors, such as the processor 101 and the processor 105 shown in FIG. 1. Each of these processors can be a single-core processor or a multi-core processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data, such as computer program instructions.
  • the computer device can also include an output device 106 and an input device 107.
  • Output device 106 is in communication with processor 101 and can display information in a variety of ways.
  • the output device 106 can be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • Input device 107 is in communication with processor 101 and can receive user input in a variety of ways.
  • input device 107 can be a mouse, keyboard, touch screen device, or sensing device, and the like.
  • the computer device described above may be a general purpose computer device or a special purpose computer device.
  • the computer device may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device.
  • the embodiments of the present application do not limit the type of computer equipment.
  • the memory 103 is used to store program code for executing the solution of the present application, and is controlled by the processor 101 for execution.
  • the processor 101 is operative to execute the program code 108 stored in the memory 103.
  • One or more software modules may be included in program code 108.
  • the above storage system may determine data for developing an application by the processor 101 and one or more of the program codes 108 in the memory 103.
  • FIG. 2 is a management method of a storage medium according to an exemplary embodiment.
  • the management method of the storage medium is applied to the foregoing storage system, and the method may include the following implementation steps:
  • Step 201 Receive a configuration instruction for the storage medium, where the configuration instruction carries a maintenance-free rate and a dimension-free period.
  • the core of maintenance-free is to use redundant space so that components will not be completely damaged during the life cycle. Therefore, in an actual implementation, in order to enable the storage medium to achieve maintenance-free effect, the user can configure a redundant space of a reasonable capacity in the storage medium according to actual needs, so as to realize the dimension-free effect by using the redundant space.
  • the specific implementation of configuring a redundant space of a reasonable capacity is as described in steps 201 to 203.
  • the configuration command may be sent by the host to the storage system. Further, the configuration command may be sent by the user to the storage system through the host.
  • The user can customize the maintenance-free rate and the dimension-free period according to actual requirements and send the configuration command to the storage system to configure or change its dimension-free parameters. If the storage system has not yet been configured with a redundant space, the configuration command instructs the storage system to configure the dimension-free parameters. If the storage system has already been configured with a redundant space, the configuration command instructs the storage system to change the dimension-free parameters.
  • the maintenance-free rate carried in the configuration command can be configured to 99.999128%, and the dimension-free period is configured to 3.
  • the configuration instruction may use a standard command format such as small computer system interface (SCSI) or Non-Volatile Memory Express (NVMe), or a custom format.
  • the configuration command may specifically be: Dis_Maintain(Control).
  • Control can also include parameters such as dimension-free switch, maintenance-free rate, and dimension-free period.
  • the dimension-free switch is used to indicate that the dimension-free function is turned on or the dimension-free function is turned off.
  • the parameter setting can be performed when the dimension-free function is turned on. For example, the dimension-free parameters such as the maintenance-free rate and the dimension-free period are set.
  • the above dimension-free switch can be defined by DisMaintain_EN.
  • DisMaintain_EN 10b
  • the above maintenance-free rate can be defined by DisMaintain_Rate
  • the maintenance-free rate and the dimension-free period are carried by the configuration command of the storage medium.
  • Alternatively, the storage system may query the maintenance-free rate and the dimension-free period from a custom register; that is, the maintenance-free rate and the dimension-free period may be stored in the register in advance as defaults. In this case, when the storage system receives the configuration command, it queries the maintenance-free rate and the dimension-free period from the register.
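  • The two ways of obtaining the parameters can be sketched as follows. This Python sketch is illustrative only: `DisMaintain_EN` and `DisMaintain_Rate` are the names given in the text, while `DisMaintain_Period`, the boolean encoding of the switch, the dictionary standing in for the custom register, and the per-field fallback rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Control:
    DisMaintain_EN: bool                       # dimension-free switch (True = on; encoding assumed)
    DisMaintain_Rate: Optional[float] = None   # maintenance-free rate, e.g. 0.99999128
    DisMaintain_Period: Optional[int] = None   # years; hypothetical field name


def resolve_parameters(control: Control, register: dict):
    """Return (rate, period) from the command, falling back to register defaults."""
    if not control.DisMaintain_EN:
        return None  # function switched off: no parameters to set
    rate = control.DisMaintain_Rate
    if rate is None:
        rate = register["rate"]
    period = control.DisMaintain_Period
    if period is None:
        period = register["period"]
    if not 0.0 < rate < 1.0:
        raise ValueError("maintenance-free rate must lie strictly between 0 and 1")
    if period <= 0:
        raise ValueError("dimension-free period must be positive")
    return rate, period


if __name__ == "__main__":
    defaults = {"rate": 0.999, "period": 5}    # stand-in for the custom register
    print(resolve_parameters(Control(True, 0.99999128, 3), defaults))
    print(resolve_parameters(Control(True), defaults))
```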
  • the configuration instruction further includes a query indication, where the query indication is used to query at least one of: a maximum dimension-free capability, a dimension-free state, a dimension-free configuration parameter, a dimension-free judgment result, an available capacity, a dimension-free time, and the MTBF of the storage medium.
  • the configuration command can be Dis_Maintain (Control, Status), where Status indicates the query indication. Further, for the maintenance-free storage system that has been configured before, the status can be used to query the configured dimension-free parameters.
  • the maximum dimension-free capability refers to the maximum dimension-free period and maintenance-free rate that the system can support.
  • Before performing the following steps, the storage system may also check whether the dimension-free period and the maintenance-free rate carried in the configuration command exceed the maximum dimension-free capability. If they do, the storage system can prompt that the configuration cannot be performed; otherwise, the subsequent step 202 can continue.
  • the above dimension-free state includes an open state and a closed state. Further, in an actual application, the dimension-free state may be defined as DisMaintain_Status.
  • the above-mentioned dimension-free configuration parameter refers to the maintenance-free rate and the dimension-free period that have been set before. Further, the dimension-free configuration parameter can be defined as DisMaintain_Para.
  • the above dimension-free judgment result is either success or failure. For example, if the configuration can succeed, success is displayed; if the configuration cannot succeed (for example, when the configured dimension-free parameters exceed the maximum dimension-free capability of the system), failure is displayed. Further, the dimension-free judgment result can be defined as DisMaintain_judge.
  • the above available capacity refers to the capacity available to the user after the dimension-free function is enabled. Further, the available capacity can be defined as DisMaintain_Capa.
  • the above dimension-free time includes the time when the dimension-free function is enabled and the remaining dimension-free time. Further, the dimension-free time can be defined as DisMaintain_Time.
  • the query indication may also be used to indicate that other information is queried, for example, it may also be used to indicate a query AFR or the like.
  • The above description uses carrying the query indication in the configuration command only as an example.
  • The query indication may also be carried in a separate command, which is not limited in this embodiment.
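  • The query fields and the capability check described above can be sketched as a small Python structure. The field names `DisMaintain_Status`, `DisMaintain_Para`, `DisMaintain_judge`, `DisMaintain_Capa`, and `DisMaintain_Time` come from the text; their types, the string values, and the helper functions `judge` and `build_status` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    DisMaintain_Status: str   # "open" or "closed"
    DisMaintain_Para: tuple   # (maintenance-free rate, dimension-free period in years)
    DisMaintain_judge: str    # "success" or "failed"
    DisMaintain_Capa: float   # capacity available to the user, in TB
    DisMaintain_Time: tuple   # (years since enabled, years remaining)


def judge(rate, period, max_rate, max_period):
    """Fail when the request exceeds the maximum dimension-free capability."""
    return "success" if rate <= max_rate and period <= max_period else "failed"


def build_status(rate, period, max_rate, max_period, user_capacity_tb, elapsed_years):
    """Assemble the answer to a query indication (illustrative layout only)."""
    result = judge(rate, period, max_rate, max_period)
    enabled = result == "success"
    remaining = max(period - elapsed_years, 0)
    return Status(
        DisMaintain_Status="open" if enabled else "closed",
        DisMaintain_Para=(rate, period),
        DisMaintain_judge=result,
        DisMaintain_Capa=user_capacity_tb,
        DisMaintain_Time=(elapsed_years, remaining) if enabled else (0, 0),
    )


if __name__ == "__main__":
    print(build_status(0.99999, 3, 0.999999, 5, 18.0, 1))
```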
  • Step 202 Determine a capacity size of a redundant space required to reach a maintenance-free state in the dimension-free period based on the maintenance-free rate, the dimension-free period, the AFR of the storage medium, and the total number of components included in the storage medium.
  • In a specific implementation, the storage system may use the following formula (1) to determine, based on the maintenance-free rate, the dimension-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the capacity of the redundant space required to reach the maintenance-free state within the dimension-free period.
  • Here, n represents the total number of components included in the storage medium.
  • Here, k represents the capacity of the redundant space.
  • The AFR can be obtained by query, as described above; details are not repeated here.
  • Alternatively, the AFR may be carried in the configuration command; that is, the user may look up the AFR of the storage medium actually in use and enter it through the configuration command.
  • Formula (1) thus determines the capacity of the redundant space required to reach the maintenance-free state within the maintenance-free period.
  • For example, suppose the storage medium is 25 SSDs of 1 TB each, so the raw physical capacity is 25 TB. If the system reserves 7 TB of OP space, the physical capacity available to the user is 18 TB. After the SSDs enter the maintenance-free state (that is, after the configuration command is received), suppose the command carries a maintenance-free rate of five nines and a maintenance-free period of 3 years, and the AFR is determined by query to be 0.438%. Formula (1) then gives a required redundant space of 5 TB, that is, the space of 5 SSDs, and the physical capacity available to the user drops to 13 TB.
  • In this application, the components of the storage medium are storage components that act as independent failure units in the system. They are independent of one another, so damage to one component does not affect the components around it.
  • The description above takes determining the size of the redundant space from the maintenance-free rate, the maintenance-free period, the annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium only as an example.
  • In practice the redundant space may also be fixedly configured, that is, its capacity is preset. In that case the calculation above is not needed, and the subsequent configuration operations are performed with the preset capacity.
  • Achieving maintenance-free operation through redundancy has a cost. A larger number of components lowers the redundancy ratio to some extent, so the more components the storage medium has, the lower the relative cost. For example, with an AFR of 0.438%, a disk enclosure of 24 disks needs 5 redundant disks to be maintenance-free for 3 years according to formula (1); with 240 components, 13 redundant disks are enough for the same 3 years; and the redundancy ratio falls further as the component count grows. The redundancy ratio is the ratio of the number of redundant components to the total number of components. When designing the system (for example, when configuring the maintenance-free parameters), the user therefore needs to weigh the return on the investment in redundancy, maintenance, and spare parts.
  • Step 203: Configure redundant space of that capacity on the storage medium.
  • In one possible implementation, configuring redundant space of that capacity may include: determining the ratio of the capacity of the redundant space to the total capacity of the storage medium, carving out physical storage units of that ratio from each component included in the storage medium, and designating all the carved-out physical storage units as the redundant space of that capacity.
  • Continuing the example above, if 5 TB of redundant space is required, the storage system determines that the redundant space is 5/25 of the total capacity of the storage medium. For each of the 25 disks of the SSD, the storage system carves out a physical unit of 5/25 of that disk, and then designates all the carved-out physical storage units as the redundant space, thereby configuring 5 TB of redundant space from the storage medium.
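The proportional split in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `carve_redundant_space` and the use of gigabytes as the unit are hypothetical and do not come from the patent text.

```python
from fractions import Fraction


def carve_redundant_space(disk_sizes_gb, redundant_gb):
    """Return the slice to reserve on each disk (hypothetical helper).

    Every disk gives up the same fraction of its capacity, namely
    redundant capacity / total capacity, as in the per-component split
    described for step 203.
    """
    total = sum(disk_sizes_gb)
    if not 0 <= redundant_gb <= total:
        raise ValueError("redundant capacity must fit inside the medium")
    ratio = Fraction(redundant_gb, total)  # exact, e.g. 5/25
    return [size * ratio for size in disk_sizes_gb]


# 25 disks of 1 TB (1000 GB) each, 5 TB of redundant space in total.
shares = carve_redundant_space([1000] * 25, 5000)
print(shares[0], sum(shares))
```

With the figures from the example, each disk contributes the same slice and the slices add up to the requested 5 TB. Exact fractions are used so that the per-disk slices always sum to the requested capacity.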
  • The implementation described above for configuring redundant space of that capacity on the storage medium is only an example; in other embodiments this step may be implemented differently. For example, in another possible implementation, 5 TB may be selected as the redundant space from the 18 TB available to the user, that is, 5 of the 18 disks are selected and used as the redundant space. In yet another possible implementation, the redundant space may be distributed evenly across the 18 disks. This is not limited in the embodiments of this application.
  • After the redundant space of that capacity is configured on the storage medium, the RAID policy also needs to be reconfigured based on the storage medium remaining after the redundant-space configuration, and the reconfigured RAID policy is taken as the specified RAID policy.
  • Configuring redundant space in the storage medium amounts to repartitioning its physical units, so the RAID policy needs to be reconfigured, for example as a 21+2 erasure code (Erasure Code, EC) or RAID6, or as 22+1 RAID5. When a component of the storage medium later fails, the data in the failed disk can then be recovered based on the reconfigured RAID policy.
  • In one possible implementation, the reconfigured RAID policy may be identical to the original RAID policy, which is not limited in this application.
  • After the storage system has configured the redundant space, it can use that space for maintenance-free management whenever a component failure is detected in the storage medium. For the specific implementation, see steps 204 and 205 below.
  • Step 204: When it is detected that a component in the storage medium has failed, recover the data stored in the failed component based on the specified RAID policy.
  • If a component failure is detected in the storage medium, the storage system starts the reconfigured, specified RAID policy to recover the data and prevent data loss. From this point on, the damaged component is no longer used.
  • Before recovering the data stored in the failed component based on the specified RAID policy, the storage system may also query whether the configured redundant space is large enough to hold the data in the damaged component. If the current remaining capacity of the configured redundant space is sufficient, step 205 below is performed. If it is not sufficient (for example, because some anomaly has increased the failure rate of the storage medium), part of the OP space of the storage medium or of the system can be borrowed as hot-spare redundancy so that the storage medium remains maintenance-free.
  • Specifically, the storage system queries whether the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component. If it is, step 205 below is performed. If it is smaller, the storage system determines the difference between the current remaining capacity of the redundant space and the size of the data stored in the failed component, and configures physical storage units of that size from the reserved OP space of the storage system as the redundant space.
  • The reserved OP space of the storage system is OP space set aside from the storage medium to improve the performance and reliability of the storage system; it is accessed by the system and is not accessible to the user.
  • In other words, if the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, the configured redundant space can hold the data in the damaged component, and step 205 below can be performed. If it is smaller, the configured redundant space cannot hold that data. Because the capacity available to the user must not be reduced, part of the OP space of the storage system is borrowed to store the recovered data, and the address of the failed component is mapped to the borrowed space.
  • The description above takes borrowing part of the OP space of the storage system to store the recovered data only as an example. In practice, if the OP space of the storage system is insufficient, part of the OP space of the storage medium may also be borrowed to store the recovered data.
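The capacity check and OP-space borrowing of step 204 can be illustrated with a small planning function. This is a minimal sketch, assuming capacities are tracked as plain numbers in gigabytes; the function name, the returned dictionary keys, and the error raised when no spare space is left are all hypothetical and are not specified in the patent text.

```python
def plan_recovery_space(failed_data_gb, redundant_free_gb,
                        system_op_free_gb, medium_op_free_gb=0):
    """Decide where the recovered data of a failed component will live.

    The redundant space is used first. Any shortfall is borrowed from the
    reserved OP space of the storage system, and then, if that is not
    enough, from the OP space of the storage medium.
    """
    from_redundant = min(failed_data_gb, redundant_free_gb)
    shortfall = failed_data_gb - from_redundant  # 0 when the space suffices
    from_system_op = min(shortfall, system_op_free_gb)
    from_medium_op = shortfall - from_system_op
    if from_medium_op > medium_op_free_gb:
        # Assumption: with no spare space left, manual maintenance is needed.
        raise RuntimeError("not enough spare space for the recovered data")
    return {
        "redundant": from_redundant,
        "system_op": from_system_op,
        "medium_op": from_medium_op,
    }


# 200 GB to recover, only 150 GB of redundant space left, 100 GB of system OP.
print(plan_recovery_space(200, 150, 100))
```

In this run the 50 GB shortfall is taken from the system OP space, so the capacity available to the user is not reduced.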
  • Step 205: Store the recovered data in the redundant space of the storage medium, and map the address of the failed component to the redundant space, so as to manage the storage medium.
  • Because the failed component is damaged and can no longer be used, the recovered data must be stored in the redundant space of the storage medium, so that the redundant space holds the data of the failed component.
  • In practical scenarios, component damage should ideally be invisible to the user. The address of the failed component is therefore mapped to the redundant space: the user still writes data to the address of the originally failed component, while the data is in fact stored in the redundant space.
  • For example, suppose the address of the failed component is the 0 GB to 200 GB range of Disk1. After storing the recovered data in the redundant space of the storage medium, the storage system maps the 0 GB to 200 GB range of Disk1 to the redundant space. From the user's point of view the data is still meant for the 0 GB to 200 GB range of Disk1, but it is no longer stored in the failed component; it is stored in the redundant space.
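The address mapping of step 205 can be pictured as a lookup table from failed extents to locations in the redundant space. The sketch below is hypothetical: the class name `RemapTable`, the extent granularity in gigabytes, and the target name "Redundant" are illustrative assumptions, since the patent does not describe the mapping structure.

```python
class RemapTable:
    """Redirects addresses of failed components to the redundant space."""

    def __init__(self):
        # disk name -> list of (start_gb, end_gb, target_name, target_start_gb)
        self._extents = {}

    def remap(self, disk, start_gb, end_gb, target, target_start_gb):
        """Record that [start_gb, end_gb) of `disk` now lives in `target`."""
        self._extents.setdefault(disk, []).append(
            (start_gb, end_gb, target, target_start_gb))

    def resolve(self, disk, offset_gb):
        """Translate a user-visible address to its physical location."""
        for start, end, target, target_start in self._extents.get(disk, []):
            if start <= offset_gb < end:
                return target, target_start + (offset_gb - start)
        return disk, offset_gb  # not remapped: address is used as-is


table = RemapTable()
# Disk1 failed; its 0 GB to 200 GB range is redirected into the redundant space.
table.remap("Disk1", 0, 200, "Redundant", 1000)
print(table.resolve("Disk1", 50))
print(table.resolve("Disk2", 50))
```

The user keeps addressing Disk1, and only the table decides that the data actually sits in the redundant space.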
  • As described above, the redundant space is either fixedly configured in advance or configured after being determined based on the maintenance-free rate, the maintenance-free period, the annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium. The maintenance-free rate and the maintenance-free period are carried in a configuration command for the storage medium or obtained by querying a custom register, and the AFR is obtained by query or carried in the configuration command.
  • The description above takes applying the storage medium management method to a storage system only as an example. In another embodiment, the method can also be applied to an all-flash system, for example a solid-state array (Solid-State Array, SSA) or an all-flash array (All Flash Array, AFA). The implementation principle is similar and is not repeated here.
  • In the embodiments of this application, redundant space for storing data recovered after a fault is either fixedly configured in advance or configured based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Even when a component in the storage medium fails, the data in the failed component can therefore be moved to the redundant space. The user does not need to replace components manually, and the redundant space enables maintenance-free management of the storage medium, which improves management efficiency.
  • Referring to FIG. 3A, FIG. 3A shows a storage medium management apparatus according to an exemplary embodiment.
  • the management device of the storage medium may be implemented by software, hardware, or a combination of the two.
  • the device includes:
  • the recovery module 310 is configured to perform step 204 in the embodiment shown in FIG. 2;
  • the storage module 320 is configured to perform step 205 in the embodiment shown in FIG. 2.
  • Optionally, referring to FIG. 3B, the apparatus further includes:
  • the receiving module 330 is configured to perform step 201 in the embodiment shown in FIG. 2;
  • a first determining module 340 configured to perform step 202 in the foregoing embodiment shown in FIG. 2;
  • the first configuration module 350 is configured to perform step 203 in the embodiment shown in FIG. 2 above.
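  • Optionally, the first configuration module 350 is configured to:
  • determine the ratio of the capacity of the redundant space to the total capacity of the storage medium;
  • carve out physical storage units of that ratio from each component included in the storage medium; and
  • designate all the carved-out physical storage units as the redundant space of that capacity.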
  • Optionally, referring to FIG. 3C, the apparatus further includes:
  • the second configuration module 360 is configured to re-configure the RAID policy based on the storage medium remaining after the redundant space configuration.
  • the second determining module 370 is configured to determine the reconfigured RAID policy as the specified RAID policy.
  • Optionally, referring to FIG. 3D, the apparatus further includes:
  • the query module 380 is configured to query whether the current remaining capacity of the redundant space is greater than or equal to the size of data stored in the faulty component;
  • the triggering module 390 is configured to trigger the recovery module 310 to perform the step 204 in the embodiment shown in FIG. 2 when the size of the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the faulty component.
  • the third configuration module 312 is configured to: when the current remaining capacity of the redundant space is smaller than the size of the data stored in the failed component, determine the difference between the current remaining capacity of the redundant space and the size of the data stored in the failed component, and configure physical storage units of that size from the reserved OP space of the storage system as the redundant space.
  • In the embodiments of this application, redundant space for storing data recovered after a fault is either fixedly configured in advance or configured based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Even when a component in the storage medium fails, the data in the failed component can therefore be moved to the redundant space. The user does not need to replace components manually, and the redundant space enables maintenance-free management of the storage medium, which improves management efficiency.
  • When the storage medium management apparatus provided by the foregoing embodiment implements the storage medium management method, the division into the functional modules above is only an example. In practice the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • The storage medium management apparatus provided by the foregoing embodiment and the embodiment of the storage medium management method share the same concept. For the specific implementation process, see the method embodiment; details are not repeated here.
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part as a computer program product.
  • The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part.
  • The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

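This application discloses a storage medium management method and apparatus, and a readable storage medium, and belongs to the field of data storage technologies. The method includes: when it is detected that a component in a storage medium has failed, recovering the data stored in the failed component based on a specified RAID policy; and storing the recovered data in redundant space of the storage medium, and mapping the address of the failed component to the redundant space. The redundant space is either fixedly configured in advance or configured after being determined based on a maintenance-free rate, a maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. The maintenance-free rate and the maintenance-free period are carried in a configuration command for the storage medium or obtained by querying a custom register, and the AFR is obtained by query or carried in the configuration command. With this application the user does not need to replace components manually; the redundant space enables maintenance-free management of the storage medium, which improves management efficiency.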

Description

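Storage medium management method and apparatus, and readable storage medium Technical Field
This application relates to the field of data storage technologies, and in particular to a storage medium management method and apparatus, and a readable storage medium.
Background
With the rapid development of data storage technologies, storage systems have come into wide use. A storage system usually includes a chassis, a power supply, a storage medium, and so on. The storage medium, as the carrier of stored data, is an indispensable part of the storage system. Currently the storage medium may include hard disks, optical discs, and the like, and a hard disk may be a solid state disk (Solid State Disk, SSD), a hard disk drive (Hard Disk Drive, HDD), or the like.
In practical scenarios, a storage medium usually has a certain life cycle, and if a fault occurs within that life cycle the storage medium needs to be managed and maintained. In the related art this is usually done manually. Taking a hard disk as an example, a hard disk is a field replaceable unit (Field Replace Unit, FRU) that supports on-site hot swapping, so when a hard disk fails the storage system generally raises an alarm, and a technician then pulls and replaces the damaged hard disk in the storage system, thereby managing the storage medium.
In the course of implementing this application, at least the following problem was found in the related art: in the storage medium management method described above, manual operation is required when a hard disk fails, so management efficiency is low.
Summary
This application provides a storage medium management method and apparatus, and a readable storage medium, to solve the problem of low management efficiency in the prior art. The technical solutions are as follows: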
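According to a first aspect, a storage medium management method is provided, applied to a storage system. The method includes:
when it is detected that a component in a storage medium has failed, recovering the data stored in the failed component based on a specified redundant array of independent disks (RAID) policy; and
storing the recovered data in redundant space of the storage medium, and mapping the address of the failed component to the redundant space, so as to manage the storage medium;
where the redundant space is preconfigured, or is configured after being determined based on a maintenance-free rate, a maintenance-free period, the annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium; the maintenance-free rate and the maintenance-free period are carried in a configuration command for the storage medium or obtained by querying a custom register; and the AFR is obtained by query or carried in the configuration command.
In the embodiments of this application, redundant space for storing data recovered after a fault is configured in advance based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Even when a component in the storage medium fails, the data in the failed component can therefore be moved to the redundant space. The user does not need to replace components manually, and the redundant space enables maintenance-free management of the storage medium within the specified period, which improves management efficiency.
Optionally, before the storing the recovered data in the redundant space of the storage medium and mapping the address of the failed component to the redundant space, the method further includes:
receiving a configuration command for the storage medium, where the configuration command carries the maintenance-free rate and the maintenance-free period;
determining, based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the capacity of the redundant space required to reach the maintenance-free state within the maintenance-free period; and
configuring redundant space of that capacity on the storage medium.
Configuring the redundant space based on the configuration command before the recovered data is stored and the address of the failed component is mapped allows later component failures to be handled by maintenance-free management based on that redundant space.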
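Optionally, the configuring redundant space of that capacity on the storage medium includes:
determining the ratio of the capacity of the redundant space to the total capacity of the storage medium;
carving out physical storage units of that ratio from each component included in the storage medium; and
designating all the carved-out physical storage units as the redundant space of that capacity.
In this way the redundant space can be distributed across the components in a given proportion when it is configured on the storage medium, which improves configuration flexibility.
Optionally, the configuration command further carries a query indication, which indicates a query of at least one of the maximum maintenance-free capability, the maintenance-free status, the maintenance-free configuration parameters, the maintenance-free judgment result, the available capacity, the maintenance-free time, and the mean time between failures (MTBF) of the storage medium.
In practice, for a maintenance-free storage system that has been configured before, the previously configured maintenance-free parameters can be queried through this Status. The queried parameters can then be used to judge whether the current configuration can succeed, which ensures reliable implementation.
Optionally, after the configuring redundant space of that capacity on the storage medium, the method further includes:
reconfiguring the RAID policy based on the storage medium remaining after the redundant-space configuration; and
determining the reconfigured RAID policy as the specified RAID policy.
Reconfiguring the RAID policy after the redundant space has been configured allows the data in a failed disk to be recovered later based on the reconfigured RAID policy when a component of the storage medium fails, which ensures reliable management of the storage medium.
Optionally, before the recovering the data stored in the failed component based on the specified RAID policy, the method further includes:
querying whether the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component;
if the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, performing the operation of recovering the data stored in the failed component based on the specified RAID policy; and
if the current remaining capacity of the redundant space is smaller than the size of the data stored in the failed component, determining the difference between the current remaining capacity of the redundant space and the size of the data stored in the failed component, and borrowing physical storage units of that size from the reserved OP space of the storage system as the redundant space.
Querying, before the recovery based on the specified RAID policy, whether the configured redundant space currently has enough remaining capacity for the data stored in the failed component lets the storage system decide, according to the actual situation, whether it needs to borrow part of its reserved OP space as hot-spare redundancy. The storage medium therefore remains maintenance-free even when the remaining redundant space cannot hold the data stored in the failed component, which improves the reliability of data storage.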
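According to a second aspect, a storage medium management apparatus is provided, deployed in a storage system. The apparatus includes:
a recovery module, configured to: when it is detected that a component in a storage medium has failed, recover the data stored in the failed component based on a specified redundant array of independent disks (RAID) policy; and
a storage module, configured to store the recovered data in redundant space of the storage medium, and map the address of the failed component to the redundant space, so as to manage the storage medium;
where the redundant space is fixedly configured in advance, or is configured after being determined based on a maintenance-free rate, a maintenance-free period, the annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium; the maintenance-free rate and the maintenance-free period are carried in a configuration command for the storage medium or obtained by querying a custom register; and the AFR is obtained by query or carried in the configuration command.
Optionally, the apparatus further includes:
a receiving module, configured to receive a configuration command for the storage medium, where the configuration command carries the maintenance-free rate and the maintenance-free period;
a first determining module, configured to determine, based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the capacity of the redundant space required to reach the maintenance-free state within the maintenance-free period; and
a first configuration module, configured to configure redundant space of that capacity on the storage medium.
Optionally, the first configuration module is configured to:
determine the ratio of the capacity of the redundant space to the total capacity of the storage medium;
carve out physical storage units of that ratio from each component included in the storage medium; and
designate all the carved-out physical storage units as the redundant space of that capacity.
Optionally, the configuration command further carries a query indication, which indicates a query of at least one of the maximum maintenance-free capability, the maintenance-free status, the maintenance-free configuration parameters, the maintenance-free judgment result, the available capacity, the maintenance-free time, and the mean time between failures (MTBF) of the storage medium.
Optionally, the apparatus further includes:
a second configuration module, configured to reconfigure the RAID policy based on the storage medium remaining after the redundant-space configuration; and
a second determining module, configured to determine the reconfigured RAID policy as the specified RAID policy.
Optionally, the apparatus further includes:
a query module, configured to query whether the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component;
a triggering module, configured to: when the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, trigger the recovery module to perform the operation of recovering the data stored in the failed component based on the specified RAID policy; and
a third configuration module, configured to: when the current remaining capacity of the redundant space is smaller than the size of the data stored in the failed component, determine the difference between the current remaining capacity of the redundant space and the size of the data stored in the failed component, and configure physical storage units of that size from the reserved OP space of the storage system as the redundant space.
According to a third aspect, a storage medium management apparatus is provided. Its structure includes a processor and a memory. The memory is configured to store a program that supports the apparatus in performing the storage medium management method provided in the first aspect, and to store the data involved in implementing that method. The processor is configured to execute the program stored in the memory. The apparatus may further include a communication bus used to establish a connection between the processor and the memory.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the storage medium management method according to the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When run on a computer, it causes the computer to perform the storage medium management method according to the first aspect.
The technical effects of the second, third, fourth, and fifth aspects are similar to those of the corresponding technical means in the first aspect and are not repeated here.
The technical solutions provided in this application have the following beneficial effects: redundant space for storing data recovered after a fault is either fixedly configured in advance or configured based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Even when a component in the storage medium fails, the data in the failed component can therefore be moved to the redundant space. The user does not need to replace components manually, and the redundant space enables maintenance-free management of the storage medium, which improves management efficiency.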
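Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a computer device according to an embodiment of this application;
FIG. 2 shows a storage medium management method according to an exemplary embodiment;
FIG. 3A shows a storage medium management apparatus according to an exemplary embodiment;
FIG. 3B shows a storage medium management apparatus according to another exemplary embodiment;
FIG. 3C shows a storage medium management apparatus according to another exemplary embodiment;
FIG. 3D shows a storage medium management apparatus according to another exemplary embodiment.
Description of Embodiments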
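To make the objectives, technical solutions, and advantages of this application clearer, the following describes the implementations of this application in further detail with reference to the accompanying drawings.
Before the embodiments of this application are described in detail, the terms, application scenarios, and system architecture involved are briefly introduced.
First, the terms involved in the embodiments of this application.
Redundant space: mainly used to store data recovered after a fault; in practice it may also be called hot-spare redundancy. That is, when a component in the storage medium fails, the data stored in that component can be recovered and moved to the redundant space, which then provides the data storage.
Maintenance-free rate: the probability that the redundant space is not entirely lost to failures. For example, if a storage medium includes n components of which k are redundant, the maintenance-free rate is the probability that no more than k of the n components fail. In this field the maintenance-free rate is usually specified as three nines, five nines, and so on. Three nines means that the probability that the storage medium needs maintenance is less than one in a thousand, that is, fewer than 1 device set needs maintenance within 5 years; five nines means that this probability is less than one in a hundred thousand. For example, a maintenance-free rate of 99.999128% corresponds to the five-nines requirement.
Maintenance-free period: the period during which the storage medium needs no maintenance. For example, a maintenance-free period of 3 means that the storage medium needs no maintenance for 3 years.
Mean time between failures (Mean Time Between Failure, MTBF): a reliability metric of a device, namely the average operating time between two adjacent failures, also called the mean failure interval or mean trouble-free time. For example, the MTBF of a disk array is generally not lower than 50,000 hours, and the MTBF of a typical SSD is 1.5 million or 2 million hours.
Annualized failure rate (Annualized Failure Rate, AFR): the probability that a device fails within one year. The AFR corresponds to the MTBF above; for example, if the MTBF of the SSD is 1.5 million hours or 2 million hours, the corresponding AFR is 0.584% or 0.438%, respectively.
Redundant array of independent disks (Redundant Arrays of Independent Disks, RAID) policy: the policies most used today include RAID5 and RAID6. Take a 22+1 RAID5 policy as an example and assume that an SSD includes 23 disks. One disk's worth of capacity then stores parity data, which is used to recover the data of a failed disk in the SSD. The 22+1 RAID5 policy supports data recovery only when a single disk fails; if two or more disks fail, the data cannot be recovered.
Reserved space (Over-provisioning, OP), also called over-provisioned space: space reserved by the storage system for its own use and not available to the user. The OP space is normally not used to store user data; it can be used, for example, for system garbage collection.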
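The MTBF and AFR pairs quoted in the definitions above (1.5 million hours and 0.584%, 2 million hours and 0.438%) are consistent with dividing the hours in a year by the MTBF. The following sketch shows that conversion; the formula is an assumption inferred from the quoted figures, not something the patent states, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Approximate the annualized failure rate as hours-per-year / MTBF.

    Assumption: a constant failure rate and an MTBF much longer than one
    year, so the linear approximation is adequate.
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_YEAR / mtbf_hours


for mtbf in (1_500_000, 2_000_000):
    print(f"MTBF {mtbf} h -> AFR {afr_from_mtbf(mtbf):.3%}")
```

Under this approximation the two MTBF values reproduce the 0.584% and 0.438% figures given in the AFR definition.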
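Next, the application scenarios involved in the embodiments of this application.
When a storage medium is used to store data, components may be damaged by unavoidable factors even within the life cycle of the storage medium. Currently, a damaged component generally has to be pulled and replaced manually, which raises network deployment and operation and maintenance costs and makes management of the storage medium inefficient. This application therefore provides a storage medium management method that sets aside a reasonable amount of redundant space on the storage medium of a storage system and uses it so that, within the maintenance-free period, a damaged component does not have to be replaced manually; that is, the storage medium is maintenance-free within the maintenance-free period. This reduces network deployment and operation and maintenance costs, improves user experience, and improves the efficiency of storage medium management. For the specific implementation, see the embodiment shown in FIG. 2 below.
Finally, the system architecture involved in the embodiments of this application.
The storage medium management method provided in this application may be performed by a storage system, which is deployed in a host. Besides the storage medium, the storage system usually also includes a chassis, a power supply, fans, a battery backup unit (Battery Backup Unit, BBU), interface cards, control modules, and so on.
In practice the storage medium may be a hard disk unit, a floppy disk, or the like, and the hard disk unit may include one or more hard disks. Because SSDs offer high performance and high reliability, they are widely used in storage systems. The embodiment shown in FIG. 2 below therefore takes an SSD as the storage medium for the detailed description.
A brief introduction to SSDs follows. Solid state disks are mostly built on NAND flash (NAND Flash), a non-volatile random access storage medium whose data is retained after power-off. It differs from traditional volatile random access storage media and volatile memory, and can therefore be used as external storage. SSDs currently come in several physical forms: 2.5-inch drives of the same size as HDDs, M.2 modules on a standalone printed circuit board (Printed Circuit Board, PCB), PCB forms of custom length, standalone single-chip ball grid array (Ball Grid Array, BGA) packages, and so on.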
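FIG. 1 is a schematic structural diagram of a computer device according to an embodiment of this application. The storage system described above can be implemented by the computer device shown in FIG. 1. Referring to FIG. 1, the computer device includes at least one processor 101, a communication bus 102, a memory 103, and at least one communication interface 104.
The processor 101 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling execution of the programs of the solutions of this application.
The communication bus 102 may include a path for transferring information between the foregoing components.
The memory 103 may be a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 103 may exist independently and be connected to the processor 101 through the communication bus 102, or may be integrated with the processor 101.
The communication interface 104 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (Wireless Local Area Networks, WLAN).
In a specific implementation, as an embodiment, the processor 101 may include one or more CPUs, for example CPU0 and CPU1 shown in FIG. 1.
In a specific implementation, as an embodiment, the computer device may include multiple processors, for example the processor 101 and the processor 105 shown in FIG. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the computer device may further include an output device 106 and an input device 107. The output device 106 communicates with the processor 101 and can display information in multiple ways. For example, the output device 106 may be a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a cathode ray tube (cathode ray tube, CRT) display device, or a projector (projector). The input device 107 communicates with the processor 101 and can receive user input in multiple ways. For example, the input device 107 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
The computer device may be a general-purpose or a special-purpose computer device. In a specific implementation, it may be a desktop computer, a portable computer, a network server, a personal digital assistant (Personal Digital Assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device. The embodiments of this application do not limit the type of the computer device.
The memory 103 is configured to store the program code for executing the solutions of this application, and execution is controlled by the processor 101. The processor 101 is configured to execute program code 108 stored in the memory 103. The program code 108 may include one or more software modules. The storage system described above can determine the data used for developing applications through the processor 101 and one or more software modules in the program code 108 in the memory 103.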
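The embodiments of this application are described in detail below with reference to FIG. 2. FIG. 2 shows a storage medium management method according to an exemplary embodiment. The method is applied to the storage system described above and may include the following steps:
Step 201: Receive a configuration command for the storage medium, where the configuration command carries a maintenance-free rate and a maintenance-free period.
The core of maintenance-free operation is to use redundant space so that the components are not all lost to failures within the life cycle. In practice, to make the storage medium maintenance-free, the user can configure a reasonable amount of redundant space in the storage medium according to actual needs and use that space to achieve the maintenance-free effect. Steps 201 to 203 describe how redundant space of a reasonable capacity is configured.
The configuration command may be sent to the storage system by the host; more specifically, it may be sent by the user to the storage system through the host.
That is, the user can set the maintenance-free rate and the maintenance-free period according to actual needs and deliver them to the storage system through a configuration command, to configure or change the maintenance-free parameters of the storage system. If the storage system has no redundant space configured yet, the configuration command instructs it to configure the maintenance-free parameters; if redundant space has already been configured, the command instructs it to change them.
For example, if the user wants the storage medium to meet a maintenance-free rate of 99.999128% over 3 years, the maintenance-free rate carried in the configuration command can be set to 99.999128% and the maintenance-free period to 3.
In practice, the configuration command may use a standard command form such as Small Computer System Interface (Small Computer System Interface, SCSI) or Non-Volatile Memory express (Non Volatile Memory express, NVMe), or a custom form.
For example, if the configuration command uses the SCSI or NVMe command form, it may be Dis_Maintain(Control), where Control may contain parameters such as the maintenance-free switch, the maintenance-free rate, and the maintenance-free period.
The maintenance-free switch turns the maintenance-free function on or off. While the function is on, parameters such as the maintenance-free rate and the maintenance-free period can be set.
The maintenance-free switch can be defined as DisMaintain_EN. For example, DisMaintain_EN=10b turns the maintenance-free function off, and DisMaintain_EN=01b turns it on.
If the maintenance-free parameters need to be changed, a maintenance-free adjustment parameter DisMaintain_Adjust can also be defined. Setting DisMaintain_Adjust=1 means changing to the new maintenance-free rate and maintenance-free period.
The maintenance-free rate can be defined as DisMaintain_Rate and the maintenance-free period as DisMaintain_Cycle. For example, DisMaintain_Rate=05h and DisMaintain_Cycle=03h mean that five nines must be reached and that the storage medium must be maintenance-free for 3 years.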
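The Control parameters described above can be modelled as a small record to make the encodings concrete. This is a hedged sketch: the field names reuse the identifiers from the text (DisMaintain_EN, DisMaintain_Adjust, DisMaintain_Rate, DisMaintain_Cycle), but the class itself, its layout, and the conversion of the rate code into a probability are illustrative assumptions, not the actual SCSI or NVMe command format.

```python
from dataclasses import dataclass

EN_OPEN = 0b01   # DisMaintain_EN = 01b: maintenance-free function on
EN_CLOSE = 0b10  # DisMaintain_EN = 10b: maintenance-free function off


@dataclass(frozen=True)
class DisMaintainControl:
    """Hypothetical in-memory view of the Control argument of Dis_Maintain."""

    en: int          # DisMaintain_EN
    rate: int        # DisMaintain_Rate: number of nines, e.g. 0x05
    cycle: int       # DisMaintain_Cycle: maintenance-free period in years
    adjust: int = 0  # DisMaintain_Adjust: 1 means change existing parameters

    def enabled(self) -> bool:
        return self.en == EN_OPEN

    def target_rate(self) -> float:
        """Turn 'N nines' into a probability, e.g. 5 -> 0.99999."""
        return 1 - 10 ** (-self.rate)


control = DisMaintainControl(en=0b01, rate=0x05, cycle=0x03)
print(control.enabled(), control.cycle, f"{control.target_rate():.5f}")
```

Reading DisMaintain_Rate as a count of nines follows the example in the text, where 05h stands for the five-nines requirement.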
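The description above takes carrying the maintenance-free rate and the maintenance-free period in the configuration command only as an example. In another implementation, the storage system may obtain them by querying a custom register; that is, the maintenance-free rate and the maintenance-free period may be stored in the register in advance as defaults. In that case, when the storage system receives the configuration command, it queries the register for the maintenance-free rate and the maintenance-free period.
The configuration command may further carry a query indication, which indicates a query of at least one of the maximum maintenance-free capability, the maintenance-free status, the maintenance-free configuration parameters, the maintenance-free judgment result, the available capacity, the maintenance-free time, and the MTBF of the storage medium.
In this case the configuration command can carry multiple parameters, for example Dis_Maintain(Control,Status), where Status is the query indication. For a maintenance-free storage system that has been configured before, the previously configured maintenance-free parameters can be queried through Status.
The maximum maintenance-free capability is the largest maintenance-free period and maintenance-free rate that the system supports. In one possible implementation, if the query shows that the maximum capability DisMaintain_Max is a 10-year maintenance-free period and a maintenance-free rate of eight nines, the storage system can check, before performing the following steps, whether the maintenance-free period and rate carried in the configuration command exceed that maximum. If they do, the system can report that the configuration cannot be performed; otherwise it proceeds to step 202.
In practice, the storage system may also skip this check and perform the subsequent steps directly.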
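The optional check against the maximum maintenance-free capability can be written as a one-line comparison. The sketch below is illustrative only: the function name and the default limits (eight nines, 10 years, taken from the example in the text) are assumptions, and the returned strings mirror the succeed and failed values that the text associates with DisMaintain_judge.

```python
def judge_configuration(rate_nines: int, cycle_years: int,
                        max_nines: int = 8, max_years: int = 10) -> str:
    """Return 'succeed' if the request fits the maximum capability."""
    fits = rate_nines <= max_nines and cycle_years <= max_years
    return "succeed" if fits else "failed"


print(judge_configuration(5, 3))   # request from the running example
print(judge_configuration(9, 3))   # asks for more nines than supported
```

A request that passes the check would continue with step 202, while a failed one would be reported back to the user.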
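The maintenance-free status is either on or off. In practice it can be defined as DisMaintain_Status.
The maintenance-free configuration parameters are the previously set maintenance-free rate and maintenance-free period. They can be defined as DisMaintain_Para.
The maintenance-free judgment result is either success or failure. For example, if the configuration can succeed, succeed is displayed; if it cannot (for example, when the configured maintenance-free parameters exceed the maximum maintenance-free capability of the system), failed is displayed. The judgment result can be defined as DisMaintain_judge.
The available capacity is the capacity available to the user after the maintenance-free function is enabled. It can be defined as DisMaintain_Capa.
The maintenance-free time includes the time at which the maintenance-free function was enabled and the remaining maintenance-free time. It can be defined as DisMaintain_Time.
The corresponding AFR can be determined by querying the MTBF of the storage medium. The MTBF can be defined as DisMaintain_MTBF; for example, if the query returns DisMaintain_MTBF = 2 million hours, the corresponding AFR is 0.438%.
The description above takes a query indication for at least one of the maximum maintenance-free capability, the maintenance-free status, the maintenance-free configuration parameters, the maintenance-free judgment result, the available capacity, the maintenance-free time, and the MTBF of the storage medium only as an example. In practice the query indication may also be used to query other information, for example the AFR.
The description above also takes carrying the query indication in the configuration command only as an example. In practice the query indication may instead be carried in a separate command, which is not limited in the embodiments of this application.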
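Step 202: Based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium, determine the capacity of the redundant space required to reach the maintenance-free state within the maintenance-free period.
The storage system can determine this capacity from the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium by using formula (1):
$\sum_{i=0}^{k} P\{X=i\} = \sum_{i=0}^{k} C(n,i)\,p^{i}(1-p)^{n-i}$  (1);
Here $\sum_{i=0}^{k} P\{X=i\}$ is the maintenance-free rate, that is, the probability that no more than k of the n components fail. n is the total number of components included in the storage medium. k is the capacity of the redundant space, expressed as a number of redundant components. p is the total failure rate, determined by the AFR and the maintenance-free period as p = AFR × year, where year is the maintenance-free period.
The AFR can be obtained by query, as described above; details are not repeated here. Alternatively, the AFR may be carried in the configuration command; that is, the user may look up the AFR of the storage medium actually in use and enter it through the configuration command.
Formula (1) thus determines the capacity of the redundant space required to reach the maintenance-free state within the maintenance-free period.
For example, suppose the storage medium is 25 SSDs of 1 TB each, so the raw physical capacity is 25 TB. If the system reserves 7 TB of OP space, the physical capacity available to the user is 18 TB. After the SSDs enter the maintenance-free state (that is, after the configuration command is received), suppose the command carries a maintenance-free rate of five nines and a maintenance-free period of 3 years, and the AFR is determined by query to be 0.438%. Formula (1) then gives a required redundant space of 5 TB, that is, the space of 5 SSDs, and the physical capacity available to the user drops to 13 TB.
In this application, the components of the storage medium are storage components that act as independent failure units in the system. They are independent of one another, so damage to one component does not affect the components around it.
The description here takes determining the size of the redundant space from the maintenance-free rate, the maintenance-free period, the annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium only as an example. In practice the redundant space may also be fixedly configured, that is, its capacity is preset. In that case the calculation above is not needed, and the subsequent configuration operations are performed with the preset capacity.
Achieving maintenance-free operation through redundancy has a cost. A larger number of components lowers the redundancy ratio to some extent, so the more components the storage medium has, the lower the relative cost. For example, with an AFR of 0.438%, a disk enclosure of 24 disks needs 5 redundant disks to be maintenance-free for 3 years according to formula (1); with 240 components, 13 redundant disks are enough for the same 3 years; and the redundancy ratio falls further as the component count grows. The redundancy ratio is the ratio of the number of redundant components to the total number of components. When designing the system (for example, when configuring the maintenance-free parameters), the user therefore needs to weigh the return on the investment in redundancy, maintenance, and spare parts.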
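The sizing rule of formula (1) can be checked numerically with a short script. This is a minimal sketch, assuming the target of "five nines" means a maintenance-free rate of at least 0.99999 and that the smallest k satisfying formula (1) is chosen; the function names are hypothetical.

```python
from math import comb


def maintenance_free_rate(n: int, k: int, p: float) -> float:
    """Probability that at most k of n independent components fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def required_redundancy(n: int, afr: float, years: float, target: float) -> int:
    """Smallest k whose maintenance-free rate reaches the target."""
    p = afr * years  # total failure rate per component, p = AFR * year
    if not 0 <= p <= 1:
        raise ValueError("AFR * years must be a probability")
    for k in range(n + 1):
        if maintenance_free_rate(n, k, p) >= target:
            return k
    return n  # only reached if rounding keeps the full sum just below target


FIVE_NINES = 0.99999
for n in (24, 25, 240):
    k = required_redundancy(n, afr=0.00438, years=3, target=FIVE_NINES)
    print(f"n={n}: k={k}, redundancy ratio={k / n:.1%}")
```

Under these assumptions the script yields 5 redundant components for 24 or 25 disks and 13 for 240, which agrees with the figures quoted in the text and shows the redundancy ratio falling as the component count grows.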
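Step 203: Configure redundant space of that capacity on the storage medium.
In one possible implementation, configuring redundant space of that capacity may include: determining the ratio of the capacity of the redundant space to the total capacity of the storage medium, carving out physical storage units of that ratio from each component included in the storage medium, and designating all the carved-out physical storage units as the redundant space of that capacity.
Continuing the example above, if 5 TB of redundant space is required, the storage system determines that the redundant space is 5/25 of the total capacity of the storage medium. For each of the 25 disks of the SSD, the storage system carves out a physical unit of 5/25 of that disk, and then designates all the carved-out physical storage units as the redundant space, thereby configuring 5 TB of redundant space from the storage medium.
The implementation described above is only an example; in other embodiments the step of configuring redundant space of that capacity on the storage medium may be implemented differently. For example, in another possible implementation, 5 TB may be selected as the redundant space from the 18 TB available to the user, that is, 5 of the 18 disks are selected and used as the redundant space. In yet another possible implementation, the redundant space may be distributed evenly across the 18 disks. This is not limited in the embodiments of this application.
After the redundant space of that capacity is configured on the storage medium, the RAID policy also needs to be reconfigured based on the storage medium remaining after the redundant-space configuration, and the reconfigured RAID policy is taken as the specified RAID policy.
Configuring redundant space in the storage medium amounts to repartitioning its physical units, so the RAID policy needs to be reconfigured, for example as a 21+2 erasure code (Erasure Code, EC) or RAID6, or as 22+1 RAID5. When a component of the storage medium later fails, the data in the failed disk can then be recovered based on the reconfigured RAID policy.
In one possible implementation, the reconfigured RAID policy may be identical to the original RAID policy, which is not limited in this application.
After the storage system has configured the redundant space, it can use that space for maintenance-free management whenever a component failure is detected in the storage medium. For the specific implementation, see steps 204 and 205 below.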
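Step 204: When it is detected that a component in the storage medium has failed, recover the data stored in the failed component based on the specified RAID policy.
If a component failure is detected in the storage medium, the storage system starts the reconfigured, specified RAID policy to recover the data and prevent data loss. From this point on, the damaged component is no longer used.
Before recovering the data stored in the failed component based on the specified RAID policy, the storage system may also query whether the configured redundant space is large enough to hold the data in the damaged component. If the current remaining capacity of the configured redundant space is sufficient, step 205 below is performed. If it is not sufficient (for example, because some anomaly has increased the failure rate of the storage medium), part of the OP space of the storage medium or of the system can be borrowed as hot-spare redundancy so that the storage medium remains maintenance-free.
Specifically, the storage system queries whether the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component. If it is, step 205 below is performed. If it is smaller, the storage system determines the difference between the current remaining capacity of the redundant space and the size of the data stored in the failed component, and configures physical storage units of that size from the reserved OP space of the storage system as the redundant space. The reserved OP space of the storage system is OP space set aside from the storage medium to improve the performance and reliability of the storage system; it is accessed by the system and is not accessible to the user.
In other words, if the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, the configured redundant space can hold the data in the damaged component, and step 205 below can be performed. If it is smaller, the configured redundant space cannot hold that data. Because the capacity available to the user must not be reduced, part of the OP space of the storage system is borrowed to store the recovered data, and the address of the failed component is mapped to the borrowed space.
The description here takes borrowing part of the OP space of the storage system to store the recovered data only as an example. In practice, if the OP space of the storage system is insufficient, part of the OP space of the storage medium may also be borrowed to store the recovered data.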
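Step 205: Store the recovered data in the redundant space of the storage medium, and map the address of the failed component to the redundant space, so as to manage the storage medium.
Because the failed component is damaged and can no longer be used, the recovered data must be stored in the redundant space of the storage medium, so that the redundant space holds the data of the failed component.
In practical scenarios, component damage should ideally be invisible to the user in order to improve user experience. The address of the failed component is therefore mapped to the redundant space: the user still writes data to the address of the originally failed component, while the data is in fact stored in the redundant space.
For example, suppose the address of the failed component is the 0 GB to 200 GB range of Disk1. After storing the recovered data in the redundant space of the storage medium, the storage system maps the 0 GB to 200 GB range of Disk1 to the redundant space. From the user's point of view the data is still meant for the 0 GB to 200 GB range of Disk1, but it is no longer stored in the failed component; it is stored in the redundant space.
As described above, the redundant space is either fixedly configured in advance or configured after being determined based on the maintenance-free rate, the maintenance-free period, the annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium. The maintenance-free rate and the maintenance-free period are carried in a configuration command for the storage medium or obtained by querying a custom register, and the AFR is obtained by query or carried in the configuration command.
The description above takes applying the storage medium management method to a storage system only as an example. In another embodiment, the method can also be applied to an all-flash system, for example a solid-state array (Solid-State Array, SSA) or an all-flash array (All Flash Array, AFA). The implementation principle is similar and is not repeated here.
In the embodiments of this application, redundant space for storing data recovered after a fault is either fixedly configured in advance or configured based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Even when a component in the storage medium fails, the data in the failed component can therefore be moved to the redundant space. The user does not need to replace components manually, and the redundant space enables maintenance-free management of the storage medium, which improves management efficiency.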
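Referring to FIG. 3A, FIG. 3A shows a storage medium management apparatus according to an exemplary embodiment. The apparatus may be implemented by software, hardware, or a combination of the two, and includes:
a recovery module 310, configured to perform step 204 in the embodiment shown in FIG. 2; and
a storage module 320, configured to perform step 205 in the embodiment shown in FIG. 2.
Optionally, referring to FIG. 3B, the apparatus further includes:
a receiving module 330, configured to perform step 201 in the embodiment shown in FIG. 2;
a first determining module 340, configured to perform step 202 in the embodiment shown in FIG. 2; and
a first configuration module 350, configured to perform step 203 in the embodiment shown in FIG. 2.
Optionally, the first configuration module 350 is configured to:
determine the ratio of the capacity of the redundant space to the total capacity of the storage medium;
carve out physical storage units of that ratio from each component included in the storage medium; and
designate all the carved-out physical storage units as the redundant space of that capacity.
Optionally, referring to FIG. 3C, the apparatus further includes:
a second configuration module 360, configured to reconfigure the RAID policy based on the storage medium remaining after the redundant-space configuration; and
a second determining module 370, configured to determine the reconfigured RAID policy as the specified RAID policy.
Optionally, referring to FIG. 3D, the apparatus further includes:
a query module 380, configured to query whether the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component;
a triggering module 390, configured to: when the current remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, trigger the recovery module 310 to perform step 204 in the embodiment shown in FIG. 2; and
a third configuration module 312, configured to: when the current remaining capacity of the redundant space is smaller than the size of the data stored in the failed component, determine the difference between the current remaining capacity of the redundant space and the size of the data stored in the failed component, and configure physical storage units of that size from the reserved OP space of the storage system as the redundant space.
In the embodiments of this application, redundant space for storing data recovered after a fault is either fixedly configured in advance or configured based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium. Even when a component in the storage medium fails, the data in the failed component can therefore be moved to the redundant space. The user does not need to replace components manually, and the redundant space enables maintenance-free management of the storage medium, which improves management efficiency.
When the storage medium management apparatus provided by the foregoing embodiment implements the storage medium management method, the division into the functional modules above is only an example. In practice the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiment share the same concept. For the specific implementation process, see the method embodiment; details are not repeated here.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (Digital Versatile Disc, DVD)), a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)), or the like.
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing the related hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes the embodiments provided in this application and is not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.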

Claims (13)

  1. 一种存储介质的管理方法,应用于存储系统中,其特征在于,所述方法包括:
    当检测到存储介质中存在部件出现故障时,基于指定磁盘阵列RAID策略对所述出现故障的部件中存储的数据进行恢复;
    将恢复后的数据存储至所述存储介质的冗余空间中,并将所述出现故障的部件的地址映射至所述冗余空间,以实现对所述存储介质的管理;
    其中,所述冗余空间是固定配置的或基于免维护率、免维周期、所述存储介质的年失效率AFR和所述存储介质包括的部件总数量确定后配置的,所述免维护率和所述免维周期是通过对所述存储介质的配置指令携带或者从自定义的寄存器中查询得到,所述AFR是通过查询得到或由所述配置指令携带。
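  2. The method according to claim 1, wherein before the storing the recovered data in the redundant space of the storage medium and mapping the address of the failed component to the redundant space, the method further comprises:
    receiving the configuration instruction for the storage medium, wherein the configuration instruction carries the maintenance-free rate and the maintenance-free period;
    determining, based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the capacity of the redundant space required to reach a maintenance-free state within the maintenance-free period; and
    configuring, based on the storage medium, a redundant space of that capacity.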
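  3. The method according to claim 2, wherein the configuring, based on the storage medium, a redundant space of that capacity comprises:
    determining the ratio of the capacity of the redundant space to the total capacity of the storage medium;
    carving out physical storage units in that ratio from each component included in the storage medium; and
    determining all the carved-out physical storage units as the redundant space of that capacity.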
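  4. The method according to claim 1 or 2, wherein the configuration instruction further carries a query indication, and the query indication indicates a query of at least one of: maximum maintenance-free capability, maintenance-free status, maintenance-free configuration parameters, maintenance-free determination result, available capacity, maintenance-free time, and mean time between failures (MTBF) of the storage medium.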
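  5. The method according to claim 2, wherein after the configuring, based on the storage medium, a redundant space of that capacity, the method further comprises:
    reconfiguring the RAID policy based on the storage medium that remains after the redundant space has been configured; and
    determining the reconfigured RAID policy as the specified RAID policy.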
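  6. The method according to claim 1, wherein before the recovering, based on the specified RAID policy, the data stored in the failed component, the method further comprises:
    querying whether the currently remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component;
    if the currently remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, performing the operation of recovering, based on the specified RAID policy, the data stored in the failed component; and
    if the currently remaining capacity of the redundant space is less than the size of the data stored in the failed component, determining the difference between the currently remaining capacity of the redundant space and the size of the data stored in the failed component, and configuring physical storage units of that size from the reserved over-provisioning (OP) space of the storage system as the redundant space.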
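  7. A storage medium management apparatus, configured in a storage system, wherein the apparatus comprises:
    a recovery module, configured to, when it is detected that a component in a storage medium has failed, recover, based on a specified redundant array of independent disks (RAID) policy, the data stored in the failed component; and
    a storage module, configured to store the recovered data in a redundant space of the storage medium, and map the address of the failed component to the redundant space, to implement management of the storage medium;
    wherein the redundant space is fixedly configured, or is configured after being determined based on a maintenance-free rate, a maintenance-free period, an annualized failure rate (AFR) of the storage medium, and the total number of components included in the storage medium; the maintenance-free rate and the maintenance-free period are carried in a configuration instruction for the storage medium or obtained by querying a user-defined register; and the AFR is obtained by query or carried in the configuration instruction.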
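  8. The apparatus according to claim 7, wherein the apparatus further comprises:
    a receiving module, configured to receive the configuration instruction for the storage medium, wherein the configuration instruction carries the maintenance-free rate and the maintenance-free period;
    a first determining module, configured to determine, based on the maintenance-free rate, the maintenance-free period, the AFR of the storage medium, and the total number of components included in the storage medium, the capacity of the redundant space required to reach a maintenance-free state within the maintenance-free period; and
    a first configuration module, configured to configure, based on the storage medium, a redundant space of that capacity.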
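  9. The apparatus according to claim 8, wherein the first configuration module is configured to:
    determine the ratio of the capacity of the redundant space to the total capacity of the storage medium;
    carve out physical storage units in that ratio from each component included in the storage medium; and
    determine all the carved-out physical storage units as the redundant space of that capacity.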
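  10. The apparatus according to claim 7 or 8, wherein the configuration instruction further carries a query indication, and the query indication indicates a query of at least one of: maximum maintenance-free capability, maintenance-free status, maintenance-free configuration parameters, maintenance-free determination result, available capacity, maintenance-free time, and mean time between failures (MTBF) of the storage medium.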
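  11. The apparatus according to claim 8, wherein the apparatus further comprises:
    a second configuration module, configured to reconfigure the RAID policy based on the storage medium that remains after the redundant space has been configured; and
    a second determining module, configured to determine the reconfigured RAID policy as the specified RAID policy.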
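  12. The apparatus according to claim 7, wherein the apparatus further comprises:
    a query module, configured to query whether the currently remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component;
    a trigger module, configured to, when the currently remaining capacity of the redundant space is greater than or equal to the size of the data stored in the failed component, trigger the recovery module to perform the operation of recovering, based on the specified RAID policy, the data stored in the failed component; and
    a third configuration module, configured to, when the currently remaining capacity of the redundant space is less than the size of the data stored in the failed component, determine the difference between the currently remaining capacity of the redundant space and the size of the data stored in the failed component, and configure physical storage units of that size from the reserved over-provisioning (OP) space of the storage system as the redundant space.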
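  13. A computer-readable storage medium storing instructions, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 6.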
PCT/CN2018/104288 2017-09-22 2018-09-06 存储介质的管理方法、装置及可读存储介质 WO2019056948A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21213020.7A EP4036735B1 (en) 2017-09-22 2018-09-06 Method, apparatus and readable storage medium
EP18857718.3A EP3667504B1 (en) 2017-09-22 2018-09-06 Storage medium management method, device and readable storage medium
US16/824,259 US11237929B2 (en) 2017-09-22 2020-03-19 Method and apparatus, and readable storage medium
US17/545,203 US11714733B2 (en) 2017-09-22 2021-12-08 Method and apparatus, and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710868918.1 2017-09-22
CN201710868918.1A CN107766180B (zh) 2017-09-22 2017-09-22 存储介质的管理方法、装置及可读存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/824,259 Continuation US11237929B2 (en) 2017-09-22 2020-03-19 Method and apparatus, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2019056948A1 (zh) 2019-03-28

Family

ID=61267421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104288 WO2019056948A1 (zh) 2017-09-22 2018-09-06 存储介质的管理方法、装置及可读存储介质

Country Status (4)

Country Link
US (2) US11237929B2 (zh)
EP (2) EP4036735B1 (zh)
CN (2) CN107766180B (zh)
WO (1) WO2019056948A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766180B (zh) 2017-09-22 2020-08-14 成都华为技术有限公司 存储介质的管理方法、装置及可读存储介质
CN108595117B (zh) * 2018-03-29 2021-04-23 记忆科技(深圳)有限公司 一种动态容量调整过程中安全平滑的方法
CN109445681B (zh) 2018-08-27 2021-05-11 华为技术有限公司 数据的存储方法、装置和存储系统
CN109144787A (zh) * 2018-09-03 2019-01-04 郑州云海信息技术有限公司 一种数据恢复方法、装置、设备及可读存储介质
CN111221473B (zh) * 2019-12-30 2023-06-06 河南创新科信息技术有限公司 一种存储系统介质免维护的方法
CN111428280B (zh) * 2020-06-09 2020-11-17 浙江大学 SoC安全芯片密钥信息完整性存储及错误自修复方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609420A (zh) * 2009-07-17 2009-12-23 杭州华三通信技术有限公司 实现磁盘冗余阵列重建的方法和磁盘冗余阵列及其控制器
US20120290868A1 (en) * 2011-05-09 2012-11-15 Cleversafe, Inc. Assigning a dispersed storage network address range in a maintenance free storage container
WO2013075519A1 (en) * 2011-11-23 2013-05-30 International Business Machines Corporation Use of virtual drive as hot spare for raid group
CN107766180A (zh) * 2017-09-22 2018-03-06 成都华为技术有限公司 存储介质的管理方法、装置及可读存储介质

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5978291A (en) * 1998-09-30 1999-11-02 International Business Machines Corporation Sub-block redundancy replacement for a giga-bit scale DRAM
DE19947041C2 (de) * 1999-09-30 2001-11-08 Infineon Technologies Ag Integrierter dynamischer Halbleiterspeicher mit redundanten Einheiten von Speicherzellen und Verfahren zur Selbstreparatur
US7685126B2 (en) * 2001-08-03 2010-03-23 Isilon Systems, Inc. System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system
US7263583B2 (en) * 2004-10-05 2007-08-28 International Business Machines Corporation On demand, non-capacity based process, apparatus and computer program to determine maintenance fees for disk data storage system
US7603530B1 (en) * 2005-05-05 2009-10-13 Seagate Technology Llc Methods and structure for dynamic multiple indirections in a dynamically mapped mass storage device
US20070067665A1 (en) * 2005-09-19 2007-03-22 Ebrahim Hashemi Apparatus and method for providing redundant arrays storage devices
US7617361B2 (en) * 2006-03-29 2009-11-10 International Business Machines Corporation Configureable redundant array of independent disks
US7603605B2 (en) * 2007-01-08 2009-10-13 Arm Limited Performance control of an integrated circuit
JP2009043030A (ja) * 2007-08-09 2009-02-26 Hitachi Ltd ストレージシステム
US8473779B2 (en) * 2008-02-29 2013-06-25 Assurance Software And Hardware Solutions, Llc Systems and methods for error correction and detection, isolation, and recovery of faults in a fail-in-place storage array
US8572311B1 (en) * 2010-01-11 2013-10-29 Apple Inc. Redundant data storage in multi-die memory systems
US8880843B2 (en) * 2010-02-10 2014-11-04 International Business Machines Corporation Providing redundancy in a virtualized storage system for a computer system
US9727414B2 (en) * 2010-12-01 2017-08-08 Seagate Technology Llc Fractional redundant array of silicon independent elements
US8601311B2 (en) * 2010-12-14 2013-12-03 Western Digital Technologies, Inc. System and method for using over-provisioned data capacity to maintain a data redundancy scheme in a solid state memory
CN102097133B (zh) * 2010-12-31 2012-11-21 中国人民解放军装备指挥技术学院 一种海量存储系统的可靠性测试系统及测试方法
US9021053B2 (en) * 2011-09-02 2015-04-28 Compuverde Ab Method and device for writing data to a data storage system comprising a plurality of data storage nodes
CN105843557B (zh) * 2016-03-24 2019-03-08 天津书生云科技有限公司 冗余存储系统、冗余存储方法和冗余存储装置
JP2013125513A (ja) * 2011-12-16 2013-06-24 Samsung Electronics Co Ltd 不揮発性半導体記憶装置及びその管理方法
US8799705B2 (en) * 2012-01-04 2014-08-05 Emc Corporation Data protection in a random access disk array
CN102708019B (zh) * 2012-04-28 2014-12-03 华为技术有限公司 一种硬盘数据恢复方法、装置及系统
CN102902602B (zh) * 2012-09-19 2015-08-19 华为技术有限公司 数据热备份的方法、装置及存储系统
KR102025080B1 (ko) * 2013-01-02 2019-09-25 삼성전자 주식회사 스토리지 시스템 및 스토리지 시스템의 여분 공간 조절 방법
US9081753B2 (en) * 2013-03-14 2015-07-14 Microsoft Technology Licensing, Llc Virtual disk recovery and redistribution
US9176818B2 (en) * 2013-03-14 2015-11-03 Microsoft Technology Licensing, Llc N-way parity for virtual disk resiliency
EP2942715B1 (en) 2013-09-24 2017-11-08 Huawei Technologies Co., Ltd. Data migration method, data migration apparatus and storage device
CN103488547A (zh) * 2013-09-24 2014-01-01 浪潮电子信息产业股份有限公司 一种raid组故障硬盘快速重建的方法
CN106503024A (zh) * 2015-09-08 2017-03-15 北京国双科技有限公司 日志信息处理方法和装置
CN106484331B (zh) * 2015-09-29 2019-04-12 华为技术有限公司 一种数据处理方法、装置及闪存设备
US10048877B2 (en) * 2015-12-21 2018-08-14 Intel Corporation Predictive memory maintenance
US9875052B2 (en) * 2016-03-15 2018-01-23 International Business Machines Corporation Storage capacity allocation using distributed spare space
CN109690681B (zh) * 2016-06-24 2021-08-31 华为技术有限公司 处理数据的方法、存储装置、固态硬盘和存储系统
TWI607303B (zh) * 2016-10-14 2017-12-01 喬鼎資訊股份有限公司 具有虛擬區塊及磁碟陣列架構之資料儲存系統及其管理方法
US10241877B2 (en) * 2016-12-12 2019-03-26 International Business Machines Corporation Data storage system employing a hot spare to proactively store array data in absence of a failure or pre-failure event
CN106681865B (zh) * 2017-01-16 2020-07-07 北京腾凌科技有限公司 业务恢复方法及装置
US10915405B2 (en) * 2017-05-26 2021-02-09 Netapp, Inc. Methods for handling storage element failures to reduce storage device failure rates and devices thereof
CN109558333B (zh) * 2017-09-27 2024-04-05 北京忆恒创源科技股份有限公司 具有可变额外存储空间的固态存储设备命名空间

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3667504A4 *

Also Published As

Publication number Publication date
EP4036735B1 (en) 2024-02-14
EP3667504A4 (en) 2020-10-14
CN111966540A (zh) 2020-11-20
US20220100623A1 (en) 2022-03-31
US11714733B2 (en) 2023-08-01
EP3667504A1 (en) 2020-06-17
EP4036735A1 (en) 2022-08-03
US20200218621A1 (en) 2020-07-09
EP3667504B1 (en) 2021-12-22
CN111966540B (zh) 2024-03-01
CN107766180A (zh) 2018-03-06
US11237929B2 (en) 2022-02-01
CN107766180B (zh) 2020-08-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18857718; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2018857718; Country of ref document: EP; Effective date: 20200310)
NENP Non-entry into the national phase (Ref country code: DE)