WO2021088368A1 - Method and device for repairing memory - Google Patents

Method and device for repairing memory

Info

Publication number
WO2021088368A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
data
read
storage
card
Application number
PCT/CN2020/095660
Inventor
董凌
郭瑜
杜开田
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021088368A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1479: Generic software techniques for error detection or fault masking

Definitions

  • This application relates to the field of computers, and in particular to a method and device that can repair memory.
  • A memory is a component used to store programs and various kinds of data.
  • A memory usually stores binary code in bistable semiconductor circuits, complementary metal oxide semiconductor (CMOS) transistors, or memory cells made of magnetic material.
  • the memory can be divided into semiconductor memory and magnetic surface memory according to the different materials used.
  • An SD card, also known as an SD memory card, is a new-generation memory device based on semiconductor flash memory. Because of its small size, high data transfer speed, and hot-swap support, it is widely used in portable devices such as digital cameras, personal digital assistants (PDAs), and multimedia players.
  • operating system vendors such as VMware have introduced the use of SD cards as storage media for booting operating systems, thereby promoting the application of SD cards in the enterprise market.
  • However, SD cards use relatively cheap raw materials and relatively simple firmware, so they have poor reliability and serviceability and a relatively high failure rate. Manufacturers therefore take measures to monitor SD card usage when making and using SD cards, to keep the system stable.
  • The present application provides a method and device for repairing a memory. After a data read failure occurs in a memory, the data to be read can be obtained from another memory in the system and used to repair the error in the failed memory, improving system availability and reducing system operating cost.
  • the present application provides a method for repairing a memory.
  • The method is applied to a memory controller connected to a first memory and a second memory.
  • The method includes: receiving a first read request, where the first read request is used to instruct to read first data in the first memory; when reading the first data in the first memory fails, obtaining the first data from the second memory;
  • sending a write request to the first memory, where the write request carries the first data; receiving a response message sent by the first memory, where the response message indicates that the first data has been successfully written into the first memory; and determining, according to the response message, that the logical bad block error of the first memory has been repaired.
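  • The first-aspect method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `Memory`, `ReadError`, and `read_with_repair` are hypothetical names, a set of corrupted block addresses stands in for blocks that fail verification, and the boolean returned by `write` stands in for the first memory's response message.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


class ReadError(Exception):
    """Raised when a block cannot be read or fails verification."""


@dataclass
class Memory:
    """Hypothetical stand-in for a memory such as an SD card.

    `corrupted` holds addresses whose reads fail (e.g. logical bad blocks);
    a successful write clears the flag, modelling a repaired block.
    """
    blocks: Dict[int, bytes] = field(default_factory=dict)
    corrupted: Set[int] = field(default_factory=set)

    def read(self, addr: int) -> bytes:
        if addr in self.corrupted or addr not in self.blocks:
            raise ReadError(addr)
        return self.blocks[addr]

    def write(self, addr: int, data: bytes) -> bool:
        # The True return value plays the role of the response message.
        self.blocks[addr] = data
        self.corrupted.discard(addr)
        return True


def read_with_repair(first: Memory, second: Memory, addr: int) -> Tuple[bytes, bool]:
    """Return (data, repaired) for a read request targeting `first`."""
    try:
        return first.read(addr), False      # normal path: nothing to repair
    except ReadError:
        data = second.read(addr)            # fall back to the second memory
        repaired = first.write(addr, data)  # write request carrying the data
        return data, repaired               # True => logical bad block repaired
```

If the second memory also fails, `ReadError` propagates to the caller, which corresponds to the case where both memories have physical bad blocks.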
  • When reading data from the first memory fails, the first memory is not immediately set to a fault state. Instead, the data is obtained from the second memory in the system, which also stores the data to be read, and the obtained data is written back to the first memory.
  • Writing the data back in this way can repair bad blocks in the first memory. Specifically, if the error in the first memory was caused by a logical bad block, writing the data back repairs it; if the error was caused by a physical bad block, writing the data back cannot repair it.
  • the method provided by the first aspect of the present application improves the stability of the system by repairing the logical bad blocks in the first memory, and can reduce the frequency that the memory needs to be replaced, thereby reducing the operating cost of the system.
  • the first memory and the second memory are SD cards, and the first memory and the second memory store the same data.
  • Because SD cards are low-cost but also relatively unreliable, two or more SD cards can be mirrored, so that when one SD card fails, the data can be obtained from another. This improves system stability while keeping the operating cost of the system low.
  • The method further includes: receiving a second read request, where the second read request is used to instruct to read second data in the first memory; when reading the second data in the first memory fails, sending an acquisition request to the second memory; and when acquiring the second data from the second memory fails, determining that physical bad blocks have occurred in both the first memory and the second memory, and incrementing the recorded read error count of the first memory and the recorded read error count of the second memory by 1 each.
  • When the storage controller fails to obtain the data from both the first memory and the second memory, the memory cannot be repaired by writing data back.
  • In that case, it is determined that physical bad blocks have occurred in both the first memory and the second memory, and the recorded read error counts of the first memory and the second memory are each incremented by 1. This helps prompt timely replacement of the memories, improving system reliability.
  • the method further includes: issuing a warning when the number of read errors of the first memory or the number of read errors of the second memory reaches a threshold.
  • the storage controller can monitor the number of read errors of the first memory and the second memory, and when the number of read errors of a certain memory reaches a threshold, it is considered that the memory needs to be replaced, and a warning is issued.
  • This allows a memory with too many errors to be replaced promptly, improving the stability of system operation.
  • the storage controller is further connected to a third storage, and the first storage, the second storage, and the third storage form a redundant array of independent disks RAID 5.
  • Obtaining the first data from the second memory specifically includes: recovering the first data according to the check information of the first data stored in the second memory.
  • the technical solution provided in this application can be applied to the RAID 5 architecture, thereby enhancing the versatility of the system provided in this application.
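  • Assuming the check information is ordinary XOR parity, as is common in RAID 5 (the application does not specify the code used), recovering an unreadable block could look like the sketch below. The helper names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Parity block for one stripe: XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def recover_block(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single unreadable data block of a stripe.

    XOR-ing the parity with every surviving data block cancels them out,
    leaving the missing block.
    """
    return xor_blocks(surviving + [parity])
```

XOR parity can rebuild exactly one missing block per stripe; if two blocks of the same stripe are unreadable, recovery fails.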
  • The method further includes: receiving a third read request, where the third read request is used to instruct to read third data in the first memory; when reading the third data in the first memory fails, obtaining the third data from the second memory; sending a write request carrying the third data to the first memory; determining that writing the data to the first memory fails; and incrementing the recorded read error count and write error count of the first memory by 1 each.
  • When the write-back fails, the storage controller increments both the write error count and the read error count of the first memory. Errors in the first memory are thus recorded in finer detail, which later prompts the storage controller to replace the first memory in time and improves system reliability.
  • the method further includes: comparing the number of write errors of the first memory with a first threshold; and comparing the number of read errors of the first memory with a second threshold; When the number of write errors of the first memory reaches the first threshold or the number of read errors of the first memory reaches the second threshold, a warning is issued.
  • That is, a first threshold for the number of write errors and a second threshold for the number of read errors are set separately in the storage controller, and the controller determines whether the number of write errors of the first memory reaches the first threshold and whether the number of read errors reaches the second threshold.
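  • The bookkeeping for the second and third read requests, together with the two thresholds, can be sketched as below. `ErrorCounters` and the function names are hypothetical, and "reaches a threshold" is interpreted here as greater than or equal to.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounters:
    """Per-memory error counts kept by the memory controller (hypothetical)."""
    read_errors: int = 0
    write_errors: int = 0


def record_both_reads_failed(first: ErrorCounters, second: ErrorCounters) -> None:
    # Second read request: the data is unreadable in both memories, which is
    # treated as physical bad blocks, so each read error count goes up by 1.
    first.read_errors += 1
    second.read_errors += 1


def record_write_back_failed(first: ErrorCounters) -> None:
    # Third read request: the read failed and the write-back also failed.
    first.read_errors += 1
    first.write_errors += 1


def needs_warning(c: ErrorCounters, first_threshold: int, second_threshold: int) -> bool:
    # First threshold applies to write errors, second threshold to read errors.
    return c.write_errors >= first_threshold or c.read_errors >= second_threshold
```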
  • The present application provides a memory controller connected to a first memory and a second memory. The memory controller includes: a receiving module, configured to receive a first read request, where the first read request is used to instruct to read first data in the first memory; an obtaining module, configured to obtain the first data from the second memory when reading the first data in the first memory fails; a writing module, configured to send a write request carrying the first data to the first memory; and a monitoring module, configured to receive a response message sent by the first memory, where the response message indicates that the first data has been written into the first memory, and to determine, according to the response message, that the logical bad block error of the first memory has been repaired.
  • the first memory and the second memory are SD cards, and the first memory and the second memory store the same data.
  • The receiving module is further configured to receive a second read request, where the second read request is used to instruct to read second data in the first memory; the obtaining module is further configured to send the second read request to the second memory when reading the second data in the first memory fails; and the monitoring module is further configured to: when reading the second data in the second memory fails, determine that physical bad blocks have occurred in the first memory and the second memory, and increment the recorded read error counts of the first memory and the second memory by 1 each.
  • the monitoring module is further configured to: issue a warning when the number of read errors of the first memory or the number of read errors of the second memory reaches a threshold.
  • the storage controller is further connected to a third storage, and the first storage, the second storage, and the third storage form a redundant array of independent disks RAID 5.
  • The obtaining module is specifically configured to recover the first data according to the check information of the first data stored in the second memory.
  • The receiving module is further configured to receive a third read request, where the third read request is used to instruct to read third data in the first memory; the obtaining module is further configured to obtain the third data from the second memory when reading the third data in the first memory fails; the writing module is further configured to send a write request carrying the third data to the first memory; and the monitoring module is further configured to determine that writing the data to the first memory fails, and to increment the recorded read error count and write error count of the first memory by 1 each.
  • The monitoring module is further configured to: compare the number of write errors of the first memory with a first threshold; compare the number of read errors of the first memory with a second threshold; and issue a warning when the number of write errors of the first memory reaches the first threshold or the number of read errors of the first memory reaches the second threshold.
  • the present application provides a computer device including a memory and a processor, the memory is used to store program code, and the processor is used to call the program code in the memory to execute any one of the implementation manners in the first aspect Provided method.
  • the present application provides a computer-readable storage medium that stores program code, and the program code can be called by a computer device to execute the method provided in any one of the implementation manners in the first aspect.
  • Fig. 1 is a schematic diagram of a system using a RAID 1 architecture provided by the present application.
  • Fig. 2 is a schematic flowchart of an embodiment provided by the present application.
  • Fig. 3 is a schematic diagram of a system using a RAID 5 architecture provided by the present application.
  • Fig. 4 is a schematic flowchart of another embodiment provided by the present application.
  • Fig. 5 is a schematic diagram of functional modules of a storage controller provided by the present application.
  • FIG. 6 is a schematic diagram of the architecture of a computer device including a storage controller provided by the present application.
  • Read disturb (read interference) refers to disturbance of other pages in the same block when a page of the storage medium is read.
  • As flash density increases, each physical block contains more pages.
  • When each memory cell stores multiple bits, as in multi-level cell (MLC), triple-level cell (TLC), or quad-level cell (QLC) flash, the read disturb problem becomes more severe.
  • Read disturb can cause read errors and, as a result, data loss.
  • A monitoring module is set up to monitor the number of SD card failures. When the number of failures of an SD card in the system exceeds a certain value, the monitoring module issues a warning prompting replacement of the SD card, so that system stability is not affected.
  • In existing approaches, when a read fails, the computer device may directly set the SD card to a fault state, or mark the faulty address range of the SD card as unavailable. This not only increases the system-level failure rate and reduces system availability, but also causes SD cards in the system to be replaced frequently, increasing the operating cost of the system.
  • the present application provides a method and device for repairing logical bad blocks of a memory.
  • several memories form a master-slave array.
  • When reading data from the master memory fails, the data is read from the slave memory and used to repair the error in the master memory. If the error in the master memory can be repaired, the master memory is not set to a fault state and the faulty address range is not marked unavailable. In this way, logical bad blocks in the memory can be repaired, improving system reliability and reducing the number of memory replacements, and thus the operating cost of the system.
  • FIG. 1 is a schematic diagram of a system adopting a RAID 1 architecture provided by an embodiment of the present application.
  • the computing system 100 includes a computer device 110, an SD Redundant Array of Independent Disks (RAID) controller 120, a master SD card 130 and a slave SD card 140.
  • the computer device 110 is connected to the SD RAID controller 120.
  • The computer device 110 can be connected to the SD RAID controller 120 through a Universal Serial Bus (USB) interface, a Peripheral Component Interconnect Express (PCIe) interface, or another interface, which is not limited in this application.
  • The master SD card 130 and the slave SD card 140 store data, and each has its own controller (not shown in the figure). Because an SD card, as a flash memory device, is prone to errors such as read interference, the SD card controller can verify data written to or read from the SD card. Various verification methods can be used, such as a parity check. With odd parity, one extra check bit is added to each byte transmitted: when the number of "1"s in the actual data is even, the check bit is "1", otherwise it is "0", so that the transmitted data always satisfies odd parity. On receipt, the receiver counts the "1"s in the data (including the check bit); if the count is odd, the transmission is correct, otherwise it is erroneous.
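  • A small sketch of the odd-parity check described above, assuming one check bit per 8-bit byte; the function names are hypothetical.

```python
def odd_parity_bit(byte: int) -> int:
    """Sender side: check bit is 1 if the byte has an even number of 1s, else 0."""
    return 1 if bin(byte & 0xFF).count("1") % 2 == 0 else 0


def odd_parity_ok(byte: int, check_bit: int) -> bool:
    """Receiver side: data bits plus check bit must contain an odd number of 1s."""
    return (bin(byte & 0xFF).count("1") + check_bit) % 2 == 1
```

A single flipped bit makes the count of 1s even, so it is detected. Parity cannot detect an even number of flipped bits, nor locate or correct any error.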
  • The memory referred to in this application may be a flash memory device such as an SD card, because flash memory devices may develop logical errors due to read disturb and other causes, and such errors can be overcome by writing back the correct data. The memory may also be a storage device such as a traditional mechanical hard disk, because reading and writing the tracks of adjacent sectors of a mechanical hard disk can cause electromagnetic interference that leads to logical errors. Therefore, any type of memory that adopts the technical solutions provided by this application falls within its protection scope.
  • the SD RAID controller 120 is respectively connected to the master SD card 130 and the slave SD card 140 through a bus.
  • the master SD card 130 and the slave SD card 140 form a RAID 1 array.
  • A RAID 1 array implements data redundancy through disk mirroring, producing mutually backed-up data on a pair of independent disks. When the original data is busy, the data can be read directly from the mirror copy, so RAID 1 can improve read performance. Because a RAID 1 array always maintains a complete data backup, the data security and availability of this type of disk array are also high.
  • The SD RAID controller can be a hardware device located between the computer device 110 and the SD cards as shown in Figure 1, a chip on the same circuit board as the SD cards, or software running on the computer device 110.
  • This application does not limit the actual form or position of the SD RAID controller in the system; any SD RAID controller that functions as described in this application falls within its protection scope. For ease of description, the SD RAID controller is described below as a hardware device.
  • Since the master SD card 130 and the slave SD card 140 are mirrored, when the computer device 110 writes data to the SD cards through the SD RAID controller 120, the SD RAID controller 120 writes the data to both the master SD card 130 and the slave SD card 140.
  • When the computer device 110 reads data through the SD RAID controller 120, the SD RAID controller 120 first sends a read operation request to the master SD card 130 to obtain the data.
  • If reading from the master SD card 130 fails, the SD RAID controller 120 sends a read operation request to the slave SD card 140 to obtain the data.
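  • The mirrored write and master-first read order can be illustrated with two dictionaries standing in for the cards. This is a hypothetical sketch: a missing entry models a failed read, and the repair step is not shown here.

```python
from typing import Dict, Optional

Card = Dict[int, bytes]  # hypothetical model: block address -> block contents


def mirrored_write(master: Card, slave: Card, addr: int, data: bytes) -> None:
    # RAID 1: every write goes to both cards so that they stay identical.
    master[addr] = data
    slave[addr] = data


def mirrored_read(master: Card, slave: Card, addr: int) -> Optional[bytes]:
    # Read from the master first; only if that fails, ask the slave.
    data = master.get(addr)
    if data is None:
        data = slave.get(addr)
    return data
```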
  • Fig. 2 is a schematic flowchart of an embodiment provided by the present application.
  • this embodiment is based on the RAID 1 architecture shown in FIG. 1.
  • In this embodiment, when the SD RAID controller 120 fails to read data from the master SD card 130, it reads the data from the slave SD card 140 and uses the data read to repair the error in the master SD card 130.
  • the specific process is as follows:
  • S200 The SD RAID controller 120 receives the read operation request sent by the computer device 110.
  • S205 According to the read operation request sent by the computer device 110, the SD RAID controller 120 sends a read operation request to the master SD card 130 to read the data in the first data block of the master SD card 130.
  • S210 Determine whether the SD RAID controller 120 successfully reads the data in the main SD card 130.
  • The controller in the master SD card 130 verifies the data read. If the data in the master SD card 130 can be read by the SD RAID controller 120 and passes verification, the read is determined to be successful; otherwise, the read is determined to have failed.
  • S220 When the SD RAID controller 120 successfully reads the data in the main SD card 130, it sends the data to the computer device 110.
  • S230 When the SD RAID controller 120 fails to read the data in the main SD card 130, the SD RAID controller 120 sends a read operation request to the slave SD card 140 according to the read operation request to read the first data in the slave SD card 140 The data in the block.
  • The data in the first data block of the slave SD card 140 mirrors the data in the first data block of the master SD card 130.
  • The data of the master SD card 130 and the slave SD card 140 are fully mirrored: whenever data was previously written to the master SD card 130 through the SD RAID controller 120, the same data was also written to the slave SD card 140. Therefore, the data in the master SD card 130 and the slave SD card 140 is identical, and the SD RAID controller 120 can send the read operation request to the slave SD card 140 instead.
  • S240 Determine whether the SD RAID controller 120 successfully reads the data from the slave SD card 140. When the read succeeds, go to step S250; when the read fails, go to step S280.
  • This step is similar to step S210: the controller in the slave SD card 140 verifies the data read and determines from the verification result whether the read succeeded.
  • S250 The SD RAID controller 120 sends a write operation request to the master SD card 130 to write the data read from the slave SD card 140 into the master SD card 130.
  • Since the earlier read from the master SD card 130 failed, it can be determined that a bad block has occurred in the master SD card 130. To keep the system as stable as possible and reduce SD card replacements, it can then be determined whether the error in the master SD card 130 is a physical bad block or a logical bad block.
  • A physical bad block, also called a media bad block, is a physical error in the medium of the data block used to store data; an example is a stuck-at fault, in which a failed bit in the data block can only show a fixed value.
  • A logical bad block is a logical problem with the data in a data block, for example index values in an index block that are out of order. It can also be caused by wrong values in some bits of the data block: for example, a bit affected by read interference from an adjacent data block flips from 0 to 1, so in step S210 the controller of the master SD card 130 determines that the data read is erroneous. Because a logical bad block does not involve a problem with the medium, the error can be repaired by writing the correct data from the slave SD card 140 into the master SD card 130.
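  • The difference between the two kinds of bad block can be illustrated with a toy one-bit cell model (hypothetical, for illustration only): rewriting the correct value fixes a bit flipped by read disturb, but not a stuck-at fault.

```python
from typing import Optional


class Cell:
    """Toy one-bit storage cell (hypothetical model)."""

    def __init__(self, value: int, stuck_at: Optional[int] = None):
        self.value = value
        self.stuck_at = stuck_at  # None = healthy; 0 or 1 = stuck-at fault

    def write(self, value: int) -> None:
        self.value = value

    def read(self) -> int:
        # A physically failed cell always reports its stuck value.
        return self.stuck_at if self.stuck_at is not None else self.value


def rewrite_repairs(cell: Cell, correct: int) -> bool:
    """Write the correct bit back and report whether it now reads correctly."""
    cell.write(correct)
    return cell.read() == correct
```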
  • step S260 Determine whether the data writing to the main SD card 130 is successful. When the data is successfully written, it jumps to step S220, that is, sends the data to the computer device 110. When the data writing fails, go to step S270.
  • the main SD card 130 may send a response message to the SD RAID controller 120, where the response message is used to indicate that the first data has been successfully written to the main SD card 130.
  • step S270 After adding 1 to the recorded number of read errors and write errors of the main SD card 130 respectively, jump to step S220, that is, send the data to the computer device 110.
  • S280 Add 1 to the recorded number of read errors of the master SD card 130 and the number of read errors of the slave SD card 140 respectively.
  • Because the data cannot be read from either SD card, the bad blocks in both cards are treated as physical bad blocks at this point, and the recorded read error counts of the master SD card 130 and the slave SD card 140 are each incremented by 1.
  • S290 The SD RAID controller 120 sends a read failure response to the computer device 110.
  • In this case, the computer device 110 cannot obtain the requested data from the SD RAID controller 120, so the SD RAID controller 120 sends a read failure response to the computer device 110.
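  • Steps S200 to S290 can be condensed into a single function. This sketch assumes a toy `SDCard` model (hypothetical): a set of unreadable addresses represents blocks that fail verification, and a `write_fails` flag represents a card that rejects writes. The step labels in the comments map each branch back to Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SDCard:
    """Toy SD card with error counters (hypothetical model)."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    unreadable: Set[int] = field(default_factory=set)
    write_fails: bool = False
    read_errors: int = 0
    write_errors: int = 0

    def read(self, addr: int) -> Optional[bytes]:
        if addr in self.unreadable:
            return None                      # verification failed
        return self.blocks.get(addr)

    def write(self, addr: int, data: bytes) -> bool:
        if self.write_fails:
            return False
        self.blocks[addr] = data
        self.unreadable.discard(addr)        # logical bad block repaired
        return True


def handle_read(master: SDCard, slave: SDCard, addr: int) -> Optional[bytes]:
    """Return data for the host, or None for a read failure response."""
    data = master.read(addr)                 # S205/S210
    if data is not None:
        return data                          # S220
    data = slave.read(addr)                  # S230/S240
    if data is None:
        master.read_errors += 1              # S280: treated as physical bad blocks
        slave.read_errors += 1
        return None                          # S290
    if not master.write(addr, data):         # S250/S260
        master.read_errors += 1              # S270
        master.write_errors += 1
    return data                              # S220
```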
  • In steps S270 and S280 of the present application, when read or write errors occur on the master SD card 130 or the slave SD card 140, the read error counts and write error counts of the affected cards are recorded accordingly.
  • whether to replace the SD card can be determined according to the number of read and write errors that have occurred in the SD card.
  • the read error threshold and the write error threshold can be set separately. When judging whether the SD card needs to be replaced, it can be judged whether the current read error times of the SD card has reached the set read error threshold and whether the current write error times of the SD card has reached the set write error threshold.
  • When either threshold is reached, the controller on the SD card issues an alarm prompting the user to replace the SD card.
  • the write error threshold can be set to be less than the read error threshold.
  • the write error threshold can be set to 10 times, and the read error threshold can be set to 20 times.
  • Alternatively, the read and write error counts of the SD card can be considered together: for example, multiply the read error count by a read error weight, add the write error count multiplied by a write error weight, and then determine whether the result exceeds a preset value.
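  • Both replacement policies can be written as simple predicates. The thresholds of 10 write errors and 20 read errors are the example values given above; the weights and the preset limit in the weighted variant are illustrative assumptions, since the application gives no values for them.

```python
def needs_replacement(read_errors: int, write_errors: int,
                      read_threshold: int = 20, write_threshold: int = 10) -> bool:
    """Separate thresholds: write errors are tolerated less than read errors."""
    return read_errors >= read_threshold or write_errors >= write_threshold


def weighted_needs_replacement(read_errors: int, write_errors: int,
                               read_weight: float = 1.0, write_weight: float = 2.0,
                               limit: float = 20.0) -> bool:
    """Combined score; the default weights and limit are assumptions."""
    score = read_errors * read_weight + write_errors * write_weight
    return score > limit
```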
  • RAID 10 combines the RAID 1 and RAID 0 standards: data is striped across multiple disks that are read and written in parallel, while each disk is mirrored for redundancy. It therefore offers both the high speed of RAID 0 and the high data reliability of RAID 1, and supports larger storage capacity. Because a RAID 10 architecture can be regarded as multiple disk groups each using RAID 1, the method shown in Fig. 2 can be applied within each disk group, which is not described in detail in this application.
  • Fig. 3 is a schematic diagram of a system adopting a RAID 5 architecture provided by the present application.
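  • One way to locate the mirrored pair that holds a given logical block in RAID 10 is shown below, assuming simple block-level striping across pairs (an assumption; the application does not specify the layout). The function name is hypothetical. Once the pair is known, the Fig. 2 flow applies to its two cards.

```python
from typing import Tuple


def raid10_locate(logical_block: int, num_pairs: int) -> Tuple[int, int]:
    """Map a logical block to (mirror pair index, block offset within that pair).

    Assumes round-robin block-level striping across the mirrored pairs.
    """
    if num_pairs <= 0:
        raise ValueError("need at least one mirrored pair")
    return logical_block % num_pairs, logical_block // num_pairs
```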
  • RAID 5 is a storage solution that takes into account storage performance, data security, and storage cost.
  • the computing system includes a computer device, an SD RAID controller, and at least three SD cards (three SD cards are shown in the figure).
  • RAID 5 does not mirror the stored data. Instead, it distributes the data and the corresponding parity information across the disks that make up the RAID 5 array, with each piece of data and its parity information stored on different disks.
  • Therefore, when the SD RAID controller fails to read data on one of the SD cards, it can use the parity data stored on the other SD cards to obtain the correct data. Because one piece of parity information can protect multiple pieces of data, RAID 5 has higher disk utilization than RAID 1, which reduces storage cost.
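  • The rotating placement of parity and the capacity advantage over a two-disk RAID 1 mirror can be illustrated as follows. The rotation shown is one common convention, assumed here for illustration; the application does not specify one, and the names are hypothetical.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity of a stripe, rotating from the last disk backwards."""
    return (num_disks - 1) - (stripe % num_disks)


def usable_fraction_raid5(num_disks: int) -> float:
    # One disk's worth of capacity holds parity; the rest holds data.
    return (num_disks - 1) / num_disks


RAID1_USABLE_FRACTION = 0.5  # two-disk mirror: every block is stored twice
```

With three SD cards, two thirds of the raw capacity holds data, compared with one half for a mirrored pair.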
  • S400 The SD RAID controller 320 receives the read operation request sent by the computer device 310.
  • S405 According to the read operation request sent by the computer device 310, the SD RAID controller 320 sends a read operation request to the first SD card 330 to read the data in the first data block of the first SD card 330.
  • the data stored in each SD card is different, so there is no master-slave relationship.
  • When the SD RAID controller 320 receives a read operation request, it first determines from the information in the request that the data to be read is located in the first data block of the first SD card 330, and then sends a read operation request to the first SD card 330 to obtain the data.
  • S410 Determine whether the SD RAID controller 320 is successful in reading the data in the first SD card 330.
  • the check data corresponding to the data stored in each SD card is stored in other SD cards in the array.
  • Therefore, when data in an SD card cannot be read, it can be recovered from the check data.
  • If part of the data cannot be read, that part can be obtained from its corresponding check data.
  • step S440 Determine whether the SD RAID controller 320 is successful in restoring the data to be read according to the verification data in the second SD card 340.
  • When the data recovery succeeds, go to step S450; when it fails, go to step S480.
  • S450 The SD RAID controller 320 sends a write operation request to the first SD card 330 to write the data recovered from the check data in the second SD card 340 into the first SD card 330.
  • step S460 Determine whether writing the restored data into the first SD card 330 is successful. When the data is successfully written, it jumps to step S420, that is, sends the data to the computer device 310. When the data writing fails, go to step S470.
  • S470 After incrementing the recorded read error count and write error count of the first SD card 330 by 1 each, jump to step S420, that is, send the data to the computer device 310.
  • S480 Increment the recorded read error count of the first SD card 330 and the recorded read error count of the second SD card 340 by 1 each.
  • S490 The SD RAID controller 320 sends a read failure response to the computer device 310.
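  • The RAID 5 flow of Fig. 4 can be sketched like the RAID 1 case, with recovery from check data in place of reading a mirror copy. This assumes XOR parity and a toy `Card` class holding one block per card (both hypothetical). When recovery fails, the sketch increments the read error count of every peer whose read failed, generalizing the "second SD card" of step S480.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Card:
    """Toy SD card holding one block of a RAID 5 stripe (hypothetical model)."""

    def __init__(self, block: bytes, readable: bool = True, writable: bool = True):
        self.block, self.readable, self.writable = block, readable, writable
        self.read_errors = 0
        self.write_errors = 0

    def read(self) -> Optional[bytes]:
        return self.block if self.readable else None

    def write(self, data: bytes) -> bool:
        if not self.writable:
            return False
        self.block, self.readable = data, True
        return True


def raid5_read(target: Card, peers: List[Card]) -> Optional[bytes]:
    """Read one stripe member; peers hold the other data blocks and the parity."""
    data = target.read()                        # S405/S410
    if data is not None:
        return data                             # S420
    peer_blocks = [p.read() for p in peers]     # S430
    if any(b is None for b in peer_blocks):     # S440: recovery impossible
        target.read_errors += 1                 # S480
        for p, b in zip(peers, peer_blocks):
            if b is None:
                p.read_errors += 1
        return None                             # S490
    data = xor_blocks(peer_blocks)              # recover from check data
    if not target.write(data):                  # S450/S460
        target.read_errors += 1                 # S470
        target.write_errors += 1
    return data                                 # S420
```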
  • Fig. 5 is a schematic diagram of functional modules of a storage controller for repairing a memory provided by an embodiment of the present application.
  • the storage controller 500 includes a receiving module 510, an acquiring module 520, a writing module 530, and a monitoring module 540, where:
  • the receiving module 510 is configured to receive a first read request, where the first read request is used to instruct to read the first data in the first memory;
  • the acquiring module 520 is configured to acquire the first data from the second memory when the first data in the first memory fails to be read;
  • the writing module 530 is used to write the first data back to the first memory.
  • The monitoring module 540 is configured to: when the first data is successfully written back to the first memory, determine that a logical bad block occurred in the first memory and has been repaired; and when the writing module 530 fails to write the first data back to the first memory, increment the recorded write error count and read error count of the first memory by 1 each.
  • the above-mentioned storage controller 500 is also used to perform other steps of repairing the storage as shown in FIG. 2 and FIG. 4.
  • the receiving module 510 is used to perform step S200 in Figure 2 and step S400 in Figure 4;
  • the obtaining module 520 is used to perform steps S205, S210, S230, and S240 in Figure 2 and steps S405, S410, S430, and S440 in Figure 4, among other steps;
  • the writing module 530 is used to perform steps S220 and S250 in Figure 2 and steps S420 and S450 in Figure 4;
  • the monitoring module 540 is used to perform steps S270, S280, and S290 in Figure 2 and steps S470, S480, and S490 in Figure 4, among other steps.
  • For the specific process by which each module performs each step, refer to the descriptions of Figure 2 and Figure 4 above; details are not repeated here.
  • FIG. 6 is a schematic structural diagram of a computer device 600 for adjusting processor power according to an embodiment of the present application.
  • the computer device 600 in this embodiment may be one of the specific implementations of the computer device in the foregoing embodiments.
  • the computer device 600 includes a processor 601, and the processor 601 is connected to a memory 605.
  • the processor 601 can be a field programmable gate array (FPGA), a digital signal processor (DSP), other computing logic, or any combination of the above.
  • the processor 601 may also be a single-core processor or a multi-core processor.
  • the memory 605 can be random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, or any other form of storage medium known in the art.
  • the memory can be used to store program instructions; when the program instructions are executed by the processor 601, the processor 601 performs the method described in the above embodiments.
  • connection line 609 is used to transfer information between the various components of the communication device.
  • the connection line 609 may use a wired connection mode or a wireless connection mode, which is not limited in this application.
  • the connection line 609 is also connected to a network interface 604.
  • the network interface 604 uses a connection device such as, but not limited to, a cable or twisted pair to communicate with other devices or the network 611.
  • the network interface 604 can also be interconnected with the network 611 in a wireless manner.
  • the computer device 600 further includes a storage controller 612; for its functions, refer to the above description of the storage controllers in FIG. 1, FIG. 3, and FIG. 5, which is not repeated here.
  • the storage controller 612 may be connected to one or more external storage controllers 602, and implement the method flow described in FIG. 2 and FIG. 4 of the present application.
  • Some features of the embodiments of the present application may be implemented/supported by the processor 601 executing program instructions or software codes in the memory 605.
  • the software components loaded on the memory 605 can be summarized functionally or logically, for example as the monitoring module 620, the determining module 630, and the adjusting module 640 shown in FIG. 6.
  • the processor 601 executes the transaction related to the above-mentioned function/logic module in the memory.
  • FIG. 6 is only an example of the computer device 600, and the computer device 600 may include more or fewer components than those shown in FIG. 6, or may have different component configurations.
  • the various components shown in FIG. 6 can be implemented by hardware, software, or a combination of hardware and software.
  • the memory and the processor may be implemented in one module, and the instructions in the memory may be written into the memory in advance or loaded later by the processor during execution.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

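A method and device for repairing a memory. The method is applied to a storage controller connected to a first memory and a second memory, and includes: receiving a first read request, where the first read request instructs to read first data in the first memory; when reading the first data in the first memory fails, obtaining the first data from the second memory; sending a write request carrying the first data to the first memory; receiving a response message sent by the first memory, where the response message indicates that the first data has been successfully written into the first memory; and determining, according to the response message, that a logical bad block error of the first memory has been repaired. The method can repair logical bad blocks in a memory, improve system availability, reduce the number of times memories need to be replaced, and lower the operating cost of the system.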

Description

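Memory repair method and device
Technical Field
This application relates to the field of computers, and in particular to a method and device that can repair a memory.
Background
A memory is a component used to store programs and various data. A memory usually stores binary code using bistable semiconductor circuits, complementary metal oxide semiconductor (CMOS) transistors, or memory cells made of magnetic material. Depending on the material used, memories can be divided into types such as semiconductor memories and magnetic surface memories.
A Secure Digital (SD) card, also called an SD memory card, is a new-generation memory device based on semiconductor flash memory. Because of its small size, fast data transfer, and hot-swap support, it is widely used in portable devices such as digital cameras, personal digital assistants (PDAs), and multimedia players. In recent years, operating system vendors such as VMware have introduced the use of SD cards as the storage medium for booting the operating system, extending the use of SD cards into the enterprise market.
As a consumer-grade electronic storage product, an SD card uses relatively inexpensive materials and fairly simple firmware; its reliability and serviceability are both poor, and its failure rate is relatively high. Vendors therefore take measures to monitor SD card usage when manufacturing and using SD cards, to ensure system stability.
Summary
In the prior art, when a data read error occurs in a memory, the memory is set to a faulty state or the faulty address range is marked unavailable, which lowers system availability and forces frequent replacement of memories in the system. To solve this problem, this application provides a method and device for repairing a memory: when a data read from a memory fails, the data to be read is obtained from another memory in the system and used to repair the error in the memory, improving system availability and reducing operating cost.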
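According to a first aspect, this application provides a memory repair method applied to a storage controller connected to a first memory and a second memory. The method includes: receiving a first read request, where the first read request instructs to read first data in the first memory; when reading the first data in the first memory fails, obtaining the first data from the second memory; sending a write request carrying the first data to the first memory; receiving a response message sent by the first memory, where the response message indicates that the first data has been successfully written into the first memory; and determining, according to the response message, that a logical bad block error of the first memory has been repaired.
According to the above method, when reading data from the first memory fails, the memory is not immediately set to a faulty state. Instead, the data is obtained from the second memory in the system that also stores it, and written back to the first memory. Writing the data back can repair the bad block in the first memory. Specifically, if the error in the first memory is caused by a logical bad block, writing the data back repairs it; if the error is caused by a physical bad block, writing the data back cannot repair it. Therefore, when a response message indicating that the first data has been successfully written into the first memory is received, it can be determined that the logical bad block error in the first memory has been repaired. By repairing logical bad blocks in the first memory, the method of the first aspect improves system stability and reduces how often memories must be replaced, thereby lowering the operating cost of the system.
According to the first aspect, in a possible implementation, the first memory and the second memory are SD cards, and the first memory and the second memory store the same data.
Because SD cards are inexpensive but relatively unreliable, two or more SD cards can be used together as a mirror, so that when one SD card fails the data can be obtained from another SD card. This improves system stability while keeping operating costs low.
According to the first aspect, in another possible implementation, the method further includes: receiving a second read request, where the second read request instructs to read second data in the first memory; when reading the second data in the first memory fails, sending an obtaining request to the second memory; and when obtaining the second data from the second memory fails, determining that physical bad blocks have occurred in the first memory and the second memory, and adding 1 to both the recorded read-error count of the first memory and that of the second memory.
According to this method, when the storage controller fails to obtain the data from both the first memory and the second memory, the memories can no longer be repaired by writing the data back. For system reliability, both memories are considered to have physical bad blocks, and the recorded read-error count of each is increased by 1. This makes it easier to prompt timely replacement of the memories later, improving system reliability.
According to the first aspect, in a possible implementation, the method further includes: issuing a warning when the read-error count of the first memory or of the second memory reaches a threshold.
In this way, the storage controller monitors the read-error counts of the first memory and the second memory; when the read-error count of a memory reaches the threshold, that memory is considered to need replacement and a warning is issued. This prompts timely replacement of memories with too many errors and improves the stability of the system.
According to the first aspect, in another possible implementation, the storage controller is further connected to a third memory, and the first memory, the second memory, and the third memory form a redundant array of independent disks, RAID 5. Obtaining the first data from the second memory specifically includes: recovering the first data based on check information of the first data stored in the second memory.
In this way, the technical solution of this application can be applied to a RAID 5 architecture, which broadens the applicability of the system provided by this application.
According to the first aspect, in another possible implementation, the method further includes: receiving a third read request, where the third read request instructs to read third data in the first memory; when reading the third data in the first memory fails, obtaining the third data from the second memory; sending a write request carrying the third data to the first memory; determining that writing the data to the first memory has failed; and adding 1 to both the recorded read-error count and write-error count of the first memory.
According to this method, if writing the data back to the first memory fails, so that the bad block in the first memory cannot be repaired, a physical bad block can be considered to have occurred in the first memory. Because the first memory has also failed a write, the storage controller adds 1 to both its write-error count and its read-error count. This records errors in the first memory at a finer granularity, so that timely replacement of the first memory can be prompted later, improving system reliability.
According to the first aspect, in another possible implementation, the method further includes: comparing the write-error count of the first memory with a first threshold; comparing the read-error count of the first memory with a second threshold; and issuing a warning when the write-error count of the first memory reaches the first threshold or the read-error count of the first memory reaches the second threshold.
According to this method, the storage controller sets a first threshold for the write-error count and a second threshold for the read-error count, and separately determines whether the write-error count of the first memory reaches the first threshold and whether its read-error count reaches the second threshold. This allows a finer judgment of whether the first memory is still reliable enough, improving the reliability of the whole system.
According to a second aspect, this application provides a storage controller connected to a first memory and a second memory. The storage controller includes: a receiving module, configured to receive a first read request, where the first read request instructs to read first data in the first memory; an acquiring module, configured to obtain the first data from the second memory when reading the first data in the first memory fails; a writing module, configured to send a write request carrying the first data to the first memory; and a monitoring module, configured to receive a response message sent by the first memory, where the response message indicates that the first data has been written into the first memory, and to determine, according to the response message, that a logical bad block error of the first memory has been repaired.
According to the second aspect, in a possible implementation, the first memory and the second memory are SD cards, and the first memory and the second memory store the same data.
According to the second aspect, in another possible implementation, the receiving module is further configured to: receive a second read request, where the second read request instructs to read second data in the first memory; and when reading the second data in the first memory fails, send the second read request to the second memory. The monitoring module is further configured to: when reading the second data from the second memory fails, determine that physical bad blocks have occurred in the first memory and the second memory; and add 1 to both the recorded read-error count of the first memory and that of the second memory.
According to the second aspect, in another possible implementation, the monitoring module is further configured to issue a warning when the read-error count of the first memory or of the second memory reaches a threshold.
According to the second aspect, in another possible implementation, the storage controller is further connected to a third memory, the first memory, the second memory, and the third memory form a redundant array of independent disks, RAID 5, and the acquiring module is specifically configured to recover the first data based on check information of the first data stored in the second memory.
According to the second aspect, in another possible implementation, the receiving module is further configured to receive a third read request, where the third read request instructs to read third data in the first memory; the acquiring module is further configured to obtain the third data from the second memory when reading the third data in the first memory fails; the writing module is further configured to send a write request carrying the third data to the first memory; and the monitoring module is further configured to determine that writing the data to the first memory has failed and add 1 to both the recorded read-error count and write-error count of the first memory.
According to the second aspect, in another possible implementation, the monitoring module is further configured to: compare the write-error count of the first memory with a first threshold; compare the read-error count of the first memory with a second threshold; and issue a warning when the write-error count of the first memory reaches the first threshold or the read-error count of the first memory reaches the second threshold.
According to a third aspect, this application provides a computer device including a memory and a processor. The memory is configured to store program code, and the processor is configured to invoke the program code in the memory to perform the method provided by any implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium storing program code that can be invoked by a computer device to perform the method provided by any implementation of the first aspect.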
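Brief Description of Drawings
FIG. 1 is a schematic diagram of a system using a RAID 1 architecture according to this application.
FIG. 2 is a schematic flowchart of an embodiment according to this application.
FIG. 3 is a schematic diagram of a system using a RAID 5 architecture according to this application.
FIG. 4 is a schematic flowchart of another embodiment according to this application.
FIG. 5 is a schematic diagram of the functional modules of a storage controller according to this application.
FIG. 6 is a schematic diagram of the architecture of a computer device including a storage controller according to this application.
Detailed Description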
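As a NAND flash medium, an SD card is subject to typical failure mechanisms such as read disturb. Read disturb means that reading one page of the storage medium disturbs other pages in the same data block. As flash process technology advances, each physical block contains more pages. When each memory cell stores multiple bits of information, as in multi-level cells (MLC), triple-level cells (TLC), or quad-level cells (QLC), read disturb becomes more severe. Read disturb can corrupt the data that is read out and cause data loss. When the same memory cells of an SD card are read repeatedly, bit flips easily occur in adjacent cells, producing logical bad blocks.
To reconcile the relatively high failure rate of SD cards with enterprise users' high reliability requirements, hardware vendors usually build fault protection into SD cards. For example, a monitoring module counts SD card failures; when the number of failures of an SD card in the system exceeds a certain value, the monitoring module issues a warning prompting replacement of that SD card so as not to affect system stability.
With this approach, when a logical bad block occurs in an SD card, the computer device may directly set the SD card to a faulty state or mark the faulty address range of the SD card as unavailable. This not only increases the system-level failure rate and reduces system availability, but also requires SD cards in the system to be replaced frequently, increasing operating cost.
This application provides a method and device that can repair logical bad blocks in a memory. To reduce the system-level failure rate and the operating cost of the system, the technical solution of this application organizes several memories into a primary/secondary array. When reading data from the primary memory fails, the data is read from the secondary memory and used to repair the error in the primary memory. If the error in the primary memory can be repaired, the primary memory is not set to a faulty state and the faulty address is not marked as an unavailable range. This repairs logical bad blocks in the memory, improving system reliability and reducing the number of memory replacements, which lowers operating cost.
FIG. 1 is a schematic diagram of a system using a RAID 1 architecture according to an embodiment of this application.
As shown in FIG. 1, the computing system 100 includes a computer device 110, an SD redundant array of independent disks (RAID) controller 120, a primary SD card 130, and a secondary SD card 140. The computer device 110 is connected to the SD RAID controller 120, either through a universal serial bus (USB) interface or through another interface such as peripheral component interconnect express (PCIe); this application does not limit the interface.
The primary SD card 130 and the secondary SD card 140 store data, and each has its own controller (not shown). Because SD cards, as flash devices, are prone to errors such as read disturb, the SD controller can verify data written to or read from the SD card. Various verification methods can be used, for example parity checking. With odd parity, one extra bit is appended to each transmitted byte as a check bit: when the number of 1s in the data is even, the check bit is 1; otherwise it is 0, so that the transmitted bits always contain an odd number of 1s. On receipt, the receiver counts the 1s: if the count is odd, the transmission is correct; otherwise, an error has occurred.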
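The odd-parity rule described above can be sketched as follows. This is a minimal illustration assuming one check bit per byte; the function names are hypothetical, and real SD card controllers may use stronger error-correcting codes rather than a single parity bit. The example also shows the limitation of a single parity bit: it detects any odd number of flipped bits but misses an even number.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the check bit that makes the total number of 1s odd."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    ones = bin(byte).count("1")
    # Even number of 1s in the data -> check bit 1; odd -> check bit 0.
    return 1 if ones % 2 == 0 else 0


def odd_parity_ok(byte: int, check_bit: int) -> bool:
    """Receiver side: data bits plus check bit must contain an odd number of 1s."""
    return (bin(byte).count("1") + check_bit) % 2 == 1


if __name__ == "__main__":
    data = 0b1011_0010                 # four 1s
    p = odd_parity_bit(data)           # 1
    print(odd_parity_ok(data, p))      # True
    flipped = data ^ 0b0000_0100       # one bit flipped, e.g. by read disturb
    print(odd_parity_ok(flipped, p))   # False
```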
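For ease of description, this application uses SD cards as an example, but the type of memory is not limited. The memory in this application may be a flash device, including an SD card, because flash devices can suffer logical errors caused by read disturb and similar effects, which can be overcome by writing the correct data back. The memory in this application may also be a storage device such as a traditional mechanical hard disk, because electromagnetic interference from reads and writes on tracks of adjacent sectors can also produce logical errors. Therefore, any type of memory that adopts the technical solution provided by this application falls within the protection scope of this application.
The SD RAID controller 120 is connected to the primary SD card 130 and the secondary SD card 140 through a bus. The primary SD card 130 and the secondary SD card 140 form a RAID 1 array. RAID 1 achieves data redundancy through disk mirroring, keeping mutually backed-up copies of the data on a pair of independent disks. When the original data is busy, data can be read directly from the mirror copy, so RAID 1 can improve read performance; and because a RAID 1 array always keeps a complete backup of the data, this type of array also offers high data security and availability. In addition, the SD RAID controller may be a hardware device located between the computer device 110 and the SD cards as shown in FIG. 1, a chip on the same circuit board as the SD cards, or software running on the computer device 110. This application does not limit the actual form or position of the SD RAID controller; as long as it functions as described in this application, it falls within the protection scope of this application. For convenience, the following description assumes a hardware SD RAID controller.
Because the primary SD card 130 and the secondary SD card 140 mirror each other, when the computer device 110 writes data to the SD cards through the SD RAID controller 120, the SD RAID controller writes the data to both the primary SD card 130 and the secondary SD card 140. When the computer device 110 reads data through the SD RAID controller 120, the SD RAID controller 120 first sends a read request to the primary SD card 130 to obtain the data. Only if reading from the primary SD card 130 fails does the SD RAID controller 120 send a read request to the secondary SD card 140.
FIG. 2 is a schematic flowchart of an embodiment according to this application.
As shown in FIG. 2, in this embodiment, under the RAID 1 architecture of FIG. 1, when the SD RAID controller 120 fails to read data from the primary SD card 130, it reads the data from the secondary SD card 140 and uses that data to repair the error in the primary SD card 130. The process is as follows:
S200: The SD RAID controller 120 receives a read request sent by the computer device 110.
S205: Based on the read request from the computer device 110, the SD RAID controller 120 sends a read request to the primary SD card 130 to read the data in a first data block of the primary SD card.
S210: Determine whether the SD RAID controller 120 successfully reads the data from the primary SD card 130.
As mentioned above, data stored in an SD card has a certain probability of error. Therefore, when data in the primary SD card 130 is read, the controller of the primary SD card 130 verifies the data being read. If the data can be read by the SD RAID controller 120 and passes verification, the read is judged successful; otherwise, it is judged to have failed.
S220: When the SD RAID controller 120 successfully reads the data from the primary SD card 130, it sends the data to the computer device 110.
S230: When the SD RAID controller 120 fails to read the data from the primary SD card 130, it sends, based on the read request, a read request to the secondary SD card 140 to read the data in the first data block of the secondary SD card 140. The data in the first data block of the secondary SD card 140 is a mirror of the data in the first data block of the primary SD card 130.
In the RAID 1 architecture shown in FIG. 1, the data on the primary SD card 130 and the secondary SD card 140 are exact mirrors: whenever data was previously written to the primary SD card 130 through the SD RAID controller 120, an identical copy was also written to the secondary SD card 140. Because the data on the two cards is identical, the SD RAID controller 120 can send the read request to the secondary SD card 140.
S240: Determine whether the SD RAID controller 120 successfully reads the data from the secondary SD card 140. If the read succeeds, go to step S250; if it fails, go to step S280.
This step is similar to step S210: when data is read from the secondary SD card 140, the controller in the secondary SD card verifies the data, and whether the read succeeded is determined from the verification result.
S250: The SD RAID controller 120 sends a write request to the primary SD card 130 to write the data read from the secondary SD card into the primary SD card 130.
Because the earlier read from the primary SD card 130 failed, a bad block can be assumed to exist in the primary SD card 130. To maximize system stability and minimize SD card replacements, it is first determined whether the error in the primary SD card 130 is a physical bad block or a logical bad block. A physical bad block, also called a media bad block, means that the medium of the data block storing the data has a physical fault, for example a stuck-at fault in which a bit of the data block can only show a fixed value. In this case the data stored in the block is usually random, and the medium of the block has problems such as circuit faults, so writing the correct data back cannot repair the bad block in the primary SD card 130. A logical bad block means that the data in the block has a logical problem, for example the index values of an index block are not in order; it may also be that some bits in the block hold wrong values, for example a bit changes from 0 to 1 because of read disturb from an adjacent data block, so that in step S210 the controller of the primary SD card 130 finds the read data to be erroneous. Because a logical bad block does not involve a problem with the medium, the error can be repaired by writing the correct data from the secondary SD card 140 into the primary SD card 130.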
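The difference between the two kinds of bad block can be illustrated with a toy model. This sketch assumes that a physical defect behaves as a stuck-at fault and that the card verifies a write by reading it back; `ToyBlock` and `classify_after_write_back` are hypothetical names, not part of any real SD card interface.

```python
from dataclasses import dataclass


@dataclass
class ToyBlock:
    """Hypothetical model of one flash data block, not a real SD card API.

    stuck_mask marks bit positions with a stuck-at (physical) fault;
    stuck_value gives the value those bits are stuck at.
    """
    stored: bytearray
    stuck_mask: bytes = b""
    stuck_value: bytes = b""

    def _apply_stuck_bits(self) -> None:
        # Stuck bits ignore whatever was programmed and keep their fixed value.
        for i, mask, value in zip(range(len(self.stored)), self.stuck_mask, self.stuck_value):
            self.stored[i] = (self.stored[i] & ~mask & 0xFF) | (value & mask)

    def write(self, data: bytes) -> bool:
        """Program the block, then verify by reading back (assumed behavior)."""
        self.stored = bytearray(data)
        self._apply_stuck_bits()
        return bytes(self.stored) == data


def classify_after_write_back(block: ToyBlock, good_copy: bytes) -> str:
    # A successful write-back means the fault was logical and is now repaired.
    return "logical (repaired)" if block.write(good_copy) else "physical"


if __name__ == "__main__":
    good = bytes([0b1010_0000, 0x0F])
    # Logical bad block: bit 0 of byte 0 flipped by read disturb, medium healthy.
    logical = ToyBlock(bytearray([0b1010_0001, 0x0F]))
    print(classify_after_write_back(logical, good))   # logical (repaired)
    # Physical bad block: bit 0 of byte 0 is stuck at 1.
    physical = ToyBlock(bytearray([0b1010_0001, 0x0F]),
                        stuck_mask=bytes([0x01, 0x00]),
                        stuck_value=bytes([0x01, 0x00]))
    print(classify_after_write_back(physical, good))  # physical
```

The model captures only the property the method relies on: rewriting correct data clears a flipped bit but cannot override a defective cell.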
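S260: Determine whether the data was successfully written into the primary SD card 130. If the write succeeds, go to step S220, that is, send the data to the computer device 110. If the write fails, go to step S270.
Specifically, when the data is successfully written into the primary SD card 130, the primary SD card 130 can send a response message to the SD RAID controller 120 indicating that the first data has been successfully written into the primary SD card 130.
S270: Add 1 to both the recorded read-error count and write-error count of the primary SD card 130, then go to step S220, that is, send the data to the computer device 110.
If writing the correct data from the secondary SD card 140 into the primary SD card 130 fails, it can be concluded that the primary SD card not only has a physical bad block, which caused the erroneous read, but also fails when data is written to it. The recorded read-error count and write-error count of the primary SD card are therefore each increased by 1, and the data read from the secondary SD card 140 is sent to the computer device 110.
S280: Add 1 to both the recorded read-error count of the primary SD card 130 and that of the secondary SD card 140.
Because the data read from both the primary SD card 130 and the secondary SD card 140 is erroneous, it is impossible at this point to tell whether the bad blocks in the two SD cards are physical or logical. To favor system stability, both are treated as physical bad blocks, and the recorded read-error counts of the primary SD card 130 and the secondary SD card 140 are each increased by 1.
S290: The SD RAID controller 120 sends a read failure response to the computer device 110.
Because the SD RAID controller 120 failed to read from both the primary SD card 130 and the secondary SD card 140, the computer device 110 cannot obtain the requested data from the SD RAID controller 120, so the SD RAID controller 120 sends a read failure response to the computer device 110.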
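Steps S200 to S290 can be condensed into the following sketch. It is not the controller firmware: `ToySdCard` is a hypothetical in-memory stand-in, a read that fails verification is modeled as `None`, the write acknowledgement plays the role of the response message, and the function returns `None` where the controller would send the read failure response.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToySdCard:
    """Hypothetical in-memory stand-in for an SD card behind the SD RAID controller."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    unreadable: Set[int] = field(default_factory=set)  # reads fail verification
    unwritable: Set[int] = field(default_factory=set)  # physical defect: writes fail
    read_errors: int = 0
    write_errors: int = 0

    def read(self, lba: int) -> Optional[bytes]:
        """Return the block, or None if the read fails (or fails the card's check)."""
        if lba in self.unreadable or lba not in self.blocks:
            return None
        return self.blocks[lba]

    def write(self, lba: int, data: bytes) -> bool:
        """Return True when the card acknowledges the write (the response message)."""
        if lba in self.unwritable:
            return False
        self.blocks[lba] = data
        self.unreadable.discard(lba)  # fresh data clears a logical bad block
        return True


def raid1_read(primary: ToySdCard, secondary: ToySdCard, lba: int) -> Optional[bytes]:
    """Sketch of the FIG. 2 flow; None stands for the read failure response (S290)."""
    data = primary.read(lba)                 # S205, S210
    if data is not None:
        return data                          # S220
    data = secondary.read(lba)               # S230, S240
    if data is None:
        primary.read_errors += 1             # S280: treat both as physical bad blocks
        secondary.read_errors += 1
        return None                          # S290
    if not primary.write(lba, data):         # S250, S260
        primary.read_errors += 1             # S270: write-back failed
        primary.write_errors += 1
    return data                              # S220 (on success, logical bad block repaired)


if __name__ == "__main__":
    primary = ToySdCard({7: b"boot"}, unreadable={7})
    secondary = ToySdCard({7: b"boot"})
    print(raid1_read(primary, secondary, 7))  # b'boot'
    print(primary.read(7))                    # b'boot' (block repaired in place)
```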
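In addition, in steps S270 and S280 of this application, when read or write errors occur on the primary SD card 130 and the secondary SD card 140, their read-error and write-error counts are recorded accordingly. To improve the stability of the computing system, whether to replace an SD card can be decided from the number of read and write errors it has had. Specifically, in one possible implementation, a read-error threshold and a write-error threshold can be set separately. To decide whether an SD card needs replacement, it is checked whether its current read-error count has reached the read-error threshold and whether its current write-error count has reached the write-error threshold; when either count reaches its threshold, the controller on the SD card issues an alarm prompting the user to replace the SD card. Because write errors are far more serious than read errors, the write-error threshold can be set lower than the read-error threshold, for example 10 for write errors and 20 for read errors. In another implementation of this application, the read-error and write-error counts of the SD card can be considered together, for example by multiplying the read-error count by a read-error weight, adding the write-error count multiplied by a write-error weight, and checking whether the result exceeds a preset value.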
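The two replacement policies above can be sketched as follows. The thresholds 10 and 20 are the example values from the text; the weights and the preset limit of the weighted policy are hypothetical, since the description gives no values. Note the comparison operators: the separate thresholds fire when a count "reaches" its threshold (`>=`), the weighted score when it "exceeds" the preset value (`>`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparateThresholds:
    """Alarm when either counter reaches its own threshold (example values from the text)."""
    write_threshold: int = 10
    read_threshold: int = 20

    def needs_replacement(self, read_errors: int, write_errors: int) -> bool:
        return write_errors >= self.write_threshold or read_errors >= self.read_threshold


@dataclass(frozen=True)
class WeightedScore:
    """Alarm when the weighted sum exceeds a preset value.

    The weights and limit are hypothetical; the description gives no values.
    """
    read_weight: float = 1.0
    write_weight: float = 2.0
    limit: float = 20.0

    def needs_replacement(self, read_errors: int, write_errors: int) -> bool:
        score = read_errors * self.read_weight + write_errors * self.write_weight
        return score > self.limit


if __name__ == "__main__":
    print(SeparateThresholds().needs_replacement(read_errors=19, write_errors=9))  # False
    print(SeparateThresholds().needs_replacement(read_errors=0, write_errors=10))  # True
    print(WeightedScore().needs_replacement(read_errors=11, write_errors=5))       # True
```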
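In the architecture shown in FIG. 1, because RAID 1 can include at most two SD cards, the total storage capacity the system supports is limited. To meet large-capacity storage requirements, this application can also be used in a RAID 10 architecture. RAID 10 combines the RAID 1 and RAID 0 standards: it splits data into consecutive bit or byte units and reads/writes multiple disks in parallel, while mirroring each disk for redundancy. It combines the speed of RAID 0 with the high data reliability of RAID 1 and supports larger storage capacity. Because a RAID 10 architecture can be viewed as a combination of multiple RAID 1 disk pairs, the method flow of FIG. 2 can be used in each pair; this is not described further in this application.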
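Because RAID 10 is treated here as several RAID 1 pairs, the controller first has to find which pair holds a requested block, after which the FIG. 2 flow runs on that pair's two cards. A minimal sketch, assuming plain round-robin striping with a configurable stripe unit; `raid10_locate` is a hypothetical helper and a real controller's layout may differ.

```python
from typing import Tuple


def raid10_locate(lba: int, num_pairs: int, stripe_blocks: int = 1) -> Tuple[int, int]:
    """Map a logical block to (mirror pair index, block address inside that pair).

    Assumes plain round-robin striping over the mirror pairs with a stripe unit
    of stripe_blocks blocks; a real controller's layout may differ.
    """
    if num_pairs < 1 or stripe_blocks < 1 or lba < 0:
        raise ValueError("invalid geometry")
    stripe, offset = divmod(lba, stripe_blocks)
    pair = stripe % num_pairs          # which RAID 1 pair holds the stripe unit
    row = stripe // num_pairs          # how many full rows precede it on that pair
    return pair, row * stripe_blocks + offset


if __name__ == "__main__":
    print(raid10_locate(5, num_pairs=2))                   # (1, 2)
    print(raid10_locate(9, num_pairs=2, stripe_blocks=4))  # (0, 5)
```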
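Because both RAID 1 and RAID 10 require a complete mirror copy, the usable storage capacity of a system using RAID 1 or RAID 10 is half of the storage capacity provided by its hardware. To improve storage capacity utilization, this application can also be applied to a RAID 5 architecture.
FIG. 3 is a schematic diagram of a system using a RAID 5 architecture according to this application.
RAID 5 is a storage solution that balances storage performance, data security, and storage cost. As shown in FIG. 3, the computing system includes a computer device, an SD RAID controller, and at least three SD cards (three are shown). Instead of backing up the stored data, RAID 5 stores data and its corresponding parity information across the disks that make up the array, with data and its corresponding parity stored on different disks. When the SD RAID controller encounters an error reading data on one SD card, it can obtain the correct data using the parity data saved on the other SD cards. Because one piece of parity information can cover multiple data blocks, RAID 5 has higher disk utilization than RAID 1, which lowers storage cost.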
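RAID 5 parity is conventionally the bytewise XOR of the data blocks in a stripe, so a lost block equals the XOR of the parity block with every surviving data block of that stripe. The sketch below assumes this XOR scheme; the helper names are hypothetical. Strictly speaking, reconstruction needs the parity plus all other data blocks of the stripe, which in a three-card array are spread over the second and third SD cards.

```python
from typing import Iterable


def xor_blocks(blocks: Iterable[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        if len(blk) != len(out):
            raise ValueError("blocks must have equal length")
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def make_parity(data_blocks: Iterable[bytes]) -> bytes:
    # Parity block of one stripe: XOR of all data blocks in the stripe.
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_data: Iterable[bytes], parity: bytes) -> bytes:
    # XOR of the parity with every surviving data block yields the lost block.
    return xor_blocks(list(surviving_data) + [parity])


if __name__ == "__main__":
    d0, d1 = b"\x0f\xf0", b"\x33\x55"
    p = make_parity([d0, d1])                # b'<\xa5', i.e. bytes 0x3c 0xa5
    print(rebuild_missing([d1], p) == d0)    # True
```

With n cards, one block in each stripe holds parity, so usable capacity is (n - 1)/n of the raw capacity, compared with 1/2 for RAID 1 and RAID 10.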
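FIG. 4 is a schematic flowchart of another embodiment according to this application. Under the RAID 5 architecture of FIG. 3, the process is as follows:
S400: The SD RAID controller 320 receives a read request sent by the computer device 310.
S405: Based on the read request from the computer device 310, the SD RAID controller 320 sends a read request to the first SD card 330 to read the data in a first data block of the first SD card.
Unlike the mirrored storage used in the RAID 1 architecture, in the RAID 5 architecture each SD card stores different data, so there is no primary/secondary relationship. When the SD RAID controller 320 receives a read request, it first determines from the information in the request that the data to be read is located in the first data block of the first SD card 330, and then sends a read request to the first SD card 330 to obtain the data.
S410: Determine whether the SD RAID controller 320 successfully reads the data from the first SD card 330.
S420: When the SD RAID controller 320 successfully reads the data from the first SD card 330, it sends the data to the computer device 310.
S430: When the SD RAID controller 320 fails to read the data from the first SD card 330, the SD RAID controller 320 determines that the second SD card 340 holds the check data corresponding to the data in the first data block of the first SD card 330, and sends an obtaining request to the second SD card 340 to obtain the requested data based on that check data.
In the RAID 5 architecture, the check data for the data stored on each SD card is stored on the other SD cards in the array; when only one SD card in the array is damaged, its data can be recovered from the check data. In this embodiment, when reading the data from the first SD card fails, the data can be obtained from its corresponding check data.
S440: Determine whether the SD RAID controller 320 successfully recovers the requested data from the check data on the second SD card 340. If recovery succeeds, go to step S450; if it fails, go to step S480.
S450: The SD RAID controller 320 sends a write request to the first SD card 330 to write the data recovered from the check data on the second SD card 340 into the first SD card 330.
S460: Determine whether the recovered data was successfully written into the first SD card 330. If the write succeeds, go to step S420, that is, send the data to the computer device 310. If the write fails, go to step S470.
S470: Add 1 to both the recorded read-error count and write-error count of the first SD card 330, then go to step S420, that is, send the data to the computer device 310.
S480: Add 1 to both the recorded read-error count of the first SD card 330 and that of the second SD card 340.
S490: The SD RAID controller 320 sends a read failure response to the computer device 310.
Similar to the embodiment described above, in this embodiment whether an SD card needs replacement can also be determined from its read-error and write-error counts; this is not repeated here.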
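Steps S400 to S490 can be sketched for a single stripe as follows. This is an illustration under stated assumptions, not the controller's implementation: `ToyCard` and `raid5_read` are hypothetical, parity is assumed to be XOR, and where the text names only the first and second SD cards in S480, the sketch increments the read-error count of the target card and of whichever other card actually failed to return its block.

```python
from typing import List, Optional


class ToyCard:
    """Hypothetical RAID 5 member holding one block of a single stripe."""

    def __init__(self, block: bytes, readable: bool = True, writable: bool = True):
        self.block, self.readable, self.writable = block, readable, writable
        self.read_errors = 0
        self.write_errors = 0

    def read(self) -> Optional[bytes]:
        return self.block if self.readable else None

    def write(self, data: bytes) -> bool:
        if not self.writable:
            return False
        self.block, self.readable = data, True   # write-back clears a logical fault
        return True


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_read(target: ToyCard, others: List[ToyCard]) -> Optional[bytes]:
    """Sketch of the FIG. 4 flow; `others` hold the parity and remaining data blocks."""
    data = target.read()                           # S405, S410
    if data is not None:
        return data                                # S420
    survivors = [card.read() for card in others]   # S430
    if any(s is None for s in survivors):          # S440 failed -> S480
        target.read_errors += 1
        for card, s in zip(others, survivors):
            if s is None:
                card.read_errors += 1
        return None                                # S490
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = xor(rebuilt, s)                  # XOR parity reconstruction
    if not target.write(rebuilt):                  # S450, S460
        target.read_errors += 1                    # S470
        target.write_errors += 1
    return rebuilt                                 # S420


if __name__ == "__main__":
    d0, d1 = b"\x0f\xf0", b"\x33\x55"
    first = ToyCard(d0, readable=False)            # logical bad block on card 1
    print(raid5_read(first, [ToyCard(d1), ToyCard(xor(d0, d1))]) == d0)  # True
    print(first.readable)                          # True (repaired by write-back)
```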
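FIG. 5 is a schematic diagram of the functional modules of a storage controller for repairing a memory according to an embodiment of this application. As shown in FIG. 5, the storage controller 500 includes a receiving module 510, an acquiring module 520, a writing module 530, and a monitoring module 540, where:
the receiving module 510 is configured to receive a first read request, where the first read request instructs to read first data in the first memory;
the acquiring module 520 is configured to obtain the first data from the second memory when reading the first data in the first memory fails;
the writing module 530 is configured to write the first data back to the first memory; and
the monitoring module 540 is configured to: when the first data is successfully written back to the first memory, determine that a logical bad block had occurred in the first memory and has been repaired; and when the writing module 530 fails to write the first data back to the first memory, add 1 to both the recorded write-error count and read-error count of the first memory.
The storage controller 500 is further configured to perform the other memory-repair steps shown in FIG. 2 and FIG. 4. Specifically, the receiving module 510 performs steps such as step S200 in FIG. 2 and step S400 in FIG. 4; the acquiring module 520 performs steps such as S205, S210, S230, and S240 in FIG. 2 and S405, S410, S430, and S440 in FIG. 4; the writing module 530 performs steps such as S220 and S250 in FIG. 2 and S420 and S450 in FIG. 4; and the monitoring module performs steps such as S270, S280, and S290 in FIG. 2 and S470, S480, and S490 in FIG. 4. For the detailed process of each step, see the description of FIG. 2 and FIG. 4 above; it is not repeated here.
FIG. 6 is a schematic structural diagram of a computer device 600 for adjusting processor power according to an embodiment of this application. The computer device 600 in this embodiment may be one specific implementation of the computer device in the foregoing embodiments.
As shown in FIG. 6, the computer device 600 includes a processor 601 connected to a memory 605. The processor 601 may be a field programmable gate array (FPGA), a digital signal processor (DSP), other computing logic, or any combination of the above. The processor 601 may also be a single-core or multi-core processor.
The memory 605 may be random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, or any other form of storage medium known in the art. The memory can store program instructions; when the program instructions are executed by the processor 601, the processor 601 performs the methods described in the above embodiments.
Connection line 609 transfers information between the components of the communication device; it may use a wired or wireless connection, which is not limited in this application. Connection line 609 is also connected to a network interface 604.
The network interface 604 communicates with other devices or the network 611 using a connection device such as, but not limited to, a cable or twisted pair; the network interface 604 may also connect to the network 611 wirelessly.
The computer device 600 further includes a storage controller 612. For its functions, refer to the description of the storage controllers in FIG. 1, FIG. 3, and FIG. 5 above; they are not repeated here. The storage controller 612 may be connected to one or more external storage controllers 602 and implements the method flows described in FIG. 2 and FIG. 4 of this application.
Some features of the embodiments of this application may be implemented/supported by the processor 601 executing program instructions or software code in the memory 605. The software components loaded on the memory 605 can be summarized functionally or logically, for example as the monitoring module 620, the determining module 630, and the adjusting module 640 shown in FIG. 6.
In one embodiment of this application, after the memory 605 loads the program instructions, the processor 601 executes the transactions related to the above functional/logical modules in the memory.
In addition, FIG. 6 is only one example of the computer device 600, which may include more or fewer components than shown in FIG. 6, or a different arrangement of components. The components shown in FIG. 6 may be implemented in hardware, software, or a combination of hardware and software. For example, the memory and the processor may be implemented in one module, and the instructions in the memory may be written into the memory in advance or loaded later by the processor during execution.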

Claims (10)

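  1. A memory repair method, characterized in that the method is applied to a storage controller, the storage controller is connected to a first memory and a second memory, and the method comprises:
    receiving a first read request, wherein the first read request is used to instruct reading of first data in the first memory;
    when reading the first data in the first memory fails, obtaining the first data from the second memory;
    sending a write request to the first memory, wherein the write request carries the first data;
    receiving a response message sent by the first memory, wherein the response message indicates that the first data has been successfully written into the first memory; and
    determining, according to the response message, that a logical bad block error of the first memory has been repaired.
  2. The method according to claim 1, characterized in that the first memory and the second memory are Secure Digital (SD) cards, and the first memory and the second memory store the same data.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    receiving a second read request, wherein the second read request is used to instruct reading of second data in the first memory;
    when reading the second data in the first memory fails, sending an obtaining request to the second memory, wherein the obtaining request is used to obtain the second data from the second memory;
    when obtaining the second data from the second memory fails, determining that physical bad blocks have occurred in the first memory and the second memory; and
    adding 1 to each of the recorded read-error count of the first memory and the recorded read-error count of the second memory.
  4. The method according to claim 3, characterized in that the method further comprises:
    issuing a warning when the read-error count of the first memory or the read-error count of the second memory reaches a threshold.
  5. The method according to any one of claims 1 to 4, characterized in that the storage controller is further connected to a third memory, and the first memory, the second memory, and the third memory form a redundant array of independent disks, RAID 5;
    the obtaining the first data from the second memory comprises:
    recovering the first data based on check information of the first data stored in the second memory.
  6. A storage controller, characterized in that the storage controller is connected to a first memory and a second memory, and the storage controller comprises:
    a receiving module, configured to receive a first read request, wherein the first read request is used to instruct reading of first data in the first memory;
    an acquiring module, configured to obtain the first data from the second memory when reading the first data in the first memory fails;
    a writing module, configured to send a write request to the first memory, wherein the write request carries the first data; and
    a monitoring module, configured to receive a response message sent by the first memory, wherein the response message indicates that the first data has been successfully written into the first memory, and
    determine, according to the response message, that a logical bad block error of the first memory has been repaired.
  7. The storage controller according to claim 6, characterized in that the first memory and the second memory are Secure Digital (SD) cards, and the first memory and the second memory store the same data.
  8. The storage controller according to claim 6 or 7, characterized in that
    the receiving module is further configured to: receive a second read request, wherein the second read request is used to instruct reading of second data in the first memory;
    when reading the second data in the first memory fails, send an obtaining request to the second memory;
    the monitoring module is further configured to: when obtaining the second data from the second memory fails, determine that physical bad blocks have occurred in the first memory and the second memory;
    add 1 to each of the recorded read-error count of the first memory and the recorded read-error count of the second memory; and
    issue a warning when the read-error count of the first memory or the read-error count of the second memory reaches a threshold.
  9. The storage controller according to any one of claims 6 to 8, characterized in that the storage controller is further connected to a third memory, and the first memory, the second memory, and the third memory form a redundant array of independent disks, RAID 5;
    the acquiring module is configured to recover the first data based on check information of the first data stored in the second memory.
  10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory is configured to store program code, and the processor is configured to invoke the program code in the memory to perform the method according to any one of claims 1 to 5.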
PCT/CN2020/095660 2019-11-07 2020-06-11 一种存储器的修复方法及装置 WO2021088368A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911083992 2019-11-07
CN201911083992.8 2019-11-07
CN201911424838.2 2019-12-31
CN201911424838.2A CN111221681A (zh) 2019-11-07 2019-12-31 一种存储器的修复方法及装置

Publications (1)

Publication Number Publication Date
WO2021088368A1 true WO2021088368A1 (zh) 2021-05-14

Family

ID=70832790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095660 WO2021088368A1 (zh) 2019-11-07 2020-06-11 一种存储器的修复方法及装置

Country Status (2)

Country Link
CN (1) CN111221681A (zh)
WO (1) WO2021088368A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221681A (zh) * 2019-11-07 2020-06-02 华为技术有限公司 一种存储器的修复方法及装置
CN117581301A (zh) * 2021-12-14 2024-02-20 英特尔公司 用于防止电子设备中的存储器故障的设备和方法

Citations (6)

Publication number Priority date Publication date Assignee Title
US20090182785A1 (en) * 2008-01-16 2009-07-16 Bluearc Uk Limited Multi-Way Checkpoints in a Data Storage System
CN101777013A (zh) * 2009-01-12 2010-07-14 成都市华为赛门铁克科技有限公司 一种固态硬盘及数据读写方法
CN102200937A (zh) * 2011-05-31 2011-09-28 深圳创维-Rgb电子有限公司 一种与非闪存中数据读取的方法、装置及电视机系统
CN102541469A (zh) * 2011-12-13 2012-07-04 华为技术有限公司 固件存储系统中数据保护的方法、设备及系统
CN106844088A (zh) * 2017-02-20 2017-06-13 郑州云海信息技术有限公司 一种raid存储系统的数据发送方法及装置
CN111221681A (zh) * 2019-11-07 2020-06-02 华为技术有限公司 一种存储器的修复方法及装置

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN103309775B (zh) * 2013-07-03 2015-08-12 苏州科达科技股份有限公司 一种高可靠磁盘阵列的容错方法

Also Published As

Publication number Publication date
CN111221681A (zh) 2020-06-02

Similar Documents

Publication Publication Date Title
TWI553650B (zh) 以記憶體控制器來處理資料錯誤事件之方法、設備及系統
EP1984822B1 (en) Memory transaction replay mechanism
US9542271B2 (en) Method and apparatus for reducing read latency
KR102571747B1 (ko) 데이터 저장 장치 및 그것의 동작 방법
KR20180065423A (ko) 리페어 가능한 휘발성 메모리를 포함하는 스토리지 장치 및 상기 스토리지 장치의 동작 방법
KR20140013095A (ko) 데이터 무결성을 제공하기 위한 방법 및 장치
JP2004038290A (ja) 情報処理システムおよび同システムで用いられるディスク制御方法
WO2021088368A1 (zh) 一种存储器的修复方法及装置
CN113835923A (zh) 一种复位系统、数据处理系统以及相关设备
JP2010500699A (ja) メモリデバイス内のセクタごとに許容できるビットエラー
CN111625199A (zh) 提升固态硬盘数据通路可靠性的方法、装置、计算机设备及存储介质
CN114579163A (zh) 一种磁盘固件升级方法、计算装置及系统
KR102469098B1 (ko) 불휘발성 메모리 장치, 불휘발성 메모리 장치의 동작 방법 및 이를 포함하는 데이터 저장 장치
US9043655B2 (en) Apparatus and control method
CN116264100A (zh) 快速存储器ecc错误纠正
CN114840456A (zh) 存储设备的带外管理方法、基板管理控制器和存储设备
KR20180078426A (ko) 데이터 저장 장치의 에러 정정 코드 처리 방법
CN113868000B (zh) 一种链路故障修复方法、系统及相关组件
US11822793B2 (en) Complete and fast protection against CID conflict
CN117632579B (zh) 存储器控制方法和存储器存储装置
CN113535459B (zh) 响应电源事件的数据存取方法及装置
US20230214151A1 (en) Memory system and operating method thereof
WO2024016971A1 (zh) 错误确定方法及系统、处理器、内存
US11650925B2 (en) Memory interface management
US20240194282A1 (en) Flash memory module testing method and associated memory controller and memory device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20884364

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20884364

Country of ref document: EP

Kind code of ref document: A1