WO2019232993A1 - Adaptive data recovery flow control method and apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2019232993A1
Authority
WO
WIPO (PCT)
Prior art keywords
statistical period
flow control
data block
control threshold
load category
Prior art date
Application number
PCT/CN2018/108128
Other languages
English (en)
Chinese (zh)
Inventor
陈学伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232993A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device

Definitions

  • the present application relates to the field of computer technology, and in particular, to an adaptive data recovery flow control method, device, electronic device, and storage medium.
  • a common data redundancy strategy is to store multiple copies of data on different physical nodes. When some copies are damaged, the damaged copies can be repaired based on the intact copies.
  • a first aspect of the present application provides an adaptive data recovery flow control method, where the method includes:
  • steps d) -f) are repeatedly performed until a recovery operation is performed on data in all statistical periods of the failed storage node.
  • a second aspect of the present application provides an adaptive data recovery flow control device, where the device includes:
  • a synchronization module for regularly synchronizing information of each storage node in the distributed storage system
  • a detection module for detecting whether a storage node has failed
  • An obtaining module configured to obtain a storage list of a failed storage node when the detection module detects a failure of the storage node
  • Identification module used to identify the IO load category of the user application in the previous statistical period
  • a calculation module configured to calculate a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period
  • the recovery module is configured to perform a recovery operation on the data in the current statistical period of the storage node that has failed according to the storage list and the flow control threshold corresponding to the current statistical period.
  • a third aspect of the present application provides an electronic device.
  • the electronic device includes a processor and a memory, where the memory is configured to store at least one instruction, and the processor is configured to execute the at least one instruction to implement the following steps:
  • steps d) -f) are repeatedly performed until a recovery operation is performed on data in all statistical periods of the failed storage node.
  • a fourth aspect of the present application provides a non-volatile readable storage medium. At least one instruction is stored on the non-volatile readable storage medium, and when the at least one instruction is executed by a processor, the following steps are implemented:
  • steps d) -f) are repeatedly performed until a recovery operation is performed on data in all statistical periods of the failed storage node.
  • the adaptive data recovery flow control method, device, electronic device, and storage medium described in the present application divide a recovery period into multiple statistical periods. In each statistical period, the flow control threshold for the current statistical period is dynamically adjusted according to the IO load category of the user application in the previous statistical period, and the data in the current statistical period is recovered according to that threshold.
  • when the IO load of user applications in the previous statistical period is high, the flow control threshold for fault recovery in the current statistical period is reduced, which lowers the intensity of fault recovery and protects the business IO load.
  • FIG. 1 is a flowchart of an adaptive data recovery flow control method provided in Embodiment 1 of the present application.
  • FIG. 2 is a functional block diagram of an adaptive data recovery flow control device provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic diagram of an electronic device according to a third embodiment of the present application.
  • the adaptive data recovery flow control method in the embodiment of the present application is applied to one or more electronic devices.
  • the adaptive data recovery flow control method can also be applied to a hardware environment composed of an electronic device and a server connected to the electronic device through a network.
  • the network includes, but is not limited to: a wide area network, a metropolitan area network, or a local area network.
  • the adaptive data recovery flow control method in the embodiment of the present application may be executed by a server or an electronic device; it may also be executed jointly by the server and the electronic device.
  • the adaptive data recovery flow control function provided by the method of the present application can be integrated directly on the electronic device, or run on the electronic device in the form of a client.
  • the method provided in this application can also run on a device such as a server in the form of a Software Development Kit (SDK), which provides an interface to the adaptive data recovery flow control function; the electronic device or other devices can then implement adaptive control of data recovery through the provided interface.
  • FIG. 1 is a flowchart of an adaptive data recovery flow control method provided in Embodiment 1 of the present application. According to different requirements, the execution order in this flowchart can be changed, and some steps can be omitted.
  • the distributed storage system (hereinafter referred to as a storage system) adopts a cluster storage method for distributed data storage.
  • distributed storage is a data storage technology that uses, over the network, the remaining disk space on each storage system in the cluster and integrates these scattered storage resources into a virtual storage device, so that data is stored across the nodes of the cluster.
  • each storage node described in this application is each sub storage system in the cluster.
  • the storage node may be a storage server, a computer, or a storage device.
  • synchronizing the information of each storage node in the distributed storage system may include: 1) a storage center in the storage system performs the information synchronization of each storage node; or 2) in a decentralized approach, any one storage node in the storage system initiates the information synchronization of each storage node.
  • the synchronized information of each storage node may include, but is not limited to, CPU, memory, free disk space, and the stored file list.
  • the stored file list records information such as the name, size, and location of the data stored in each storage node.
  • a storage node failure may be that any one or more storage nodes in the storage system cannot start, lose power, or are disconnected from the network, or that a disk in any one or more storage nodes has failed. Detecting whether a storage node has failed therefore includes detecting whether any one or more storage nodes in the storage system have failed to start, lost power, or been disconnected from the network, and detecting whether a disk in any one or more storage nodes has failed.
  • when any storage node in the storage system fails to start, loses power, or is disconnected from the network, it loses its connection to the other storage nodes and/or the storage center, so the other storage nodes and/or the storage center can detect that a storage node has failed.
  • when a disk in a storage node fails, the synchronization information that the node sends to the other storage nodes and/or the storage center includes the disk failure information, so the other storage nodes and/or the storage center can likewise detect that a storage node has failed.
  • when it is detected that a storage node has failed, step S13 is performed; when no storage node failure is detected, step S12 continues to be performed.
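  • the two detection paths described above (a node that stops synchronizing, and a node that reports a disk failure in its synchronization information) can be sketched as follows. This is a minimal illustration, not the application's implementation: the `NodeStatus` record, the `find_failed_nodes` function, and the 3-second timeout are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    node_id: str
    last_sync: float            # time of the last successful synchronization (seconds)
    disk_failed: bool = False   # disk failure flag carried in the sync information

def find_failed_nodes(nodes, now, timeout=3.0):
    """Return IDs of nodes that stopped synchronizing or reported a disk failure.

    `timeout` is a hypothetical tolerance; the source does not specify one.
    """
    failed = []
    for node in nodes:
        unreachable = (now - node.last_sync) > timeout  # start failure, power loss, network loss
        if unreachable or node.disk_failed:
            failed.append(node.node_id)
    return failed

nodes = [
    NodeStatus("node-a", last_sync=100.0),
    NodeStatus("node-b", last_sync=90.0),                    # silent for 10.5 s
    NodeStatus("node-c", last_sync=100.0, disk_failed=True), # reachable, but disk failed
]
failed = find_failed_nodes(nodes, now=100.5)
print(failed)
```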
  • obtaining the storage list of the storage node that has failed includes obtaining information such as the name, size, and location of data stored in the storage node that has failed.
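  • as a rough sketch of this step, the storage list of a failed node can be read from the file lists collected during synchronization. The `StoredItem` record, the `synced_lists` mapping, and the sample entries are hypothetical and only illustrate the name, size, and location fields mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredItem:
    name: str
    size_bytes: int
    location: str

# Hypothetical metadata gathered by periodic synchronization: node ID -> stored file list.
synced_lists = {
    "node-a": [StoredItem("vol1/blk0", 4096, "/disk0/blk0")],
    "node-b": [StoredItem("vol1/blk1", 8192, "/disk0/blk1"),
               StoredItem("vol2/blk7", 2048, "/disk1/blk7")],
}

def storage_list_of(failed_node_id, synced):
    """Return a copy of the failed node's last synchronized storage list."""
    return list(synced.get(failed_node_id, []))

to_recover = storage_list_of("node-b", synced_lists)
total_bytes = sum(item.size_bytes for item in to_recover)
print(len(to_recover), total_bytes)
```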
  • a recovery period may include multiple statistical periods, and a statistical period may be a preset time period. For example, a statistical period is set to 1 second.
  • the IO load category includes: a high load category, a normal load category, and a low load category.
  • the identifying the IO load category of the user application in the previous statistical period may include:
  • the average data block size of the IO in the last statistical period may be calculated by using an arithmetic average algorithm, a geometric mean algorithm, or a root mean square algorithm.
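  • in its standard form, the arithmetic mean is $\bar{S} = \frac{1}{N}\sum_{i=1}^{N} S_i$.
  • in its standard form, the geometric mean is $\bar{S} = \left(\prod_{i=1}^{N} S_i\right)^{1/N}$.
  • in its standard form, the root mean square is $\bar{S} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} S_i^{2}}$.
  • in each case, N is the number of IO data blocks and S_i is the data block size of the i-th IO.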
  • for example, the data block sizes of ten IOs are: 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M.
  • the transmission delay is the time a node needs to push a data block onto the transmission medium when transmitting data, that is, the time from when a sending station starts sending a data frame until the frame has been completely sent.
  • the transmission delay of the data block may be obtained from a load measurement tool or a performance monitoring tool installed in each storage node.
  • the average data block delay of the IO in the previous statistical period may also be calculated using an arithmetic mean, geometric mean, or root mean square algorithm. Assuming that the transmission delays of ten IOs in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s, the average data block delay calculated using the arithmetic mean algorithm is (1 + 0.8 + 1.5 + 0.4 + 5 + 2 + 0.02 + 0.6 + 3 + 4.5) / 10 = 1.882s.
  • the same algorithm is used for both averages: if the average data block size of the IO in the previous statistical period is calculated using the arithmetic mean algorithm, the average data block delay is also calculated using the arithmetic mean algorithm; if the average data block size is calculated using the geometric mean algorithm, the average data block delay is also calculated using the geometric mean algorithm; and if the average data block size is calculated using the root mean square algorithm, the average data block delay is also calculated using the root mean square algorithm.
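  • the following sketch applies one averaging method to both the block sizes and the block delays of a statistical period, using the ten example sizes and ten example delays given in the text. The `Mean` enum and the function names are assumptions introduced for illustration; the three formulas are the standard definitions of the named means.

```python
import math
from enum import Enum

class Mean(Enum):
    ARITHMETIC = "arithmetic"
    GEOMETRIC = "geometric"
    RMS = "rms"

def average(values, method):
    """Average a list of positive numbers with the selected method."""
    n = len(values)
    if method is Mean.ARITHMETIC:
        return sum(values) / n
    if method is Mean.GEOMETRIC:
        # Computed in log space; requires strictly positive values.
        return math.exp(sum(math.log(v) for v in values) / n)
    if method is Mean.RMS:
        return math.sqrt(sum(v * v for v in values) / n)
    raise ValueError(f"unknown method: {method}")

def period_averages(sizes, delays, method=Mean.ARITHMETIC):
    """Average block size and average block delay, both with the same method."""
    return average(sizes, method), average(delays, method)

sizes_mb = [2, 1, 3, 0.5, 10, 4, 0.1, 1.2, 5, 8]
delays_s = [1, 0.8, 1.5, 0.4, 5, 2, 0.02, 0.6, 3, 4.5]

avg_size, avg_delay = period_averages(sizes_mb, delays_s)
print(round(avg_size, 2), round(avg_delay, 3))  # 3.48 1.882
```

  • passing the method once to `period_averages` makes it impossible to mix, say, an arithmetic size average with a geometric delay average.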
  • the reference value of the IO data block size and the reference value of the corresponding data block delay may be preset by an administrator of the storage system based on experience. For example, if experience shows that the delay is smallest when a 4K data block is transmitted and can reach 50ms in the ideal state, the reference value of the IO data block size can be set to 4K and the corresponding data block delay reference value can be set to 50ms.
  • the average data block size of the IO in the previous statistical period is X
  • the average data block delay is Y
  • the reference value of the data block size is M
  • the reference value of the corresponding data block delay is N
  • the calculation formula of the IO load intensity in the previous statistical period is:
  • the load classification model includes, but is not limited to, a Support Vector Machine (SVM) model.
  • the average data block size of the IO in the previous statistical period, the average data block delay of the IO in the previous statistical period, and the IO load intensity in the previous statistical period are used as the input of the load classification model, which computes and outputs the IO load category in the previous statistical period.
  • the training process of the load classification model includes:
  • training samples in the training sets of different load categories are distributed to different folders. For example, training samples of high load category are distributed to the first folder, training samples of normal load category are distributed to the second folder, and training samples of low load category are distributed to the third folder.
  • a first preset ratio (for example, 70%) of the training samples is used as the training set to train the load classification model, and the remaining second preset ratio (for example, 30%) is used as the test set to verify the accuracy of the trained model.
  • if the accuracy rate is greater than or equal to a preset accuracy rate, training ends and the trained load classification model is used as the classifier to identify the IO load category in the current statistical period; if the accuracy rate is less than the preset accuracy rate, the number of positive samples and negative samples is increased and the load classification model is retrained until the accuracy rate is greater than or equal to the preset accuracy rate.
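  • the train, verify, and retrain loop above can be sketched as follows. To stay dependency-free, the sketch swaps in a nearest-centroid classifier as a stand-in for the SVM named in the text, and it trains on synthetic samples; the class centers in `CENTERS`, the 0.9 target accuracy, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical class centers: (avg block size, avg block delay, load intensity).
CENTERS = {"low": (4.0, 0.05, 1.0), "normal": (64.0, 0.5, 5.0), "high": (512.0, 3.0, 20.0)}

def make_samples(n_per_class, seed):
    """Generate synthetic labeled samples with +/-10% jitter around each center."""
    rng = random.Random(seed)
    samples = []
    for label, center in CENTERS.items():
        for _ in range(n_per_class):
            features = tuple(c * rng.uniform(0.9, 1.1) for c in center)
            samples.append((features, label))
    return samples

def split(samples, train_ratio, seed):
    """Shuffle and split into a training set and a test set (e.g. 70% / 30%)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def train(train_set):
    """Nearest-centroid stand-in for the SVM: one mean feature vector per label."""
    grouped = {}
    for features, label in train_set:
        grouped.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in grouped.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], features)))

def accuracy(model, test_set):
    return sum(predict(model, f) == label for f, label in test_set) / len(test_set)

def train_until_accurate(target=0.9, n_per_class=10, max_rounds=5):
    for round_no in range(max_rounds):
        samples = make_samples(n_per_class, seed=round_no)
        train_set, test_set = split(samples, train_ratio=0.7, seed=round_no)
        model = train(train_set)
        acc = accuracy(model, test_set)
        if acc >= target:
            return model, acc
        n_per_class *= 2  # below target: enlarge the sample set and retrain
    raise RuntimeError("target accuracy not reached")

model, acc = train_until_accurate()
print(sorted(model), acc)
```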
  • flow control here refers to traffic control. There are two methods for implementing it: one is to implement flow control based on source address, destination address, source port, destination port, and protocol type through the QoS module of routers and switches; the other is to use dedicated flow control equipment to implement application-based flow control.
  • Each statistical period in the recovery period can correspond to a flow control threshold.
  • the flow control threshold corresponding to each statistical period is dynamically adjusted.
  • the flow control threshold corresponding to the current statistical period can be calculated based on the IO load category in the previous statistical period.
  • the flow control threshold corresponding to the next statistical period can be calculated based on the IO load category in the current statistical period.
  • the flow control threshold corresponding to the first statistical period in the recovery period of this application is a preset flow control threshold, which can be preset by the administrator of the storage system based on experience. That is, when a preset flow control threshold is used as the flow control threshold of the first statistical period in the recovery period, the flow control threshold corresponding to the second statistical period is calculated according to the IO load category in the first statistical period; according to The IO load category in the second statistical period calculates the flow control threshold corresponding to the third statistical period; and so on.
  • calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period may include:
  • when the IO load category in the previous statistical period is the high load category, the flow control threshold is reduced by the first preset amplitude, so that the recovery operation on the data of the failed storage node in the current statistical period runs under a lower flow control threshold. This reduces the speed of data recovery and keeps user application access efficient.
  • the first preset amplitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
  • when the IO load category in the previous statistical period is the low load category, the flow control threshold is increased by the second preset amplitude, so that the recovery operation on the data of the failed storage node in the current statistical period runs under a higher flow control threshold, which improves the speed of data recovery.
  • the second preset amplitude may be 1.5 times the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
  • when the IO load category in the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.
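  • the threshold update can be sketched as a small function, using the example amplitudes from the text (1/2 after a high-load period, 1.5 times after a low-load period, unchanged after a normal-load period). The function name, the category strings, and the preset threshold of 100.0 are assumptions made for illustration; the text does not fix a unit for the threshold.

```python
HIGH, NORMAL, LOW = "high", "normal", "low"

def next_threshold(prev_threshold, prev_load_category,
                   decrease_factor=0.5, increase_factor=1.5):
    """Flow control threshold for the current period, from the previous period's load."""
    if prev_load_category == HIGH:
        return prev_threshold * decrease_factor   # protect user IO: slow recovery down
    if prev_load_category == LOW:
        return prev_threshold * increase_factor   # spare capacity: speed recovery up
    if prev_load_category == NORMAL:
        return prev_threshold                     # keep the previous threshold
    raise ValueError(f"unknown load category: {prev_load_category}")

threshold = 100.0            # hypothetical preset threshold for the first period
history = [threshold]
for category in [HIGH, HIGH, NORMAL, LOW]:
    threshold = next_threshold(threshold, category)
    history.append(threshold)
print(history)  # [100.0, 50.0, 25.0, 25.0, 37.5]
```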
  • when it is determined that the recovery operation has been performed on the data in all statistical periods of the failed storage node, the process ends; when it is determined that the recovery operation has not yet been performed on the data in all statistical periods of the failed storage node, the process returns to step S14 described above.
  • the adaptive data recovery flow control method described in this application periodically synchronizes the information of each storage node in a distributed storage system; when a failure of a storage node is detected, obtains the storage list of the failed storage node; identifies the IO load category of the user application in the previous statistical period; calculates the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period; and, according to the storage list and the flow control threshold corresponding to the current statistical period, performs a recovery operation on the data in the current statistical period of the failed storage node, repeating until the recovery operation has been performed on the data in all statistical periods of the failed storage node.
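  • the overall loop can be sketched as below. Several points are assumptions rather than statements from the application: the threshold is treated as the maximum amount of data recovered per statistical period, `classify_previous_period` is a hypothetical callback standing in for the load classification step, and the `min_threshold` floor is added here so that a long run of high-load periods cannot shrink the threshold toward zero and stall recovery.

```python
from collections import deque

FACTORS = {"high": 0.5, "normal": 1.0, "low": 1.5}  # example amplitudes from the text

def run_recovery(storage_list, classify_previous_period,
                 initial_threshold, min_threshold=1.0):
    """Recover all items period by period; return (period, threshold, recovered) tuples."""
    pending = deque(storage_list)   # (name, remaining_size) pairs still to recover
    threshold = initial_threshold   # preset threshold for the first period
    period = 0
    log = []
    while pending:
        if period > 0:
            category = classify_previous_period(period)
            threshold = max(min_threshold, threshold * FACTORS[category])
        budget = threshold          # data that may be recovered in this period
        while pending and budget > 0:
            name, remaining = pending.popleft()
            moved = min(remaining, budget)
            budget -= moved
            if remaining > moved:   # item only partly recovered: finish it next period
                pending.appendleft((name, remaining - moved))
        log.append((period, threshold, threshold - budget))
        period += 1
    return log

# Hypothetical load categories: period 0 was high load, period 1 was low load.
observed = {1: "high", 2: "low"}
log = run_recovery([("a", 120), ("b", 60), ("c", 40)],
                   lambda p: observed.get(p, "normal"),
                   initial_threshold=100.0)
print(log)  # [(0, 100.0, 100.0), (1, 50.0, 50.0), (2, 75.0, 70.0)]
```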
  • this application divides a recovery period into multiple statistical periods. In each statistical period, the flow control threshold for the current statistical period is dynamically adjusted according to the IO load category of the user application in the previous statistical period, and the data in the current statistical period is recovered according to that threshold.
  • when the IO load of user applications in the previous statistical period is high, the flow control threshold for fault recovery in the current statistical period is reduced, which lowers the intensity of fault recovery and protects the business IO load.
  • when the IO load intensity of user applications is low, the flow control threshold for fault recovery in the current statistical period is increased, which raises the fault recovery intensity and returns the distributed storage system to a healthy state as soon as possible. In this way, the application can improve the data recovery efficiency of a large-scale distributed storage system and reduce the risk of data loss while avoiding a significant impact on the performance of normal input and output services.
  • the flow control threshold for the current statistical period is adjusted automatically according to the IO load category of the user application in the previous statistical period, without manual adjustment by an administrator. This reduces the administrator's workload, avoids inaccurate adjustment caused by subjective factors, and lets the threshold follow changes in the distributed storage system and its hardware.
  • FIG. 2 is a functional module diagram of a preferred embodiment of an adaptive data recovery flow control device of the present application.
  • the adaptive data recovery flow control device 20 (hereinafter referred to as "data recovery flow control device 20") runs in an electronic device.
  • the data recovery flow control device 20 may include a plurality of functional modules composed of program code segments.
  • the program code of each program segment in the data recovery flow control device 20 may be stored in a memory and executed by at least one processor to perform the adaptive data recovery flow control method (see FIG. 1 and the related description for details).
  • the data recovery flow control device 20 of the electronic device may be divided into a plurality of functional modules according to the functions it performs.
  • the functional modules may include a synchronization module 201, a detection module 202, an obtaining module 203, an identification module 204, a training module 205, a calculation module 206, a recovery module 207, and a judgment module 208.
  • the module referred to in the present application refers to a series of computer-readable instruction segments capable of being executed by at least one processor and capable of performing fixed functions, which are stored in a memory. In some embodiments, functions of each module will be described in detail in subsequent embodiments.
  • the synchronization module 201 is configured to periodically synchronize information of each storage node in the distributed storage system.
  • the distributed storage system (hereinafter referred to as a storage system) adopts a cluster storage method for distributed data storage.
  • distributed storage is a data storage technology that uses, over the network, the remaining disk space on each storage system in the cluster and integrates these scattered storage resources into a virtual storage device, so that data is stored across the nodes of the cluster.
  • each storage node described in this application is each sub storage system in the cluster.
  • the storage node may be a storage server, a computer, or a storage device.
  • the synchronization module 201 synchronizing the information of each storage node in the distributed storage system may include: 1) a storage center in the storage system performs the information synchronization of each storage node; or 2) in a decentralized approach, any one storage node in the storage system initiates the information synchronization of each storage node.
  • the synchronized information of each storage node may include, but is not limited to, CPU, memory, free disk space, and the stored file list.
  • the stored file list records information such as the name, size, and location of the data stored in each storage node.
  • the detection module 202 is configured to detect whether a storage node has failed.
  • a storage node failure may be that any one or more storage nodes in the storage system cannot start, lose power, or are disconnected from the network, or that a disk in any one or more storage nodes has failed. The detection module 202 detecting whether a storage node has failed therefore includes detecting whether any one or more storage nodes in the storage system have failed to start, lost power, or been disconnected from the network, and detecting whether a disk in any one or more storage nodes has failed.
  • when any storage node in the storage system fails to start, loses power, or is disconnected from the network, it loses its connection to the other storage nodes and/or the storage center, so the other storage nodes and/or the storage center can detect that a storage node has failed.
  • when a disk in a storage node fails, the synchronization information that the node sends to the other storage nodes and/or the storage center includes the disk failure information, so the other storage nodes and/or the storage center can likewise detect that a storage node has failed.
  • An obtaining module 203 is configured to obtain a storage list of a storage node that has failed when the detection module 202 detects that a storage node has failed.
  • obtaining the storage list of the storage node that has failed includes obtaining information such as the name, size, and location of data stored in the storage node that has failed.
  • the identification module 204 is configured to identify an IO load category of a user application in a previous statistical period.
  • a recovery period may include multiple statistical periods, and a statistical period may be a preset time period. For example, a statistical period is set to 1 second.
  • the IO load category includes: a high load category, a normal load category, and a low load category.
  • the identification module 204 identifying the IO load category of the user application in the previous statistical period may include:
  • the average data block size of the IO in the last statistical period may be calculated by using an arithmetic average algorithm, a geometric mean algorithm, or a root mean square algorithm.
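  • in its standard form, the arithmetic mean is $\bar{S} = \frac{1}{N}\sum_{i=1}^{N} S_i$.
  • in its standard form, the geometric mean is $\bar{S} = \left(\prod_{i=1}^{N} S_i\right)^{1/N}$.
  • in its standard form, the root mean square is $\bar{S} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} S_i^{2}}$.
  • in each case, N is the number of IO data blocks and S_i is the data block size of the i-th IO.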
  • for example, the data block sizes of ten IOs are: 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M.
  • the transmission delay is the time a node needs to push a data block onto the transmission medium when transmitting data, that is, the time from when a sending station starts sending a data frame until the frame has been completely sent, or the time from when a receiving station starts receiving a data frame until the frame has been completely received.
  • the transmission delay of the data block may be obtained from a load measurement tool or a performance monitoring tool installed in each storage node.
  • the average data block delay of the IO in the previous statistical period may also be calculated using an arithmetic mean, geometric mean, or root mean square algorithm. Assuming that the transmission delays of ten IOs in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s, the average data block delay calculated using the arithmetic mean algorithm is (1 + 0.8 + 1.5 + 0.4 + 5 + 2 + 0.02 + 0.6 + 3 + 4.5) / 10 = 1.882s.
  • the same algorithm is used for both averages: if the average data block size of the IO in the previous statistical period is calculated using the arithmetic mean algorithm, the average data block delay is also calculated using the arithmetic mean algorithm; if the average data block size is calculated using the geometric mean algorithm, the average data block delay is also calculated using the geometric mean algorithm; and if the average data block size is calculated using the root mean square algorithm, the average data block delay is also calculated using the root mean square algorithm.
  • the reference value of the IO data block size and the reference value of the corresponding data block delay may be preset by an administrator of the storage system based on experience. For example, if experience shows that the delay is smallest when a 4K data block is transmitted and can reach 50ms in the ideal state, the reference value of the IO data block size can be set to 4K and the corresponding data block delay reference value can be set to 50ms.
  • the average data block size of the IO in the previous statistical period is X
  • the average data block delay is Y
  • the reference value of the data block size is M
  • the reference value of the corresponding data block delay is N
  • the calculation formula of the IO load intensity in the previous statistical period is:
  • the load classification model includes, but is not limited to, a Support Vector Machine (SVM) model.
  • SVM Support Vector Machine
  • Using the average data block size of the IO in the last statistical period, the average data block delay of the IO in the last statistical period, and the IO load intensity in the last statistical period as the load classification model The input is calculated by the load classification model, and the IO load category in the previous statistical period is output.
  • the training module 205 is configured to train the load classification model.
  • the process of the training module 205 training the load classification model includes:
  • training samples in the training sets of different load categories are distributed to different folders. For example, training samples of high load category are distributed to the first folder, training samples of normal load category are distributed to the second folder, and training samples of low load category are distributed to the third folder.
  • training samples of the first preset ratio (for example, 70%)
  • the second preset ratio (for example, 30%)
  • If the accuracy rate is greater than or equal to a preset accuracy rate, training ends and the trained load classification model is used as a classifier to identify the IO load category in the current statistical period; if the accuracy rate is less than the preset accuracy rate, the number of positive samples and the number of negative samples are increased and the load classification model is retrained until the accuracy rate is greater than or equal to the preset accuracy rate.
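The split-and-verify part of the training procedure can be sketched as below. The sketch assumes that the first preset ratio (70%) is the training portion and the second (30%) is held out for measuring accuracy. The helper names, the `seed` parameter and the 0.9 default for the preset accuracy are hypothetical. The classifier itself is left abstract as a `predict` callable, so no SVM is trained here.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle one category's samples and split them into train/test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_samples):
    """Fraction of (features, label) pairs that predict() labels correctly."""
    if not test_samples:
        return 0.0
    hits = sum(1 for features, label in test_samples if predict(features) == label)
    return hits / len(test_samples)


def training_done(predict, test_samples, preset_accuracy=0.9):
    """True once the measured accuracy reaches the preset accuracy.

    When this returns False, the caller is expected to add samples and retrain.
    """
    return accuracy(predict, test_samples) >= preset_accuracy
```

Fixing the shuffle seed keeps the split reproducible, which makes it easier to compare accuracy between retraining rounds.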
  • the calculation module 206 is configured to calculate a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period.
  • Flow control here refers to traffic control. There are two methods for implementing flow control: one is to implement flow control based on source address, destination address, source port, destination port, and protocol type through the QoS module of routers and switches; the other is to use dedicated flow control equipment to implement application-based flow control.
  • Each statistical period in the recovery period can correspond to a flow control threshold.
  • The flow control threshold corresponding to each statistical period is dynamically adjusted.
  • The flow control threshold corresponding to the current statistical period can be calculated according to the IO load category in the previous statistical period.
  • The flow control threshold corresponding to the next statistical period can be calculated according to the IO load category in the current statistical period.
  • The flow control threshold corresponding to the first statistical period in the recovery period is a preset flow control threshold, which can be set in advance by the administrator of the storage system based on experience. That is, the preset flow control threshold is used as the flow control threshold of the first statistical period in the recovery period; the flow control threshold corresponding to the second statistical period is calculated according to the IO load category in the first statistical period; the flow control threshold corresponding to the third statistical period is calculated according to the IO load category in the second statistical period; and so on.
  • the calculating module 206 calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period may include:
  • When the IO load category in the previous statistical period is the high load category, the flow control threshold is reduced according to the first preset amplitude, so that the recovery operation on the data of the storage node is performed with a low flow control threshold in the current statistical period. This reduces the speed of data recovery to ensure efficient access for user applications.
  • the first preset amplitude may be 1/2 of a flow control threshold corresponding to a previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
  • When the IO load category in the previous statistical period is the low load category, the flow control threshold is increased according to the second preset amplitude, so that the recovery operation on the data of the storage node is performed with a high flow control threshold in the current statistical period. This improves the speed of data recovery.
  • the second preset amplitude may be 1.5 times a flow control threshold corresponding to a previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
  • When the IO load category in the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.
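The threshold adjustment rule can be sketched as a single function. The sketch assumes that the high load category triggers the reduction (to 1/2), the low load category triggers the increase (to 1.5 times), and the normal load category leaves the threshold unchanged. The function name and the string labels for the categories are hypothetical.

```python
def next_flow_control_threshold(previous_threshold, load_category):
    """Derive the current period's threshold from the previous period's load."""
    if load_category == "high":
        # First preset amplitude: halve the threshold to slow recovery down.
        return previous_threshold * 0.5
    if load_category == "low":
        # Second preset amplitude: raise the threshold to 1.5x to speed recovery up.
        return previous_threshold * 1.5
    if load_category == "normal":
        # Keep the previous period's threshold unchanged.
        return previous_threshold
    raise ValueError(f"unknown load category: {load_category}")
```

Because each call only scales the previous value, several consecutive high-load periods shrink the threshold geometrically, and it grows back by a factor of 1.5 per low-load period.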
  • the recovery module 207 is configured to perform a recovery operation on the data in the current statistical period of the storage node that has failed according to the storage list and the flow control threshold corresponding to the current statistical period.
  • the determining module 208 is configured to determine whether a recovery operation is performed on data in all statistical periods of the faulty storage node.
  • When the determining module 208 determines that the recovery operation has not been performed on the data in all statistical periods of the failed storage node, execution returns to the identification module 204.
  • The synchronization module 201 periodically synchronizes information of each storage node in the distributed storage system; the acquisition module 203 obtains the storage list of the failed storage node when the detection module 202 detects that a storage node has failed; the identification module 204 identifies the IO load category of the user application in the previous statistical period; the calculation module 206 calculates the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period; the recovery module 207 performs a recovery operation on the data in the current statistical period of the failed storage node according to the storage list and the flow control threshold corresponding to the current statistical period, until the recovery operation has been performed on the data in all statistical periods of the failed storage node.
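The per-period loop formed by the identification, calculation, recovery and determining modules can be sketched as follows. This is an illustration under stated assumptions, not the application's code. The threshold is treated as a hypothetical "blocks recoverable per statistical period" budget. `classify` and `adjust` are caller-supplied callables that stand in for the load classification model and the threshold rule. The function name `run_recovery` is invented for the example.

```python
def run_recovery(total_blocks, preset_threshold, classify, adjust):
    """Recover total_blocks, one statistical period at a time.

    classify() returns the IO load category of the period that just ended;
    adjust(threshold, category) returns the threshold for the next period.
    Returns a list of (threshold, blocks_recovered) tuples, one per period.
    """
    history = []
    threshold = preset_threshold  # the first period uses the preset threshold
    remaining = total_blocks
    while remaining > 0:
        # Recover at most `threshold` blocks, but always at least one block
        # so that the loop makes progress.
        recovered = min(remaining, max(1, int(threshold)))
        remaining -= recovered
        history.append((threshold, recovered))
        if remaining > 0:
            # Not finished: derive the next threshold from this period's load.
            threshold = adjust(threshold, classify())
    return history
```

For instance, with a preset threshold of 40 and observed categories high, low, normal, the thresholds used in successive periods would be 40, 20, 30 and 30 under a halve / 1.5x / unchanged rule.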
  • This application can divide a recovery period into multiple statistical periods. In each statistical period, the flow control threshold corresponding to the current statistical period is dynamically adjusted according to the IO load category of the user application in the previous statistical period, and the data in the current statistical period is recovered according to that flow control threshold.
  • When the IO load of user applications in the previous statistical period is high, the flow control threshold for fault recovery in the current statistical period is reduced, so as to reduce the intensity of fault recovery and protect the business IO load.
  • When the IO load of user applications in the previous statistical period is low, the flow control threshold for fault recovery in the current statistical period is increased, so as to increase the fault recovery intensity and restore the distributed storage system to a healthy state as soon as possible. That is, this application can improve the data recovery efficiency of a large-scale distributed storage system and reduce the risk of data loss, while avoiding a significant impact on normal IO business performance, and has a good flow control effect.
  • The flow control threshold corresponding to the current statistical period is adjusted automatically and dynamically according to the IO load category of the user application in the previous statistical period, without manual adjustment by an administrator, which reduces the administrator's workload and avoids inaccurate adjustment caused by subjective factors. The threshold can also be adjusted dynamically as the distributed storage system and its hardware facilities change, and the approach has high reliability.
  • the above integrated unit implemented in the form of a software functional module may be stored in a non-volatile readable storage medium.
  • The above software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor to execute part of the methods described in the embodiments of this application.
  • FIG. 3 is a schematic diagram of an electronic device provided in Embodiment 5 of the present application.
  • the electronic device 3 includes a memory 31, at least one processor 32, computer-readable instructions 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
  • The computer-readable instructions 33 may be divided into one or more modules / units, and the one or more modules / units are stored in the memory 31 and executed by the at least one processor 32 to implement this application.
  • the one or more modules / units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 33 in the electronic device 3.
  • the electronic device 3 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3, which may include more or fewer components than shown in the figure, combine some components, or use different components; for example, the electronic device 3 may further include an input / output device, a network access device, a bus, and the like.
  • The at least one processor 32 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the processor 32 may be a microprocessor or the processor 32 may be any conventional processor.
  • The processor 32 is the control center of the electronic device 3, and uses various interfaces and lines to connect the various parts of the entire electronic device 3.
  • The memory 31 may be configured to store the computer-readable instructions 33 and / or modules / units, and the processor 32 implements various functions of the electronic device 3 by running or executing the computer-readable instructions and / or modules / units stored in the memory 31 and calling the data stored in the memory 31.
  • The memory 31 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, application programs required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data (such as audio data, a phone book, etc.) created according to the use of the electronic device 3.
  • The memory 31 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
  • When the integrated module / unit of the electronic device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by computer-readable instructions that instruct the related hardware.
  • The computer-readable instructions can be stored in a non-volatile readable storage medium; when the computer-readable instructions are executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer-readable instructions may be in a source code form, an object code form, an executable file, or some intermediate form.
  • The non-volatile readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a U disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media.
  • The content contained in the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdictions. For example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electrical carrier signals and telecommunication signals.
  • each functional unit in each embodiment of the present application may be integrated in the same processing unit, or each unit may exist separately physically, or two or more units may be integrated in the same unit.
  • the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An adaptive data recovery flow control method is disclosed, the method comprising: periodically synchronizing information of the storage nodes in a distributed storage system (S11); when it is detected that a storage node has failed (S12), acquiring a storage list of the failed storage node (S13); identifying an IO load category of a user application in a previous statistical period (S14); calculating, according to the IO load category in the previous statistical period, a flow control threshold corresponding to the current statistical period (S15); performing, according to the storage list and the flow control threshold corresponding to the current statistical period, a recovery operation on the data of the failed storage node in the current statistical period (S16); and determining whether the recovery operation has been performed on the data of the failed storage node in all statistical periods and, if so, ending the process (S17). An adaptive data recovery flow control apparatus, an electronic device, and a storage medium are also disclosed. With the method, a significant impact on normal IO service performance can be avoided, while the data recovery efficiency of a large-scale distributed storage system is improved and the risk of data loss is reduced. The method also achieves effective flow control.
PCT/CN2018/108128 2018-06-04 2018-09-27 Adaptive data recovery flow control method and apparatus, electronic device, and storage medium WO2019232993A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810565004.2 2018-06-04
CN201810565004.2A CN108804039B (zh) 2018-06-04 2018-06-04 Adaptive data recovery flow control method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019232993A1 (fr) 2019-12-12

Family

ID=64087212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108128 WO2019232993A1 (fr) 2018-06-04 2018-09-27 Adaptive data recovery flow control method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN108804039B (fr)
WO (1) WO2019232993A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10963332B2 (en) * 2018-12-17 2021-03-30 Western Digital Technologies, Inc. Data storage systems and methods for autonomously adapting data storage system performance, capacity and/or operational requirements
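CN110120973A (zh) * 2019-04-28 2019-08-13 华为技术有限公司 Request control method, related device, and computer storage medium
CN110516117A (zh) * 2019-07-22 2019-11-29 平安科技(深圳)有限公司 Categorical variable storage method, apparatus, and device for graph computing, and storage medium
CN110750213A (zh) * 2019-09-09 2020-02-04 华为技术有限公司 Hard disk management method and apparatus
CN110673977B (zh) * 2019-09-27 2022-06-07 浪潮电子信息产业股份有限公司 Data recovery optimization method, apparatus, device, and medium
CN111258816B (zh) * 2020-01-17 2023-08-08 西安奥卡云数据科技有限公司 RPO adjustment method and apparatus, and computer-readable storage medium
CN113377861B (zh) * 2020-02-25 2023-04-07 中移(苏州)软件技术有限公司 Reconstruction method, apparatus, and device for distributed storage system, and storage medium
CN114064362B (zh) * 2021-11-16 2022-08-05 北京志凌海纳科技有限公司 Data recovery method and system for distributed storage, and computer-readable storage medium
CN116627362B (zh) * 2023-07-26 2023-09-22 大汉电子商务有限公司 Financial data processing method based on distributed storage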

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130111172A1 (en) * 2011-10-31 2013-05-02 International Business Machines Corporation Data Migration Between Storage Devices
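CN105930498A (zh) * 2016-05-06 2016-09-07 中国银联股份有限公司 Distributed database management method and system
CN106201354A (zh) * 2016-07-12 2016-12-07 乐视控股(北京)有限公司 Data storage method and system
CN107544862A (zh) * 2016-06-29 2018-01-05 中兴通讯股份有限公司 Erasure-code-based stored data reconstruction method and apparatus, and storage node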

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130111172A1 (en) * 2011-10-31 2013-05-02 International Business Machines Corporation Data Migration Between Storage Devices
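CN105930498A (zh) * 2016-05-06 2016-09-07 中国银联股份有限公司 Distributed database management method and system
CN107544862A (zh) * 2016-06-29 2018-01-05 中兴通讯股份有限公司 Erasure-code-based stored data reconstruction method and apparatus, and storage node
CN106201354A (zh) * 2016-07-12 2016-12-07 乐视控股(北京)有限公司 Data storage method and system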

Also Published As

Publication number Publication date
CN108804039B (zh) 2021-01-29
CN108804039A (zh) 2018-11-13

Similar Documents

Publication Publication Date Title
WO2019232993A1 (fr) Adaptive data recovery flow control method and apparatus, electronic device, and storage medium
WO2019232926A1 (fr) Data consistency check flow control method and apparatus, electronic device, and storage medium
US10261853B1 (en) Dynamic replication error retry and recovery
US10289451B2 (en) Method, apparatus, and system for adjusting deployment location of virtual machine
US9665420B2 (en) Causal engine and correlation engine based log analyzer
US8726045B2 (en) Automated power topology discovery by detecting communication over power line and associating PDU port identifier to computing device
CN108633311B (zh) Call-chain-based concurrency control method and apparatus, and control node
WO2017162011A1 (fr) Network element performance data processing method and device, and NMS
US20190163371A1 (en) Next generation storage controller in hybrid environments
WO2019232927A1 (fr) Distributed data deletion flow control method and apparatus, electronic device, and storage medium
CN109828960B (zh) Log library capacity expansion method and system, computer apparatus, and readable storage medium
US11863439B2 (en) Method, apparatus and storage medium for application identification
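CN110708369B (zh) File deployment method and apparatus for device node, scheduling server, and storage medium
CN111880967A (zh) File backup method and apparatus in cloud scenario, medium, and electronic device
TWI537829B (zh) Method, system, and computer program product for restoring a previous version of a virtual machine image
WO2022078347A1 (fr) Task scheduling method and apparatus, electronic device, and storage medium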
US20240143456A1 (en) Log replay methods and apparatuses, data recovery methods and apparatuses, and electronic devices
CN108763107B (zh) Background disk write flow control method and apparatus, electronic device, and storage medium
US10896056B2 (en) Cluster expansion method and apparatus, electronic device and storage medium
US20190044835A1 (en) Technologies for filtering network packets on ingress
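WO2019232925A1 (fr) Hotspot data migration flow control method and apparatus, electronic device, and storage medium
CN112073327A (zh) Congestion-resistant software traffic distribution method and apparatus, and storage medium
CN109298974B (zh) System control method and apparatus, computer, and computer-readable storage medium
CN108471387B (zh) Log traffic distributed control method and system
CN110704382B (zh) File deployment method and apparatus, server, and storage medium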

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921992

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18921992

Country of ref document: EP

Kind code of ref document: A1