WO2020140369A1 - Data recovery control method, server, and storage medium - Google Patents

Data recovery control method, server, and storage medium

Info

Publication number
WO2020140369A1
Authority
WO
WIPO (PCT)
Prior art keywords
data recovery
server cluster
server
preset
concurrency
Prior art date
Application number
PCT/CN2019/088629
Other languages
English (en)
French (fr)
Inventor
兰东平
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020140369A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation

Definitions

  • This application relates to the field of data processing, and in particular to a data recovery control method, a server, and a storage medium.
  • The present application provides a data recovery control method, server, and storage medium whose purpose is to adjust the data recovery mode flexibly according to the current system load, so that recovery does not affect normal use of the system.
  • A data recovery control method, which includes:
  • Receiving step: detect and receive the data recovery request sent by the client;
  • Prediction step: in response to the data recovery request, obtain the historical values of the input/output operations per second (IOPS) and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
  • IOPS: Input/Output Operations Per Second
  • Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
  • The present application also provides a server that is communicatively connected to a client and storage nodes.
  • The server includes a memory and a processor; a data recovery control program is stored in the memory, and when the program is executed by the processor, the following steps are implemented:
  • Receiving step: detect and receive the data recovery request sent by the client;
  • Prediction step: in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
  • Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
  • The present application also provides a computer-readable storage medium that includes a data recovery control program; when the data recovery control program is executed by a processor, any step of the data recovery control method described above can be implemented.
  • The data recovery control method, server, and storage medium proposed in this application solve the technical problem that automatic data recovery in a distributed storage system occupies a large amount of I/O. They can improve the read efficiency and recovery efficiency of the distributed storage system, which both keeps the system running normally and prevents missing data copies, avoiding the risk of data loss.
  • FIG. 1 is a schematic diagram of a preferred embodiment of the server of this application;
  • FIG. 2 is a schematic block diagram of a preferred embodiment of the data recovery control program in FIG. 1;
  • FIG. 3 is a flowchart of a preferred embodiment of a data recovery control method of this application.
  • This application provides a server 1.
  • Refer to FIG. 1, which is a schematic diagram of a preferred embodiment of the server 1 of the present application.
  • the server 1 includes but is not limited to a memory 11 and a processor 12.
  • The memory 11 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, or an optical disk.
  • the memory 11 may be an internal storage unit of the server 1, such as a hard disk or a memory of the server 1.
  • In other embodiments, the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk fitted to the server 1, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
  • the memory 11 may also include both the internal storage unit of the server 1 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the server 1 and various application software, such as program codes of the data recovery control program 10.
  • the memory 11 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip.
  • the processor 12 is generally used to control the overall operation of the server 1, for example, to perform control and processing related to data interaction or communication.
  • the processor 12 is used to run the program code stored in the memory 11 or process data, for example, the program code of the data recovery control program 10.
  • FIG. 1 only shows the server 1 having the components 11-12 and the data recovery control program 10, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the server 1 may further include a user interface.
  • the user interface may include a display and an input unit such as a keyboard.
  • Optionally, the user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like.
  • the display may also be appropriately called a display screen or a display unit, for displaying information processed in the server 1 and for displaying a visual user interface.
  • the server 1 may also include a radio frequency (Radio Frequency, RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • RF Radio Frequency
  • the server 1 is any server in the server cluster of the distributed storage system.
  • a distributed storage system usually includes multiple servers, clients that communicate with the server, and a series of storage nodes.
  • Each storage node may be a storage device, such as a hard disk, magnetic disk, or other network storage device, or may be an electronic device that provides storage space, such as a personal computer, server, and so on.
  • data is evenly distributed among storage nodes in the form of multiple copies. When the data of a storage node is damaged, data recovery is performed through backup copies of other storage nodes.
  • the processor 12 may implement the following steps when executing the data recovery control program 10 stored in the memory 11:
  • Receiving step: detect and receive the data recovery request sent by the client;
  • Prediction step: in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
  • Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
  • In other embodiments, the data recovery control program 10 may be divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to carry out this application.
  • the module referred to in this application refers to a series of computer program instruction segments capable of performing specific functions.
  • Refer to FIG. 2, which is a program module diagram of an embodiment of the data recovery control program 10 in FIG. 1.
  • the data recovery control program 10 may be divided into: a receiving module 110, a prediction module 120, and an adjustment module 130.
  • the receiving module 110 is used to detect and receive the data recovery request sent by the client.
  • In other embodiments, a storage node failure rate prediction step may also be included before the data recovery request is received.
  • The receiving module 110 periodically monitors preset indicators of the monitored storage nodes, inputs the indicator values into a pre-trained storage node failure prediction model, and determines the failure rate of each storage node from the model output.
  • When the failure rate of a storage node is greater than or equal to a first preset threshold, first warning information is sent to the client, and the client sends a data recovery request to the server cluster in response to the first warning information.
  • For example, when the storage node is a hard disk, all SMART information of the hard disk is obtained, and the state of the type of each key SMART attribute, or the state of its value, is extracted from it; the indicator value corresponding to each preset indicator is determined from the mapping between key SMART attribute types (or values) and numeric values, and the indicator values are arranged in a predetermined order into a one-dimensional matrix that serves as the model input.
  • Because the SMART data of different hard disk brands is unevenly distributed, a random forest algorithm is used to train on historical data and build the failure prediction model, which improves the failure prediction rate. The hard disk class can also be determined from the hard disk's supplier information, and the failure prediction model corresponding to that class is then selected to predict the hard disk failure rate.
  • In another embodiment, the receiving module 110 also authenticates the identity of the user of the client that initiated the data recovery request.
  • If the authentication passes, the subsequent steps are executed; if it fails, the data recovery request is rejected and second warning information is generated.
  • For example, the receiving module 110 matches the user's identity information against a preset whitelist of identities that hold the request permission. When the whitelist contains an entry matching the user's identity information, the user is considered to have permission; when it contains no matching entry, the user is considered not to have permission. For example, the receiving module 110 obtains the device identifier contained in the file recovery instruction and checks whether that device identifier is on the pre-bound whitelist; if so, the request is considered normal, and if not, abnormal. As another example, the receiving module 110 obtains the user's identity information and determines from it whether the user has permission for the data recovery request; if so, the prediction step continues, and if not, the data recovery request is rejected and second warning information is generated.
  • The prediction module 120 is configured to, in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and the preset detection rules.
  • An exponential smoothing algorithm can be used to calculate the predicted values of IOPS and throughput. Its basic formula is $S_t = a\,y_t + (1-a)\,S_{t-1}$, where:
  • $S_t$ is the smoothed value at time t
  • $y_t$ is the actual value at time t
  • $S_{t-1}$ is the smoothed value at time t-1
  • $a$ is the smoothing constant, with value range [0, 1]
  • The preset detection rules include: when any predicted metric is greater than or equal to a second preset threshold, the server cluster is judged busy; when all predicted metrics are less than the second preset threshold, the server cluster is judged not busy.
  • The trigger threshold for the predicted IOPS and throughput (the second preset threshold) is derived from a statistical analysis of IOPS and throughput peak data. Any metric greater than or equal to its threshold triggers control.
  • Threshold of each metric = the cluster's performance upper limit for that metric - the peak statistical analysis result.
  • The adjustment module 130 is configured to: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
  • The number of additional concurrent data recoveries = (throughput upper limit - predicted throughput) * response time.
  • Throughput is the number of requests the system processes per unit time
  • Concurrency is the number of requests the system can process at the same time
  • Response time is the average response time.
  • When the server cluster is judged not busy, adjusting the cluster's data recovery concurrency includes increasing the number of files recovered concurrently on the monitored storage node. For example, if 10 files on the monitored storage node are being recovered at the same time while the cluster is not busy, and the number of additional concurrent file recoveries is calculated to be 5, the storage node can be set to allow 15 files to be recovered at once, which speeds up data recovery to some extent.
  • The preset concurrency reduction rules include lowering the data recovery rate of the monitored storage nodes, reducing the number of files recovered concurrently on the monitored disks, or reducing the number of monitored storage nodes that take part in recovery at the same time.
  • For example, when the storage node is a hard disk, lowering the rate at which the hard disk recovers files can mean making the recovery thread sleep for 100 ms after each file is recovered, so that the next file recovery starts 100 ms later.
  • When the storage node is a disk and the objects being recovered are large files, each file takes a long time to recover, so lowering the per-disk recovery rate or the per-disk number of concurrently recovered files has little effect.
  • In that case, the number of monitored disks taking part in recovery at the same time can be reduced, for example by halving it, which quickly reduces the occupancy of CPU, memory, network, and other resources on the server and takes effect immediately.
  • In another embodiment, the adjustment module 130 may further: obtain the real-time access delay of the server cluster and calculate the decrease in access delay; when the decrease is greater than a third preset threshold, maintain the current data recovery speed; when the decrease is less than the third preset threshold, adjust the cluster's data recovery concurrency according to the preset concurrency reduction rule.
  • For example, after the recovery concurrency has been adjusted, the real-time access delay of the system is obtained, and the decrease B of the access delay is calculated.
  • The calculation formula is: B = (real-time access delay - historical access delay) / historical access delay
  • where the historical access delay is the access delay before the recovery concurrency adjustment is performed,
  • and the real-time access delay is the access delay after the recovery concurrency adjustment is performed.
  • FIG. 3 is a flowchart of a preferred embodiment of the data recovery control method of the present application.
  • Step S10: detect and receive the data recovery request sent by the client.
  • Before the data recovery instruction is received, the server periodically monitors the SMART indicators of the monitored storage nodes, inputs the SMART indicator values into a pre-trained storage node failure prediction model, and determines the failure rate of each storage node from the model output.
  • When the failure rate of a storage node is greater than or equal to the first preset threshold, first warning information is sent to the client, and the client sends a data recovery request to the server cluster.
  • When the storage node is a hard disk, the hard disk class can also be determined from the hard disk's supplier information, and the failure prediction model corresponding to that class is selected to predict the hard disk failure rate.
  • In other embodiments, after the file recovery instruction is received, the identity of the user of the client that initiated the file recovery request may also be authenticated. If the authentication passes, the prediction step is performed; if it fails, the data recovery request is rejected and second warning information is generated.
  • The user's identity information is matched against a preset whitelist of identities that hold the request permission.
  • When the whitelist contains an entry matching the user's identity information, the user is considered to have permission; when it contains no matching entry, the user is considered not to have permission.
  • For example, the device identifier contained in the file recovery instruction is obtained, and it is checked whether that device identifier is on the pre-bound whitelist. If so, the request is considered normal; if not, abnormal.
  • As another example, the user's identity information is obtained and used to determine whether the user has permission for the data recovery request. If so, the subsequent steps continue; if not, second warning information is generated.
  • Step S20: in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and the preset detection rules.
  • In this embodiment, an exponential smoothing algorithm may be used to calculate the predicted values of IOPS and throughput.
  • The preset detection rules include: when any predicted metric is greater than or equal to the second preset threshold, the system is judged busy; when all predicted metrics are less than the second preset threshold, the system is judged not busy.
  • The trigger threshold for the predicted IOPS and throughput is derived from a statistical analysis of IOPS and throughput peak data. Any metric greater than or equal to its threshold triggers control.
  • Threshold of each metric = the cluster's performance upper limit for that metric - the peak statistical analysis result.
  • Step S30: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
  • The number of additional concurrent data recoveries = (throughput upper limit - predicted throughput) * response time. Throughput is the number of requests the system processes per unit time, concurrency is the number of requests the system can process at the same time, and response time is the average response time.
  • When the server cluster is judged not busy, adjusting the cluster's data recovery concurrency includes increasing the number of files recovered concurrently on the monitored storage node. For example, if 10 files on the monitored storage node are being recovered at the same time while the cluster is not busy, and the number of additional concurrent file recoveries is calculated to be 5, the storage node can be set to allow 15 files to be recovered at once, which speeds up data recovery to some extent.
  • The preset concurrency reduction rules include lowering the data recovery rate of the monitored storage nodes, reducing the number of files recovered concurrently on the monitored disks, or reducing the number of monitored storage nodes that take part in recovery at the same time.
  • For example, when the storage node is a hard disk, lowering the rate at which the hard disk recovers files can mean making the recovery thread sleep for 100 ms after each file is recovered, so that the next file recovery starts 100 ms later.
  • When the storage node is a disk and the objects being recovered are large files, each file takes a long time to recover, so lowering the per-disk recovery rate or the per-disk number of concurrently recovered files has little effect.
  • In that case, the number of monitored disks taking part in recovery at the same time can be reduced, for example by halving it, which quickly reduces the occupancy of CPU, memory, network, and other resources on the server and takes effect immediately.
  • In another embodiment, the real-time access delay of the server cluster may be obtained and the decrease in access delay calculated. When the decrease is greater than the third preset threshold, the current data recovery speed is maintained; when the decrease is less than the third preset threshold, the data recovery speed is reduced according to the preset concurrency reduction rule.
  • For example, after the recovery concurrency has been adjusted, the real-time access delay of the system is obtained, and the decrease B of the access delay is calculated.
  • The calculation formula is: B = (real-time access delay - historical access delay) / historical access delay
  • where the historical access delay is the access delay before the recovery concurrency adjustment is performed,
  • and the real-time access delay is the access delay after the recovery concurrency adjustment is performed.
  • The embodiments of the present application also propose a computer-readable storage medium, which may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and so on.
  • the computer-readable storage medium includes a data recovery control program 10, and when the data recovery control program 10 is executed by a processor, the following operations are implemented:
  • Receiving step: detect and receive the data recovery request sent by the client;
  • Prediction step: in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
  • Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

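This application relates to data processing technology and discloses a data recovery control method, server, and storage medium. The method includes the following steps: the server receives a data recovery request sent by a client; it then obtains the historical values of IOPS and throughput of the server cluster it belongs to within a preset time and calculates predicted values of IOPS and throughput; according to the predicted values and preset rules, it determines whether the server cluster is busy; when the system is not busy, it calculates from the predicted values the number of additional concurrent data recoveries and adjusts the cluster's data recovery concurrency accordingly; when the system is busy, it adjusts the cluster's data recovery concurrency according to a preset concurrency reduction rule; and it performs data recovery on the storage nodes related to the data recovery request. This application solves the technical problem that data recovery occupies a large amount of I/O; it both keeps the system running normally and prevents missing data copies, avoiding the risk of data loss.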

Description

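Data recovery control method, server, and storage medium
This application claims, under the Paris Convention, priority to Chinese patent application No. CN201910008850.9, filed on January 4, 2019 and entitled "Data recovery control method, server, and storage medium", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the field of data processing, and in particular to a data recovery control method, a server, and a storage medium.
Background
In a distributed storage system, when hard disk data must be recovered automatically because of a bad disk, a service restart, or a similar cause, recovery often occupies a large share of the disk's I/O and degrades the storage system's file read efficiency and response time. The storage system itself cannot adjust the frequency of data recovery, so normal user access is affected.
Summary
In view of the above, this application provides a data recovery control method, server, and storage medium. Its purpose is to adjust the data recovery mode flexibly according to the current system load, so that recovery does not affect normal use of the system.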
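To achieve the above purpose, this application provides a data recovery control method, which includes:
Receiving step: detect and receive the data recovery request sent by the client;
Prediction step: in response to the data recovery request, obtain the historical values of the input/output operations per second (IOPS) and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.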
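To achieve the above purpose, this application also provides a server that is communicatively connected to a client and storage nodes. The server includes a memory and a processor; a data recovery control program is stored in the memory, and when the program is executed by the processor, the following steps can be implemented:
Receiving step: detect and receive the data recovery request sent by the client;
Prediction step: in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
To achieve the above purpose, this application also provides a computer-readable storage medium that includes a data recovery control program; when the data recovery control program is executed by a processor, any step of the data recovery control method described above can be implemented.
The data recovery control method, server, and storage medium proposed in this application solve the technical problem that automatic data recovery in a distributed storage system occupies a large amount of I/O. They can improve the read efficiency and recovery efficiency of the distributed storage system, which both keeps the system running normally and prevents missing data copies, avoiding the risk of data loss.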
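Brief Description of the Drawings
FIG. 1 is a schematic diagram of a preferred embodiment of the server of this application;
FIG. 2 is a module diagram of a preferred embodiment of the data recovery control program in FIG. 1;
FIG. 3 is a flowchart of a preferred embodiment of the data recovery control method of this application.
The realization of the purpose, the functional features, and the advantages of this application are further described below with reference to the embodiments and the drawings.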
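Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
This application provides a server 1. Refer to FIG. 1, which is a schematic diagram of a preferred embodiment of the server 1 of this application.
The server 1 includes, but is not limited to, a memory 11 and a processor 12.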
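The memory 11 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 11 may be an internal storage unit of the server 1, such as the hard disk or main memory of the server 1. In other embodiments, the memory 11 may be an external storage device of the server 1, such as a plug-in hard disk fitted to the server 1, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card. The memory 11 may also include both an internal storage unit of the server 1 and an external storage device. In this embodiment, the memory 11 is generally used to store the operating system and the application software installed on the server 1, such as the program code of the data recovery control program 10. The memory 11 can also be used to temporarily store data that has been output or is about to be output.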
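In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the server 1, for example to perform control and processing related to data interaction or communication. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the program code of the data recovery control program 10.
FIG. 1 shows only the server 1 with the components 11-12 and the data recovery control program 10. It should be understood that not all of the components shown are required, and that more or fewer components may be implemented instead.
Optionally, the server 1 may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display may also be called a display screen or display unit, and is used to display information processed in the server 1 and a visual user interface.
The server 1 may also include a radio frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described further here.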
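In this embodiment, the server 1 is any one server in the server cluster of a distributed storage system. A distributed storage system usually includes multiple servers, clients communicatively connected to the servers, and a series of storage nodes. Each storage node may be a storage device, such as a hard disk, a magnetic disk, or another network storage device, or an electronic device that provides storage space, such as a personal computer or a server. In a distributed storage system, data is evenly distributed across the storage nodes in the form of multiple copies. When the data on one storage node is damaged, data recovery is performed from the backup copies on other storage nodes.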
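In the above embodiment, the processor 12 can implement the following steps when executing the data recovery control program 10 stored in the memory 11:
Receiving step: detect and receive the data recovery request sent by the client;
Prediction step: in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster to which the server belongs within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and preset detection rules;
Adjustment step: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
For a detailed description of these steps, see the description below of FIG. 2, the program module diagram of an embodiment of the data recovery control program 10, and of FIG. 3, the flowchart of an embodiment of the data recovery control method.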
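In other embodiments, the data recovery control program 10 may be divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to carry out this application. A module, as referred to in this application, is a series of computer program instruction segments capable of performing a specific function.
Refer to FIG. 2, which is a program module diagram of an embodiment of the data recovery control program 10 in FIG. 1. In this embodiment, the data recovery control program 10 may be divided into a receiving module 110, a prediction module 120, and an adjustment module 130.
The receiving module 110 is used to detect and receive the data recovery request sent by the client.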
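In other embodiments, a storage node failure rate prediction step may also be included before the data recovery request is received. The receiving module 110 periodically monitors preset indicators of the monitored storage nodes, inputs the indicator values into a pre-trained storage node failure prediction model, and determines the failure rate of each storage node from the model output. When the failure rate of a storage node is greater than or equal to a first preset threshold, first warning information is sent to the client, and the client sends a data recovery request to the server cluster in response to the first warning information.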
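For example, when the storage node is a hard disk, all SMART information of the hard disk is obtained, and the state of the type of each key SMART attribute, or the state of its value, is extracted from it. The indicator value corresponding to each preset indicator is determined from the mapping between key SMART attribute types (or values) and numeric values, and the indicator values are arranged in a predetermined order into a one-dimensional matrix that serves as the model input. Because the SMART data of different hard disk brands is unevenly distributed, a random forest algorithm is used to train on historical data and build the failure prediction model, which improves the failure prediction rate. The hard disk class can also be determined from the hard disk's supplier information, and the failure prediction model corresponding to that class is then selected to predict the hard disk failure rate.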
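The indicator-vector construction and the first-threshold check described above can be sketched as follows. This is a minimal sketch rather than the application's implementation: the attribute IDs in `KEY_ATTRIBUTES`, the function names, the zero default for missing attributes, and the `toy_model` scoring function that stands in for the trained random forest model are all assumptions made for illustration.

```python
from typing import Callable, Mapping, Sequence

# Assumed set and order of "key" SMART attributes (chosen for illustration):
# 5 = reallocated sectors, 187 = reported uncorrectable errors,
# 197 = current pending sectors, 198 = offline uncorrectable sectors.
KEY_ATTRIBUTES: Sequence[int] = (5, 187, 197, 198)


def build_model_input(smart: Mapping[int, float],
                      order: Sequence[int] = KEY_ATTRIBUTES) -> list:
    """Arrange indicator values in a fixed order as a one-dimensional vector."""
    # Attributes missing from the SMART report are treated as 0 (an assumption).
    return [float(smart.get(attr_id, 0.0)) for attr_id in order]


def needs_first_warning(smart: Mapping[int, float],
                        model: Callable[[Sequence[float]], float],
                        first_threshold: float) -> bool:
    """Return True when the predicted failure rate reaches the first threshold."""
    failure_rate = model(build_model_input(smart))
    return failure_rate >= first_threshold


def toy_model(vector: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained failure prediction model."""
    return min(1.0, sum(vector) / 100.0)
```

In a real deployment the `model` argument would be the prediction function of the trained classifier for the disk's class; only the fixed ordering of the input vector is taken from the text.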
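In another embodiment, the receiving module 110 also authenticates the identity of the user of the client that initiated the data recovery request. If the authentication passes, the subsequent steps are executed; if it fails, the data recovery request is rejected and second warning information is generated.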
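For example, the receiving module 110 matches the user's identity information against a preset whitelist of identities that hold the request permission. When the whitelist contains an entry matching the user's identity information, the user is considered to have permission; when it contains no matching entry, the user is considered not to have permission. For example, the receiving module 110 obtains the device identifier contained in the file recovery instruction and checks whether that device identifier is on the pre-bound whitelist; if so, the request is considered normal, and if not, abnormal. As another example, the receiving module 110 obtains the user's identity information and determines from it whether the user has permission for the data recovery request; if so, the prediction step continues, and if not, the data recovery request is rejected and second warning information is generated.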
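The whitelist check can be illustrated with a short sketch. The function name `check_request`, the warning text, and the use of plain strings as identities (user IDs or device identifiers) are assumptions for illustration; the text does not specify how identities are represented.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical text of the second warning; the source does not define its content.
SECOND_WARNING = "second warning: data recovery request rejected (not on whitelist)"


def check_request(identity: str,
                  whitelist: Iterable[str]) -> Tuple[bool, Optional[str]]:
    """Return (allowed, warning): allowed requests proceed to the prediction step."""
    if identity in set(whitelist):
        return True, None
    # No matching whitelist entry: reject the request and raise the second warning.
    return False, SECOND_WARNING
```

The same function covers both variants in the text, since a device identifier and a user identity are each just a key looked up in a pre-bound list.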
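The prediction module 120 is configured to, in response to the data recovery request, obtain the historical values of IOPS and throughput of the server cluster within a preset time, predict the cluster's IOPS and throughput from those historical values, and determine whether the server cluster is busy according to the predicted values and the preset detection rules.
In this embodiment, the predicted values of the two metrics, IOPS and throughput, can be calculated with an exponential smoothing algorithm, whose basic formula is: $S_t = a\,y_t + (1-a)\,S_{t-1}$
Here $S_t$ is the smoothed value at time t, $y_t$ is the actual value at time t, $S_{t-1}$ is the smoothed value at time t-1, and $a$ is the smoothing constant, with value range [0, 1]. $S_t$ is the weighted arithmetic mean of $y_t$ and $S_{t-1}$, and the value of $a$ determines how strongly each of them influences $S_t$: when $a = 1$, $S_t = y_t$; when $a = 0$, $S_t = S_{t-1}$.
With suitable adaptation, the basic exponential smoothing formula can be used to calculate the predicted values of IOPS and throughput.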
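The basic recurrence $S_t = a\,y_t + (1-a)\,S_{t-1}$ can be applied to a window of historical samples as in the sketch below. It is only the basic form: the text says the formula is adapted before use but does not give the adapted version. Seeding the recurrence with the first sample and using the final smoothed value as the forecast are assumptions, as are the function name and the default `a = 0.5`.

```python
from typing import Sequence


def exponential_smoothing_forecast(history: Sequence[float], a: float = 0.5) -> float:
    """Smooth a metric's history and return the last smoothed value as the forecast."""
    if not history:
        raise ValueError("history must contain at least one sample")
    if not 0.0 <= a <= 1.0:
        raise ValueError("smoothing constant a must lie in [0, 1]")
    smoothed = history[0]  # S_0 is seeded with the first observation (assumption)
    for actual in history[1:]:
        # S_t = a * y_t + (1 - a) * S_{t-1}
        smoothed = a * actual + (1.0 - a) * smoothed
    return smoothed
```

For instance, `exponential_smoothing_forecast([100.0, 200.0], a=0.5)` weights the two samples equally, while `a=1.0` returns the latest sample and `a=0.0` returns the seed, matching the two limiting cases stated in the text.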
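The preset detection rules include:
When any predicted metric is greater than or equal to a second preset threshold, the server cluster is judged busy;
When all predicted metrics are less than the second preset threshold, the server cluster is judged not busy.
Note that the trigger threshold for the predicted IOPS and throughput (the second preset threshold) is derived from a statistical analysis of IOPS and throughput peak data. Any metric greater than or equal to its threshold triggers control.
Threshold of each metric = the cluster's performance upper limit for that metric - the peak statistical analysis result.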
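The detection rule and the threshold formula can be sketched as follows. The class and field names (`MetricLimit`, `upper_limit`, `peak_statistic`) and the metric keys are hypothetical, and how the peak statistic itself is computed is not specified in the text, so it is taken here as a given number.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MetricLimit:
    upper_limit: float      # cluster performance upper limit for the metric
    peak_statistic: float   # result of the peak statistical analysis (given)

    @property
    def threshold(self) -> float:
        # Threshold = performance upper limit - peak statistical analysis result
        return self.upper_limit - self.peak_statistic


def is_cluster_busy(predicted: Mapping[str, float],
                    limits: Mapping[str, MetricLimit]) -> bool:
    """Busy if ANY predicted metric is >= its threshold; not busy only if ALL are below."""
    return any(predicted[name] >= limit.threshold for name, limit in limits.items())
```

With an IOPS upper limit of 10000 and a peak statistic of 2000, the IOPS threshold is 8000, so a predicted IOPS of exactly 8000 already marks the cluster as busy.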
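The adjustment module 130 is configured to: when the server cluster is judged not busy, calculate, from the predicted values of IOPS and throughput, the number of additional concurrent data recoveries the server cluster can take on, and adjust the cluster's data recovery concurrency according to the result; when the server cluster is judged busy, adjust the cluster's data recovery concurrency according to a preset concurrency reduction rule; then perform data recovery on the storage nodes related to the data recovery request according to the adjusted concurrency.
The number of additional concurrent data recoveries = (throughput upper limit - predicted throughput) * response time. Throughput is the number of requests the system processes per unit time, concurrency is the number of requests the system can process at the same time, and response time is the average response time. When the server cluster is judged not busy, adjusting the cluster's data recovery concurrency includes increasing the number of files recovered concurrently on the monitored storage node. For example, if 10 files on the monitored storage node are being recovered at the same time while the cluster is not busy, and the number of additional concurrent file recoveries is calculated to be 5, the storage node can be set to allow 15 files to be recovered at once, which speeds up data recovery to some extent.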
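The formula has the same shape as Little's law (concurrency = throughput × response time) applied to the unused throughput headroom. A minimal sketch follows, assuming throughput is measured in requests per second and response time in seconds; the function name, the rounding down to a whole number, and the clamp at zero are assumptions not stated in the text.

```python
import math


def additional_recovery_concurrency(throughput_upper_limit: float,
                                    predicted_throughput: float,
                                    avg_response_time_s: float) -> int:
    """(throughput upper limit - predicted throughput) * response time, floored."""
    headroom = throughput_upper_limit - predicted_throughput
    if headroom <= 0 or avg_response_time_s <= 0:
        return 0  # no spare throughput: do not add any concurrency (assumption)
    return math.floor(headroom * avg_response_time_s)


# Worked example in the spirit of the text: 10 files are recovering now and the
# headroom allows 5 more, so the node may be set to recover 15 files at once.
current = 10
extra = additional_recovery_concurrency(1000.0, 990.0, 0.5)
new_limit = current + extra
```

The specific inputs (a limit of 1000 requests/s, a prediction of 990 requests/s, a 0.5 s average response time) are made up so that the result matches the 10 + 5 = 15 example.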
所述的预设的并发降低规则包括降低所述被监控的存储节点数据恢复的速率、降低所述被监控的磁盘上参与恢复的并发文件数或减少同时参与恢复的所述被监控的存储节点数量。
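For example, when the storage node is a hard disk, the rate at which the hard disk recovers files can be reduced by having the recovery thread sleep for 100 ms after each file is recovered, so that recovery of the next file starts 100 ms later.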
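Under normal circumstances, 10 files may be recovered simultaneously on the same storage node. When the recovery speed needs to be reduced, the number of files concurrently being recovered on the monitored storage node is lowered, for example by allowing only 5 files to be recovered at a time on that node, which reduces recovery efficiency to some extent.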
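When the storage node is a disk and the objects being recovered are large files, each file takes a long time to recover, and neither reducing the per-disk file recovery rate nor reducing the number of files concurrently recovered on a single disk works well. In this case, the number of monitored disks participating in recovery at the same time can be reduced, for example by halving it, which quickly reduces the usage of CPU, memory, network, and other resources on the server and takes effect immediately.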
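The three reduction rules can be sketched as small helpers. All function names and parameters here are hypothetical, and `recover_one` stands in for whatever routine actually restores a file.

```python
import time


def recover_files_throttled(files, recover_one, pause_s=0.1, sleep=time.sleep):
    """Rule 1: lower the recovery rate by sleeping 100 ms after each file."""
    for f in files:
        recover_one(f)
        sleep(pause_s)


def reduce_concurrent_files(current, factor=0.5):
    """Rule 2: lower the per-node file concurrency (for example 10 -> 5)."""
    return max(1, int(current * factor))


def halve_active_disks(disks):
    """Rule 3: keep only half of the disks currently participating in recovery."""
    keep = max(1, len(disks) // 2)
    return disks[:keep]
```

The `sleep` parameter is injectable so the throttle can be exercised without real delays; both helpers that shrink a count keep at least one unit active so that recovery never stalls completely, which is a design choice of this sketch rather than something the application states.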
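In another embodiment, the adjustment module 130 may further: obtain the real-time access latency of the server cluster and calculate the reduction in access latency; when the reduction is greater than a third preset threshold, keep the current data recovery speed; and when the reduction is less than the third preset threshold, adjust the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.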
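For example, 5 minutes after the recovery concurrency adjustment is performed, the real-time access latency of the system is obtained and the reduction in access latency, B, is calculated as:
B = (real-time access latency - historical access latency) / historical access latency
where the historical access latency is the access latency before the recovery concurrency adjustment and the real-time access latency is the access latency after it.
When B is greater than or equal to a preset threshold of 40%, the control is considered effective and the current data recovery speed is kept. When B is less than the preset threshold of 40%, the control is considered to have had little effect, and the data recovery concurrency of the server cluster is adjusted further according to the preset concurrency reduction rule. These steps are repeated until the reduction in access latency is greater than or equal to the preset threshold.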
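The feedback loop can be sketched as follows. Note one deliberate assumption: the sketch computes the reduction as (historical latency - real-time latency) / historical latency, so that a drop in latency yields a positive value that can be compared with a positive threshold such as 40%; this is the opposite sign from the formula as written. The callback names and the `max_rounds` safety cap are also hypothetical.

```python
def latency_reduction(latency_before, latency_after):
    """Fractional drop in access latency (positive when latency went down)."""
    return (latency_before - latency_after) / latency_before


def control_until_effective(latency_before, measure_latency, reduce_concurrency,
                            threshold=0.4, max_rounds=10):
    """Keep applying the reduction rule until the latency drop reaches threshold.

    `measure_latency` returns the current access latency (in practice after a
    wait such as 5 minutes); `reduce_concurrency` applies one more reduction.
    Returns the number of additional reductions that were applied.
    """
    extra_rounds = 0
    while extra_rounds < max_rounds:
        if latency_reduction(latency_before, measure_latency()) >= threshold:
            break  # control is effective: keep the current recovery speed
        reduce_concurrency()
        extra_rounds += 1
    return extra_rounds
```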
FIG. 3 is a flowchart of a preferred embodiment of the data recovery control method of the present application.
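Step S10: detect and receive the data recovery request sent by the client. Before the data recovery instruction is received, the server periodically monitors the SMART indicators of the monitored storage nodes, inputs the indicator values into a pre-trained storage node failure prediction model, and determines the failure rate of each storage node from the model output. When the failure rate of a storage node is greater than or equal to the first preset threshold, first warning information is sent to the client, and the client sends a data recovery request to the server cluster. When the storage node is a hard disk, the hard disk may also be classified according to its supplier information, and the hard disk failure prediction model corresponding to that class is selected to predict the hard disk failure rate.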
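In other embodiments, after the file recovery instruction is received, the identity information of the user of the client that initiated the file recovery request may also be authenticated. If authentication succeeds, the prediction step is executed; if it fails, the data recovery request is rejected and second warning information is generated.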
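The user's identity information is matched against a preset whitelist of users with request permission. If the whitelist contains data matching the user's identity information, the user is considered to have permission; otherwise, the user is considered not to have permission. For example, the device identifier contained in the file recovery instruction is obtained, and it is determined whether the device identifier is on the pre-bound whitelist; if so, the request is considered normal, and if not, the request is considered abnormal. As another example, the user's identity information is obtained and used to determine whether the user has permission to request data recovery; if so, the subsequent steps are executed, and if not, second warning information is generated.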
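Step S20: in response to the data recovery request, obtain historical values of the IOPS and throughput, within a preset time, of the server cluster to which the server belongs, predict the IOPS and throughput of the server cluster from the historical values, and determine from the predicted values and a preset detection rule whether the server cluster is busy. In this embodiment, the predicted values of the two indicators, IOPS and throughput, may be calculated with an exponential smoothing algorithm.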
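The preset detection rule includes:
When any one of the predicted values is greater than or equal to the second preset threshold, the system is determined to be busy;
When all of the predicted values are less than the second preset threshold, the system is determined not to be busy.
It should be noted that the trigger thresholds for the predicted IOPS and throughput are derived from a statistical analysis of peak IOPS and throughput data. Control is triggered as soon as any one indicator is greater than or equal to its threshold.
Threshold of each indicator = upper performance limit of the cluster for that indicator - peak statistical analysis result.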
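Step S30: when the server cluster is determined not to be busy, calculate from the predicted IOPS and throughput the data recovery concurrency that the server cluster can add, and adjust the data recovery concurrency of the server cluster according to the result; when the server cluster is determined to be busy, adjust the data recovery concurrency of the server cluster according to the preset concurrency reduction rule; and perform data recovery on the storage nodes related to the data recovery request according to the adjusted data recovery concurrency. The addable data recovery concurrency = (throughput upper limit - predicted throughput) * response time. Throughput is the number of requests the system processes per unit time, concurrency is the number of requests the system can process simultaneously, and response time is the average response time. When the server cluster is determined not to be busy, adjusting its data recovery concurrency includes increasing the number of files concurrently being recovered on the monitored storage node. For example, if the server cluster is not busy, 10 files are being recovered simultaneously on the monitored storage node, and the addable file recovery concurrency is calculated to be 5, the storage node can be set to allow 15 files to be recovered simultaneously, which speeds up data recovery to some extent.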
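The preset concurrency reduction rule includes reducing the data recovery rate of the monitored storage nodes, reducing the number of files concurrently being recovered on the monitored disk, or reducing the number of monitored storage nodes participating in recovery at the same time.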
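For example, when the storage node is a hard disk, the rate at which the hard disk recovers files can be reduced by having the recovery thread sleep for 100 ms after each file is recovered, so that recovery of the next file starts 100 ms later.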
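Under normal circumstances, 10 files may be recovered simultaneously on the same storage node. When the recovery speed needs to be reduced, the number of files concurrently being recovered on the monitored storage node is lowered, for example by allowing only 5 files to be recovered at a time on that node, which reduces recovery efficiency to some extent.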
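When the storage node is a disk and the objects being recovered are large files, each file takes a long time to recover, and neither reducing the per-disk file recovery rate nor reducing the number of files concurrently recovered on a single disk works well. In this case, the number of monitored disks participating in recovery at the same time can be reduced, for example by halving it, which quickly reduces the usage of CPU, memory, network, and other resources on the server and takes effect immediately.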
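In other embodiments, after the data recovery concurrency is adjusted, the real-time access latency of the server cluster may also be obtained and the reduction in access latency calculated. When the reduction is greater than the third preset threshold, the current data recovery speed is kept; when the reduction is less than the third preset threshold, the data recovery speed is lowered according to the preset concurrency reduction rule.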
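For example, 5 minutes after the recovery concurrency adjustment is performed, the real-time access latency of the system is obtained and the reduction in access latency, B, is calculated as:
B = (real-time access latency - historical access latency) / historical access latency
where the historical access latency is the access latency before the recovery concurrency adjustment and the real-time access latency is the access latency after it.
When B is greater than or equal to a preset threshold of 50%, the control is considered effective. When B is less than the preset threshold of 50%, the control is considered to have had little effect, and data recovery concurrency control needs to be applied further. These steps are repeated until the reduction in access latency is greater than or equal to the preset threshold.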
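In addition, an embodiment of the present application further provides a computer-readable storage medium, which may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes the data recovery control program 10, and the data recovery control program 10, when executed by a processor, implements the following operations: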
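Receiving step: detect and receive the data recovery request sent by the client;
Prediction step: in response to the data recovery request, obtain historical values of the IOPS and throughput, within a preset time, of the server cluster to which the server belongs, predict the IOPS and throughput of the server cluster from the historical values, and determine from the predicted values and a preset detection rule whether the server cluster is busy;
Adjusting step: when the server cluster is determined not to be busy, calculate from the predicted IOPS and throughput the data recovery concurrency that the server cluster can add, and adjust the data recovery concurrency of the server cluster according to the result; when the server cluster is determined to be busy, adjust the data recovery concurrency of the server cluster according to the preset concurrency reduction rule; and perform data recovery on the storage nodes related to the data recovery request according to the adjusted data recovery concurrency.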
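The prediction and adjusting steps can be tied together in one short sketch. Everything named here is hypothetical: the function names, the fixed smoothing constant of 0.5, and the choice to halve the concurrency when the cluster is busy are illustrative assumptions, since the application leaves those details to the preset rules.

```python
def smooth(history, a=0.5):
    """Exponential smoothing, seeded with the first sample (an assumption)."""
    s = history[0]
    for y in history[1:]:
        s = a * y + (1 - a) * s
    return s


def handle_recovery_request(history, thresholds, throughput_limit,
                            avg_response_time_s, current_concurrency):
    """Return (busy, new_concurrency) for one data recovery request.

    `history` maps "iops" and "throughput" to lists of past samples.
    """
    # Prediction step: forecast each indicator, then apply the detection rule.
    predicted = {name: smooth(values) for name, values in history.items()}
    busy = any(predicted[name] >= thresholds[name] for name in thresholds)

    # Adjusting step: add headroom when idle, back off when busy.
    if busy:
        new_concurrency = max(1, current_concurrency // 2)
    else:
        headroom = max(throughput_limit - predicted["throughput"], 0.0)
        new_concurrency = current_concurrency + int(headroom * avg_response_time_s)
    return busy, new_concurrency
```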
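The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the data recovery control method described above and is not repeated here.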
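It should be noted that the sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. In addition, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes that element.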
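From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.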
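The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.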

Claims (20)

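  1. A data recovery control method, applied to a server communicatively connected to a client and storage nodes, characterized in that the method comprises the following steps:
    a receiving step: detecting and receiving a data recovery request sent by the client;
    a prediction step: in response to the data recovery request, obtaining historical values of the number of reads/writes per second and the throughput, within a preset time, of the server cluster to which the server belongs, predicting the number of reads/writes per second and the throughput of the server cluster from the historical values, and determining from the predicted values and a preset detection rule whether the server cluster is busy;
    an adjusting step: when the server cluster is determined not to be busy, calculating from the predicted number of reads/writes per second and throughput the data recovery concurrency that the server cluster can add, and adjusting the data recovery concurrency of the server cluster according to the result; when the server cluster is determined to be busy, adjusting the data recovery concurrency of the server cluster according to a preset concurrency reduction rule; and performing data recovery on the storage nodes related to the data recovery request according to the adjusted data recovery concurrency.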
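  2. The data recovery control method according to claim 1, characterized in that the method further comprises:
    obtaining the real-time access latency of the server cluster and calculating the reduction in access latency; when the reduction in access latency is greater than a third preset threshold, keeping the current data recovery concurrency; and when the reduction in access latency is less than the third preset threshold, adjusting the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.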
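  3. The data recovery control method according to claim 1, characterized in that the server periodically monitors preset indicators of the monitored storage nodes, inputs the indicator values of the preset indicators into a pre-trained storage node failure prediction model, and determines the failure rate of a storage node from the model output; when the failure rate of the storage node is greater than or equal to a first preset threshold, first warning information is sent to the client, and the client sends the data recovery request to the server cluster in response to the first warning information.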
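  4. The data recovery control method according to claim 1, characterized in that the receiving step further comprises:
    authenticating the identity information of the user of the client that initiated the data recovery request; if authentication succeeds, executing the prediction step; and if authentication fails, rejecting the data recovery request and generating second warning information.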
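  5. The data recovery control method according to claim 1, characterized in that the preset detection rule comprises:
    when any one of the predicted values is greater than or equal to a second preset threshold, determining that the server cluster is busy;
    when all of the predicted values are less than the second preset threshold, determining that the server cluster is not busy.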
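  6. The data recovery control method according to claim 1, characterized in that the preset concurrency reduction rule comprises:
    reducing the rate at which the monitored storage nodes recover data, reducing the number of files concurrently being recovered on the monitored storage nodes, or reducing the number of monitored storage nodes participating in recovery at the same time.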
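  7. The data recovery control method according to any one of claims 3 to 6, characterized in that the method further comprises:
    obtaining the real-time access latency of the server cluster and calculating the reduction in access latency; when the reduction in access latency is greater than a third preset threshold, keeping the current data recovery concurrency; and when the reduction in access latency is less than the third preset threshold, adjusting the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.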
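  8. A server communicatively connected to a client and storage nodes, characterized in that the server comprises a memory and a processor, the memory stores a data recovery control program, and the data recovery control program, when executed by the processor, implements the following steps:
    a receiving step: detecting and receiving a data recovery request sent by the client;
    a prediction step: in response to the data recovery request, obtaining historical values of the number of reads/writes per second and the throughput, within a preset time, of the server cluster to which the server belongs, predicting the number of reads/writes per second and the throughput of the server cluster from the historical values, and determining from the predicted values and a preset detection rule whether the server cluster is busy;
    an adjusting step: when the server cluster is determined not to be busy, calculating from the predicted number of reads/writes per second and throughput the data recovery concurrency that the server cluster can add, and adjusting the data recovery concurrency of the server cluster according to the result; when the server cluster is determined to be busy, adjusting the data recovery concurrency of the server cluster according to a preset concurrency reduction rule; and performing data recovery on the storage nodes related to the data recovery request according to the adjusted data recovery concurrency.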
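  9. The server according to claim 8, characterized in that the data recovery control program, when executed by the processor, further implements:
    obtaining the real-time access latency of the server cluster and calculating the reduction in access latency; when the reduction in access latency is greater than a third preset threshold, keeping the current data recovery concurrency; and when the reduction in access latency is less than the third preset threshold, adjusting the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.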
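  10. The server according to claim 8, characterized in that the server periodically monitors preset indicators of the monitored storage nodes, inputs the indicator values of the preset indicators into a pre-trained storage node failure prediction model, and determines the failure rate of a storage node from the model output; when the failure rate of the storage node is greater than or equal to a first preset threshold, first warning information is sent to the client, and the client sends the data recovery request to the server cluster in response to the first warning information.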
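  11. The server according to claim 8, characterized in that the receiving step comprises:
    authenticating the identity information of the user of the client that initiated the data recovery request; if authentication succeeds, executing the prediction step; and if authentication fails, rejecting the data recovery request and generating second warning information.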
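  12. The server according to claim 8, characterized in that the preset detection rule comprises:
    when any one of the predicted values is greater than or equal to a second preset threshold, determining that the server cluster is busy;
    when all of the predicted values are less than the second preset threshold, determining that the server cluster is not busy.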
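  13. The server according to claim 8, characterized in that the preset concurrency reduction rule comprises:
    reducing the rate at which the monitored storage nodes recover data, reducing the number of files concurrently being recovered on the monitored storage nodes, or reducing the number of monitored storage nodes participating in recovery at the same time.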
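  14. The server according to any one of claims 10 to 13, characterized in that the data recovery control program, when executed by the processor, further implements:
    obtaining the real-time access latency of the server cluster and calculating the reduction in access latency; when the reduction in access latency is greater than a third preset threshold, keeping the current data recovery concurrency; and when the reduction in access latency is less than the third preset threshold, adjusting the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.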
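  15. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a data recovery control program, and the data recovery control program, when executed by a processor, implements the following steps:
    a receiving step: detecting and receiving a data recovery request sent by the client;
    a prediction step: in response to the data recovery request, obtaining historical values of the number of reads/writes per second and the throughput, within a preset time, of the server cluster to which the server belongs, predicting the number of reads/writes per second and the throughput of the server cluster from the historical values, and determining from the predicted values and a preset detection rule whether the server cluster is busy;
    an adjusting step: when the server cluster is determined not to be busy, calculating from the predicted number of reads/writes per second and throughput the data recovery concurrency that the server cluster can add, and adjusting the data recovery concurrency of the server cluster according to the result; when the server cluster is determined to be busy, adjusting the data recovery concurrency of the server cluster according to a preset concurrency reduction rule; and performing data recovery on the storage nodes related to the data recovery request according to the adjusted data recovery concurrency.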
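  16. The computer-readable storage medium according to claim 15, characterized in that the data recovery control program, when executed by the processor, further implements:
    obtaining the real-time access latency of the server cluster and calculating the reduction in access latency; when the reduction in access latency is greater than a third preset threshold, keeping the current data recovery concurrency; and when the reduction in access latency is less than the third preset threshold, adjusting the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.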
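  17. The computer-readable storage medium according to claim 15, characterized in that the receiving step comprises:
    authenticating the identity information of the user of the client that initiated the data recovery request; if authentication succeeds, executing the prediction step; and if authentication fails, rejecting the data recovery request and generating second warning information.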
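  18. The computer-readable storage medium according to claim 15, characterized in that the preset detection rule comprises:
    when any one of the predicted values is greater than or equal to a second preset threshold, determining that the server cluster is busy;
    when all of the predicted values are less than the second preset threshold, determining that the server cluster is not busy.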
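  19. The computer-readable storage medium according to claim 15, characterized in that the preset concurrency reduction rule comprises:
    reducing the rate at which the monitored storage nodes recover data, reducing the number of files concurrently being recovered on the monitored storage nodes, or reducing the number of monitored storage nodes participating in recovery at the same time.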
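  20. The computer-readable storage medium according to any one of claims 17 to 19, characterized in that the data recovery control program, when executed by the processor, further implements:
    obtaining the real-time access latency of the server cluster and calculating the reduction in access latency; when the reduction in access latency is greater than a third preset threshold, keeping the current data recovery concurrency; and when the reduction in access latency is less than the third preset threshold, adjusting the data recovery concurrency of the server cluster according to the preset concurrency reduction rule.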
PCT/CN2019/088629 2019-01-04 2019-05-27 Data recovery control method, server, and storage medium WO2020140369A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910008850.9 2019-01-04
CN201910008850.9A CN109857592B (zh) 2019-01-04 2019-01-04 Data recovery control method, server, and storage medium

Publications (1)

Publication Number Publication Date
WO2020140369A1 true WO2020140369A1 (zh) 2020-07-09

Family

ID=66894016

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088629 WO2020140369A1 (zh) 2019-01-04 2019-05-27 Data recovery control method, server, and storage medium

Country Status (2)

Country Link
CN (1) CN109857592B (zh)
WO (1) WO2020140369A1 (zh)


Also Published As

Publication number Publication date
CN109857592B (zh) 2023-09-15
CN109857592A (zh) 2019-06-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907203

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907203

Country of ref document: EP

Kind code of ref document: A1