CN114675998A - Method, device, equipment and medium for monitoring timed snapshot task - Google Patents

Info

Publication number
CN114675998A
Authority
CN
China
Prior art keywords
snapshot task
data
snapshot
task
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210300900.2A
Other languages
Chinese (zh)
Inventor
孟祥奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210300900.2A
Publication of CN114675998A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method, an apparatus, a device and a readable medium for monitoring a timed snapshot task. The method comprises the following steps: detecting the system environment each time a threshold time elapses, and determining, based on the detected system environment, whether the snapshot task is affected; in response to the detected system environment affecting the snapshot task, sending the detected system parameters and a corresponding alarm to the user; in response to a trigger condition of the snapshot task being met, executing the snapshot task and determining whether it executed successfully; in response to the snapshot task failing to execute, re-executing the snapshot task at a set interval within a preset time period; and in response to the snapshot task still failing within the preset time period, sending the user an alarm that the snapshot task failed to execute. With the scheme of the invention, the user can be notified in time, risks in the execution of the current timed snapshot task can be avoided in advance, and abnormalities can be repaired promptly after they occur, which prevents losses from growing and improves the user experience.

Description

Method, device, equipment and medium for monitoring timed snapshot task
Technical Field
The present invention relates to the field of computers, and more particularly, to a method, an apparatus, a device and a readable medium for monitoring a timed snapshot task.
Background
Just as a camera captures an image of a scene at a single instant, a file system snapshot captures the content of a data set at a particular point in time, protecting against data corruption or loss caused by viruses, damaged configuration files, system crashes and the like. When the user wants to return to the state at which a snapshot was created, the user can roll back to the snapshot and continue from that point in time, or create several parallel storage spaces from the data image at that point. To back up file system data regularly, a timed snapshot task is usually configured on the server: the user sets a schedule, and the file system is backed up and a snapshot created automatically. This avoids the situation in which no snapshot has been taken for a long time, so that the data before and after a rollback differ too much and the user's work is affected. However, the timed snapshot task runs automatically in the background, and the user cannot tell whether it succeeded. The user therefore cannot deal promptly with the consequences of a failed timed snapshot task, which may affect the user's services and cause losses. Existing alarm and management schemes for timed snapshot tasks focus mainly on the triggering process itself, and even there the handling is limited; they neither anticipate risks that may arise before the task is triggered nor actively avoid the adverse consequences of a failure after it is triggered.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a readable medium for monitoring a timed snapshot task. With the technical solution of the present invention, the user can be notified in time, risks in the execution of the current timed snapshot task can be avoided in advance, and abnormalities can be repaired promptly after they occur, which prevents losses from growing and improves the user experience.
In view of the above object, an aspect of the embodiments of the present invention provides a method for monitoring a timed snapshot task, including the following steps:
detecting the system environment each time a threshold time elapses, and determining, based on the detected system environment, whether the snapshot task is affected;
in response to the detected system environment affecting the snapshot task, sending the detected system parameters and a corresponding alarm to the user;
in response to a trigger condition of the snapshot task being met, executing the snapshot task and determining whether the snapshot task executed successfully;
in response to the snapshot task failing to execute, re-executing the snapshot task at a set interval within a preset time period;
and in response to the snapshot task still failing within the preset time period, sending the user an alarm that the snapshot task failed to execute.
According to an embodiment of the present invention, detecting the system environment each time a threshold time elapses, and determining, based on the detected system environment, whether the snapshot task is affected includes:
detecting the CPU usage of the system, the I/O wait data volume, and the storage pool capacity each time the threshold time elapses;
comparing each detected value with its corresponding preset threshold;
and in response to the detected value of any one of the CPU usage, the I/O wait data volume, and the storage pool capacity being greater than its corresponding preset threshold, determining that the system environment affects the snapshot task.
According to an embodiment of the present invention, detecting the CPU usage of the system, the I/O wait data volume, and the storage pool capacity each time the threshold time elapses includes:
setting a detection time period and a detection interval;
within the detection time period, detecting the CPU usage of the system, the I/O wait data volume, and the storage pool capacity once every detection interval;
and summing the detected values of each type and averaging them to obtain the final detection data.
According to an embodiment of the present invention, the method further includes:
in response to any one of a system power failure, a configuration node switchover, or a system upgrade, cleaning up the residual resources of each type in the system after the snapshot task is executed.
In another aspect of the embodiments of the present invention, there is also provided an apparatus for monitoring a timed snapshot task, including:
a determining module configured to detect the system environment each time a threshold time elapses and to determine, based on the detected system environment, whether the snapshot task is affected;
a sending module configured to send the detected system parameters and a corresponding alarm to the user in response to the detected system environment affecting the snapshot task;
an execution module configured to execute the snapshot task in response to a trigger condition of the snapshot task being met, and to determine whether the snapshot task executed successfully;
a retry module configured to re-execute the snapshot task at a set interval within a preset time period in response to the snapshot task failing to execute;
and an alarm module configured to send the user an alarm that the snapshot task failed to execute, in response to the snapshot task still failing within the preset time period.
According to an embodiment of the invention, the determining module is further configured to:
detect the CPU usage of the system, the I/O wait data volume, and the storage pool capacity each time the threshold time elapses;
compare each detected value with its corresponding preset threshold;
and, in response to the detected value of any one of the CPU usage, the I/O wait data volume, and the storage pool capacity being greater than its corresponding preset threshold, determine that the system environment affects the snapshot task.
According to an embodiment of the invention, the determining module is further configured to:
set a detection time period and a detection interval;
within the detection time period, detect the CPU usage of the system, the I/O wait data volume, and the storage pool capacity once every detection interval;
and sum the detected values of each type and average them to obtain the final detection data.
According to an embodiment of the invention, the apparatus further comprises a cleaning module configured to:
in response to any one of a system power failure, a configuration node switchover, or a system upgrade, clean up the residual resources of each type in the system after the snapshot task is executed.
In another aspect of an embodiment of the present invention, there is also provided a computer apparatus including:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of any of the methods described above.
In another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program realizes the steps of any one of the above methods when executed by a processor.
The invention has the following beneficial technical effects: the method for monitoring a timed snapshot task provided by the embodiments of the present invention detects the system environment each time a threshold time elapses and determines, based on the detected system environment, whether the snapshot task is affected; in response to the detected system environment affecting the snapshot task, sends the detected system parameters and a corresponding alarm to the user; in response to a trigger condition of the snapshot task being met, executes the snapshot task and determines whether it executed successfully; in response to the snapshot task failing to execute, re-executes the snapshot task at a set interval within a preset time period; and in response to the snapshot task still failing within the preset time period, sends the user an alarm that the snapshot task failed to execute. With this technical solution, the user can be notified in time, risks in the execution of the current timed snapshot task can be avoided in advance, and abnormalities can be repaired promptly after they occur, which prevents losses from growing and improves the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram of a method of monitoring a timed snapshot task in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of an apparatus for monitoring a timed snapshot task according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
In view of the above object, a first aspect of embodiments of the present invention proposes an embodiment of a method of monitoring a timed snapshot task. Fig. 1 shows a schematic flow diagram of the method.
As shown in fig. 1, the method may include the steps of:
S1 detects the system environment each time a threshold time elapses, and determines whether the snapshot task is affected based on the detected system environment.
The method may be implemented in C and Python as a monitoring module for timed snapshot tasks; the module is integrated into the system and runs automatically according to the configured parameters. The environmental conditions required for triggering the snapshot task are checked specifically, so that snapshot-related operations do not fail during execution. When the environment is abnormal, the user is notified of the potential risk that the timed snapshot operation will fail, so that the user can deal with it in advance. If there is sufficient time between the completion of one timed snapshot task and the triggering of the next, the detection may be performed several times. For example, if detection is set to run once every 30 minutes, starting 2 hours before the next timed snapshot task is triggered, it is performed at most 4 times. The method may provide the user with a manual invocation interface, so that after taking action the user can run the detection manually to check whether the action was effective. A user interface may be provided that allows the user to turn off the automatic monitoring function manually and to choose from a list of the currently monitored resources. The method may also provide an interface for starting applications at a scheduled time: when the user temporarily closes some applications to reduce the risk that the snapshot task fails, those applications can be added to a scheduled-start list and are started automatically at the specified time. Extensible interfaces may be reserved so that new resources can be added for monitoring as required and their thresholds set.
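The example schedule above (one detection every 30 minutes, starting 2 hours before the next trigger) can be sketched in Python as follows. This is a minimal illustration only: the function name `detection_times`, its parameters, and the example trigger time are assumptions and do not appear in the source.

```python
from datetime import datetime, timedelta


def detection_times(next_trigger,
                    lead=timedelta(hours=2),
                    interval=timedelta(minutes=30)):
    """Return the pre-trigger detection times, earliest first.

    Hypothetical helper: detection starts `lead` before the next trigger
    and repeats every `interval` until the trigger time is reached.
    """
    times = []
    current = next_trigger - lead
    while current < next_trigger:
        times.append(current)
        current += interval
    return times


# Assumed example: the next timed snapshot is due at 02:00.
schedule = detection_times(datetime(2022, 3, 25, 2, 0))
for moment in schedule:
    print(moment.strftime("%H:%M"))
```

With the default values the list holds four detection times, which matches the "at most 4 times" stated in the example.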
During detection, the CPU usage of the system, the I/O wait data volume, and the storage pool capacity may each be detected every threshold time, and each detected value is compared with its corresponding preset threshold. If the detected value of any one of the CPU usage, the I/O wait data volume, and the storage pool capacity is greater than its corresponding preset threshold, the system environment is determined to affect the snapshot task. Detection uses a multi-point averaging strategy: a detection time period and a detection interval are set first; the CPU usage, the I/O wait data volume, and the storage pool capacity are then detected once every detection interval within the detection time period; finally, the detected values of each type are summed and averaged to obtain the final detection data.
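The comparison rule described above (average each metric over its samples, then flag the environment if any one average exceeds its threshold) might look like the sketch below. The metric names, the threshold values, and the reading of "storage pool capacity" as a used-capacity percentage are all assumptions made for illustration; the source gives no concrete values.

```python
from statistics import mean

# Hypothetical thresholds; the source does not specify concrete numbers.
THRESHOLDS = {"cpu_usage_pct": 80.0, "io_wait": 20.0, "pool_used_pct": 90.0}


def average_samples(samples):
    """Average a list of per-sample dicts, metric by metric."""
    return {name: mean(sample[name] for sample in samples)
            for name in THRESHOLDS}


def exceeded_metrics(averaged, thresholds=THRESHOLDS):
    """Return only the metrics whose averaged value is above its threshold."""
    return {name: value for name, value in averaged.items()
            if value > thresholds[name]}


# Two assumed samples taken one detection interval apart.
samples = [
    {"cpu_usage_pct": 70.0, "io_wait": 10.0, "pool_used_pct": 95.0},
    {"cpu_usage_pct": 80.0, "io_wait": 12.0, "pool_used_pct": 93.0},
]
averaged = average_samples(samples)
affected = exceeded_metrics(averaged)
# A single metric over its threshold is enough to flag the environment.
snapshot_at_risk = bool(affected)
```

Returning the offending metrics, rather than a bare flag, keeps the detected values available for the alarm sent to the user in step S2.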
S2 sends the detected system parameters and corresponding alerts to the user in response to the detected system environment affecting the snapshot task.
If the determination shows that the current system environment may affect the upcoming snapshot task, the specific detected data and an alarm are sent to the user, and the user is prompted to close some applications or take other countermeasures, reducing the risk that the snapshot task fails.
S3 executes the snapshot task in response to the trigger condition of the snapshot task being met, and determines whether the snapshot task executed successfully.
When a user creates a timed snapshot task for a file system, the system automatically allocates resources for it, sets up the timed task, and polls the timed task periodically to determine whether its trigger condition is met. Once the trigger condition is met, the system deletes or creates a snapshot and then determines whether the deletion or creation operation succeeded.
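One polling pass of step S3 can be sketched as below. The source does not say what the trigger condition is, so this example assumes a fixed period since the last successful run; `poll_once`, the `task` dictionary layout, and the `do_snapshot` callback are likewise hypothetical.

```python
from datetime import datetime, timedelta


def poll_once(now, task, do_snapshot):
    """Check one timed task and run the snapshot operation if it is due.

    Returns None when the trigger condition is not met, otherwise the
    success flag reported by do_snapshot().
    """
    # Assumed trigger condition: a fixed period has elapsed since the last run.
    if now - task["last_run"] < task["period"]:
        return None
    succeeded = bool(do_snapshot())
    if succeeded:
        task["last_run"] = now
    return succeeded


start = datetime(2022, 3, 25, 0, 0)
task = {"last_run": start, "period": timedelta(hours=1)}

# Polled too early: the trigger condition is not met, nothing runs.
early = poll_once(start + timedelta(minutes=30), task, lambda: True)
# Polled after one full period: the snapshot operation runs and succeeds.
on_time = poll_once(start + timedelta(hours=1), task, lambda: True)
```

A `False` result from `poll_once` is the case that hands over to the retry handling in step S4.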
S4 re-executes the snapshot task at set intervals within a preset time period in response to the snapshot task failing to execute.
If the snapshot deletion or creation operation fails, the system automatically determines the cause of the failure, generates alarm information, and records it in the resource attributes of the timed task; the attribute records the abnormal state of the current task and its details. After the timed snapshot task fails, a time period may be set, for example 5 minutes, within which the snapshot task is triggered again at 1-minute intervals. If the snapshot creation or deletion operation succeeds during this period, the system checks whether the resource contains alarm information and, if so, clears it; no feedback to the user is needed.
S5 gives a warning to the user that the snapshot task failed to execute in response to the snapshot task still failing to execute within the preset time period.
If the snapshot task still fails within this time period, an alarm must be sent to the user. Alarms for snapshot creation failures and snapshot deletion failures are handled separately: the failure or success of a creation affects only the creation alarm information and not the deletion alarm information, and vice versa. Once alarm information has been generated, it is reported; a check for timed snapshot alarms is triggered every 5 s, and the alarm information is automatically retrieved and displayed to the user.
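The separation of creation and deletion alarms can be sketched as a small state holder. The class `SnapshotAlarms` and its method names are hypothetical; in a real system the `poll` method would be what the 5-second alarm check calls, which is not simulated here.

```python
class SnapshotAlarms:
    """Keeps creation and deletion alarms independent of each other."""

    OPERATIONS = ("create", "delete")

    def __init__(self):
        self._active = {}  # operation name -> alarm detail

    def record(self, operation, succeeded, detail=""):
        """Record the outcome of one snapshot operation."""
        if operation not in self.OPERATIONS:
            raise ValueError(f"unknown operation: {operation!r}")
        if succeeded:
            # A success clears only the alarm of the same operation type.
            self._active.pop(operation, None)
        else:
            self._active[operation] = detail

    def poll(self):
        """Return a copy of the currently active alarms for display."""
        return dict(self._active)


alarm_state = SnapshotAlarms()
alarm_state.record("create", False, "storage pool full")
alarm_state.record("delete", False, "snapshot busy")
# A later successful creation must not clear the deletion alarm.
alarm_state.record("create", True)
visible = alarm_state.poll()
```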
With this technical solution, the user can be notified in time, risks in the execution of the current timed snapshot task can be avoided in advance, and abnormalities can be repaired promptly after they occur, which prevents losses from growing and improves the user experience.
In a preferred embodiment of the present invention, detecting the system environment each time a threshold time elapses, and determining, based on the detected system environment, whether the snapshot task is affected includes:
detecting the CPU usage of the system, the I/O wait data volume, and the storage pool capacity each time the threshold time elapses;
comparing each detected value with its corresponding preset threshold;
and in response to the detected value of any one of the CPU usage, the I/O wait data volume, and the storage pool capacity being greater than its corresponding preset threshold, determining that the system environment affects the snapshot task.
In a preferred embodiment of the present invention, detecting the CPU usage of the system, the I/O wait data volume, and the storage pool capacity each time the threshold time elapses comprises:
setting a detection time period and a detection interval;
within the detection time period, detecting the CPU usage of the system, the I/O wait data volume, and the storage pool capacity once every detection interval;
and summing the detected values of each type and averaging them to obtain the final detection data.
The overall environment of the current system is checked; factors such as the CPU usage of the current system, the I/O wait value, and the storage pool capacity can be detected and evaluated. Detection uses a multi-point monitoring and averaging strategy. For example, to detect the CPU state, the system state is sampled continuously for 5 min, once every 10 s, and the resulting average is used to judge the CPU usage of the system; the other data can be detected in the same way. When resources of any type in the system are found to be in an abnormal state or excessively used, so that the snapshot task may fail, the user is prompted to close some applications or take other countermeasures, reducing the risk that the snapshot task fails.
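The 5-minute, 10-second sampling example above amounts to 30 readings averaged into one value. A minimal sketch, assuming a caller-supplied `read_metric` function (hypothetical, as are the other names) and an injectable sleep so the demonstration does not actually wait:

```python
import time
from itertools import cycle


def sample_average(read_metric, period_s=300, interval_s=10,
                   sleep=time.sleep):
    """Sample read_metric() once per interval over the period and average.

    With the defaults this takes 300 / 10 = 30 samples.
    """
    count = period_s // interval_s
    total = 0.0
    for index in range(count):
        total += read_metric()
        if index < count - 1:
            sleep(interval_s)  # wait between samples, not after the last
    return total / count


# Simulated CPU readings alternating between 40% and 60%.
fake_readings = cycle([40.0, 60.0])
readings_taken = []


def read_cpu():
    value = next(fake_readings)
    readings_taken.append(value)
    return value


avg_cpu = sample_average(read_cpu, sleep=lambda seconds: None)
```

Averaging over many samples keeps a short usage spike from triggering an alarm on its own, which appears to be the purpose of the multi-point strategy.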
In a preferred embodiment of the present invention, the method further comprises:
in response to any one of a system power failure, a configuration node switchover, or a system upgrade, cleaning up the residual resources of each type in the system after the snapshot task is executed.
Under normal conditions, when a snapshot creation or deletion operation fails, the system automatically cleans up the resources generated by the operation and any effects on the system configuration, so that no residual resources interfere with other services. However, when the system loses power, the configuration node is switched, or the system is upgraded, the snapshot operation concerned cannot clean up its residual resources normally, and they must be cleaned up actively: whenever the timed snapshot task is triggered, residual resources and abnormal configurations of each type in the system are cleaned up. A snapshot relation mapping table may be maintained automatically in the system; when a snapshot is created, its information is written into the mapping table, and when a snapshot is deleted, its information is removed from the mapping table. After the timed snapshot task is triggered, cleanup runs automatically: the validity of each entry in the mapping table is checked and invalid entries are removed; the resources required by the various types of snapshots in the system are then verified against the mapping table, and resources belonging to a snapshot that does not exist in the mapping table are cleaned up automatically, so that residual resources do not affect normal services.
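The two-stage cleanup described above (first drop invalid mapping-table entries, then remove resources whose snapshot is absent from the table) can be sketched as follows. The data layout is an assumption: the mapping table is modelled as a dictionary keyed by snapshot ID, and each resource records the ID of the snapshot that owns it.

```python
def clean_residue(mapping_table, existing_snapshots, snapshot_resources):
    """Remove invalid mapping entries and orphaned snapshot resources.

    mapping_table:      dict of snapshot_id -> snapshot info
    existing_snapshots: set of snapshot IDs that really exist
    snapshot_resources: dict of resource_id -> owning snapshot_id
    Both dicts are modified in place; the removed keys are returned.
    """
    # Stage 1: entries that point at snapshots which no longer exist.
    invalid = [sid for sid in mapping_table if sid not in existing_snapshots]
    for sid in invalid:
        del mapping_table[sid]

    # Stage 2: resources owned by a snapshot that is not in the table.
    orphaned = [rid for rid, sid in snapshot_resources.items()
                if sid not in mapping_table]
    for rid in orphaned:
        del snapshot_resources[rid]

    return invalid, orphaned


# Assumed state after an interrupted operation (e.g. a power failure).
table = {"snap1": {"fs": "fs0"}, "snap2": {"fs": "fs0"}}
existing = {"snap1"}
resources = {"res1": "snap1", "res2": "snap2", "res3": "snap9"}

removed_entries, removed_resources = clean_residue(table, existing, resources)
```

Running stage 1 before stage 2 means resources of a stale table entry (here `res2`) are caught in the same pass as resources that were never recorded at all (`res3`).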
A failed timed snapshot task may cause the backup of user data to fail. When the system then fails and the data must be rolled back, the latest snapshot of the file system is unavailable, which increases the cost of data loss for the file system. The technical solution of the present invention notifies the user in time of risks in the execution of the current timed snapshot task so that they can be avoided in advance, notifies the user promptly after an abnormality occurs so that it can be repaired in time, and automatically cleans up residual resources and abnormal configuration information of each type in the system, which prevents losses from growing and improves the user experience.
It should be noted that, as can be understood by those skilled in the art, all or part of the processes in the methods of the embodiments described above can be implemented by instructing relevant hardware by a computer program, and the program may be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
In view of the above object, according to a second aspect of the embodiments of the present invention, there is provided an apparatus for monitoring a timed snapshot task, as shown in fig. 2, the apparatus 200 includes:
a determining module configured to detect the system environment each time a threshold time elapses and to determine, based on the detected system environment, whether the snapshot task is affected;
a sending module configured to send the detected system parameters and a corresponding alarm to the user in response to the detected system environment affecting the snapshot task;
an execution module configured to execute the snapshot task in response to a trigger condition of the snapshot task being met, and to determine whether the snapshot task executed successfully;
a retry module configured to re-execute the snapshot task at a set interval within a preset time period in response to the snapshot task failing to execute;
and an alarm module configured to send the user an alarm that the snapshot task failed to execute, in response to the snapshot task still failing within the preset time period.
In a preferred embodiment of the present invention, the determining module is further configured to:
detect the CPU usage of the system, the I/O wait data volume, and the storage pool capacity each time the threshold time elapses;
compare each detected value with its corresponding preset threshold;
and, in response to the detected value of any one of the CPU usage, the I/O wait data volume, and the storage pool capacity being greater than its corresponding preset threshold, determine that the system environment affects the snapshot task.
In a preferred embodiment of the present invention, the determining module is further configured to:
set a detection time period and a detection interval;
within the detection time period, detect the CPU usage of the system, the I/O wait data volume, and the storage pool capacity once every detection interval;
and sum the detected values of each type and average them to obtain the final detection data.
In a preferred embodiment of the present invention, the apparatus further comprises a cleaning module configured to:
in response to any one of a system power failure, a configuration node switchover, or a system upgrade, clean up the residual resources of each type in the system after the snapshot task is executed.
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in fig. 3, an embodiment of the present invention includes the following means: at least one processor 21; and a memory 22, the memory 22 storing computer instructions 23 executable on the processor, the instructions when executed by the processor implementing the method of:
detecting the system environment every time a threshold value is passed, and judging whether the snapshot task is influenced or not based on the detected system environment;
responding to the influence of the detected system environment on the snapshot task, and sending the detected system parameters and corresponding alarms to the user;
responding to a triggering condition meeting the snapshot task, executing the snapshot task and judging whether the snapshot task is executed successfully;
responding to the execution failure of the snapshot task, and re-executing the snapshot task at set time intervals within a preset time period;
and responding to the fact that the snapshot task still fails to be executed within the preset time period, and sending out a warning that the snapshot task fails to be executed to the user.
In a preferred embodiment of the present invention, detecting a system environment every time a threshold time elapses, and determining whether to affect the snapshot task based on the detected system environment includes:
detecting the utilization rate of a CPU of the system, the data volume of I/O waiting and the data of the capacity of a storage pool respectively every threshold time;
respectively comparing the detected data with corresponding preset threshold values;
and determining that the system environment influences the snapshot task in response to the fact that the detection data of any one of the utilization rate of the CPU, the amount of the I/O waiting data and the capacity of the storage pool is larger than a corresponding preset threshold value.
In a preferred embodiment of the present invention, detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool each time the threshold time elapses includes:
setting a detection time period and a detection interval;
detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool once every detection interval within the detection time period;
and summing the detected values of each type and averaging them to obtain the final detection data.
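By way of illustration only, the sampling and averaging described above may be sketched as follows. The function `averaged_metrics`, the `read_metrics` callable, and the metric keys are hypothetical; the sketch assumes that each reading is a dictionary with the same keys and that the detection time period is at least one detection interval long.

```python
import time
from typing import Callable, Dict, List


def averaged_metrics(
    read_metrics: Callable[[], Dict[str, float]],
    period_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Sample once per interval over the period, then average each metric.

    `read_metrics` is a hypothetical callable returning one reading per call,
    for example {"cpu": ..., "io_wait": ..., "pool": ...}.
    """
    if interval_s <= 0 or period_s < interval_s:
        raise ValueError("period must cover at least one positive interval")
    samples: List[Dict[str, float]] = []
    for _ in range(int(period_s // interval_s)):
        samples.append(read_metrics())
        sleep(interval_s)
    # Sum the values of the same type and divide by the number of samples.
    return {
        name: sum(s[name] for s in samples) / len(samples)
        for name in samples[0]
    }
```

Averaging over several readings reduces the chance that a single momentary spike, rather than a sustained load, triggers an alarm.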
In a preferred embodiment of the present invention, the method further comprises:
in response to any one of the three conditions of system power failure, configuration node switching, and system upgrade, cleaning up the various types of resource residues in the system after the snapshot task is executed.
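By way of illustration only, the cleanup step above may be sketched as follows. The description does not specify which resources are cleaned or how the three conditions are detected, so the enumeration `SystemEvent`, the function `clean_residues`, and the idea of a table of named cleanup callables are all assumptions made for the example.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class SystemEvent(Enum):
    """Hypothetical event types; only the first three trigger cleanup."""
    POWER_FAILURE = auto()
    CONFIG_NODE_SWITCH = auto()
    SYSTEM_UPGRADE = auto()
    OTHER = auto()


CLEANUP_EVENTS = frozenset({
    SystemEvent.POWER_FAILURE,
    SystemEvent.CONFIG_NODE_SWITCH,
    SystemEvent.SYSTEM_UPGRADE,
})


def clean_residues(
    event: SystemEvent,
    cleaners: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run every registered cleaner if the event is one of the three conditions.

    `cleaners` maps a hypothetical residue name to a callable that removes it.
    Returns the names of the residues that were cleaned, in registration order.
    """
    if event not in CLEANUP_EVENTS:
        return []
    cleaned: List[str] = []
    for name, cleaner in cleaners.items():
        cleaner()
        cleaned.append(name)
    return cleaned
```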
In view of the above object, a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium. Fig. 4 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention. As shown in Fig. 4, the computer-readable storage medium 31 stores a computer program 32 that, when executed by a processor, performs the following method:
detecting the system environment each time a threshold time elapses, and judging, based on the detected system environment, whether the snapshot task is affected;
in response to the detected system environment affecting the snapshot task, sending the detected system parameters and a corresponding alarm to the user;
in response to the trigger condition of the snapshot task being met, executing the snapshot task and judging whether the snapshot task is executed successfully;
in response to the snapshot task failing to execute, re-executing the snapshot task at set time intervals within a preset time period;
and in response to the snapshot task still failing to execute within the preset time period, sending the user a warning that the snapshot task failed to execute.
In a preferred embodiment of the present invention, detecting the system environment each time a threshold time elapses and judging, based on the detected system environment, whether the snapshot task is affected includes:
detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool each time the threshold time elapses;
comparing each detected value with its corresponding preset threshold;
and determining that the system environment affects the snapshot task in response to the detected value of any one of the CPU utilization, the amount of I/O wait data, and the capacity of the storage pool being greater than its corresponding preset threshold.
In a preferred embodiment of the present invention, detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool each time the threshold time elapses includes:
setting a detection time period and a detection interval;
detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool once every detection interval within the detection time period;
and summing the detected values of each type and averaging them to obtain the final detection data.
In a preferred embodiment of the present invention, the method further comprises:
in response to any one of the three conditions of system power failure, configuration node switching, and system upgrade, cleaning up the various types of resource residues in the system after the snapshot task is executed.
Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, and the computer program may be stored in a computer-readable storage medium. When executed by the processor, the computer program performs the above-described functions defined in the methods disclosed in the embodiments of the invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for monitoring a timed snapshot task, comprising the steps of:
detecting the system environment each time a threshold time elapses, and judging, based on the detected system environment, whether the snapshot task is affected;
in response to the detected system environment affecting the snapshot task, sending the detected system parameters and a corresponding alarm to the user;
in response to the trigger condition of the snapshot task being met, executing the snapshot task and judging whether the snapshot task is executed successfully;
in response to the snapshot task failing to execute, re-executing the snapshot task at set time intervals within a preset time period;
and in response to the snapshot task still failing to execute within the preset time period, sending the user a warning that the snapshot task failed to execute.
2. The method of claim 1, wherein detecting the system environment each time a threshold time elapses and judging, based on the detected system environment, whether the snapshot task is affected comprises:
detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool each time the threshold time elapses;
comparing each detected value with its corresponding preset threshold;
and determining that the system environment affects the snapshot task in response to the detected value of any one of the CPU utilization, the amount of I/O wait data, and the capacity of the storage pool being greater than its corresponding preset threshold.
3. The method of claim 2, wherein detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool each time the threshold time elapses comprises:
setting a detection time period and a detection interval;
detecting the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool once every detection interval within the detection time period;
and summing the detected values of each type and averaging them to obtain the final detection data.
4. The method of claim 1, further comprising:
in response to any one of the three conditions of system power failure, configuration node switching, and system upgrade, cleaning up the various types of resource residues in the system after the snapshot task is executed.
5. An apparatus for monitoring a timed snapshot task, the apparatus comprising:
a judging module configured to detect the system environment each time a threshold time elapses, and to judge, based on the detected system environment, whether the snapshot task is affected;
a sending module configured to, in response to the detected system environment affecting the snapshot task, send the detected system parameters and a corresponding alarm to the user;
an execution module configured to, in response to the trigger condition of the snapshot task being met, execute the snapshot task and judge whether the snapshot task is executed successfully;
a retry module configured to, in response to the snapshot task failing to execute, re-execute the snapshot task at set time intervals within a preset time period;
and a warning module configured to, in response to the snapshot task still failing to execute within the preset time period, send the user a warning that the snapshot task failed to execute.
6. The apparatus of claim 5, wherein the judging module is further configured to:
detect the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool each time the threshold time elapses;
compare each detected value with its corresponding preset threshold;
and determine that the system environment affects the snapshot task in response to the detected value of any one of the CPU utilization, the amount of I/O wait data, and the capacity of the storage pool being greater than its corresponding preset threshold.
7. The apparatus of claim 6, wherein the judging module is further configured to:
set a detection time period and a detection interval;
detect the CPU utilization of the system, the amount of I/O wait data, and the capacity of the storage pool once every detection interval within the detection time period;
and sum the detected values of each type and average them to obtain the final detection data.
8. The apparatus of claim 5, further comprising a cleaning module configured to:
in response to any one of the three conditions of system power failure, configuration node switching, and system upgrade, clean up the various types of resource residues in the system after the snapshot task is executed.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN202210300900.2A 2022-03-25 2022-03-25 Method, device, equipment and medium for monitoring timed snapshot task Pending CN114675998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210300900.2A CN114675998A (en) 2022-03-25 2022-03-25 Method, device, equipment and medium for monitoring timed snapshot task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210300900.2A CN114675998A (en) 2022-03-25 2022-03-25 Method, device, equipment and medium for monitoring timed snapshot task

Publications (1)

Publication Number Publication Date
CN114675998A true CN114675998A (en) 2022-06-28

Family

ID=82073672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210300900.2A Pending CN114675998A (en) 2022-03-25 2022-03-25 Method, device, equipment and medium for monitoring timed snapshot task

Country Status (1)

Country Link
CN (1) CN114675998A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115378794A (en) * 2022-08-19 2022-11-22 中国建设银行股份有限公司 Gateway fault detection method and device based on snapshot mode

Similar Documents

Publication Publication Date Title
CN107179957B (en) Physical machine fault classification processing method and device and virtual machine recovery method and system
CN107515796B (en) Equipment abnormity monitoring processing method and device
US11892922B2 (en) State management methods, methods for switching between master application server and backup application server, and electronic devices
CN109274544B (en) Fault detection method and device for distributed storage system
CN107480014A (en) A kind of High Availabitity equipment switching method and device
CN112363865A (en) Database fault recovery method and device and face image search system
CN112732674B (en) Cloud platform service management method, device, equipment and readable storage medium
CN111901176B (en) Fault determination method, device, equipment and storage medium
CN114675998A (en) Method, device, equipment and medium for monitoring timed snapshot task
CN108958965A (en) A kind of BMC monitoring can restore the method, device and equipment of ECC error
CN108243031B (en) Method and device for realizing dual-computer hot standby
TWI518680B (en) Method for maintaining file system of computer system
CN111342986A (en) Distributed node management method and device, distributed system and storage medium
CN111917576B (en) Storage cluster control method and device, computer readable storage medium and processor
CN111930719B (en) Database access method, device and system
WO2017080362A1 (en) Data managing method and device
CN110968456B (en) Method and device for processing fault disk in distributed storage system
CN111130856A (en) Server configuration method, system, equipment and computer readable storage medium
CN111897626A (en) Cloud computing scene-oriented virtual machine high-reliability system and implementation method
CN109104314B (en) Method and device for modifying log configuration file
CN113778763B (en) Intelligent switching method and system for three-way interface service faults
CN112269693B (en) Node self-coordination method, device and computer readable storage medium
CN113112023B (en) Inference service management method and device of AIStation inference platform
CN113886497A (en) Bidirectional real-time data monitoring method and device
CN101295275A (en) Computer auxiliary management method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination