WO2012149719A1 - Method and system for establishing a checkpoint - Google Patents

Method and system for establishing a checkpoint

Info

Publication number
WO2012149719A1
Authority
WO
WIPO (PCT)
Prior art keywords
application process
checkpoint
establishing
application
unit
Prior art date
Application number
PCT/CN2011/079180
Other languages
English (en)
French (fr)
Inventor
赵琪
方应
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2011/079180 (WO2012149719A1)
Priority to CN201180001571.1A (CN102369514B)
Publication of WO2012149719A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks

Definitions

  • The present invention relates to the field of application backup technologies, and in particular to a method and system for establishing checkpoints.
  • Background
  • Hot backup refers to recording the running status of the application in the system as a backup file and saving it when the system is working normally.
  • A system typically generates hot backups by establishing checkpoints at a fixed period. When the system fails and restarts, the backup file can be used to restore the system to its state at the time the checkpoint was established.
  • When establishing checkpoints, a server system generally establishes checkpoints for all applications at the same fixed period. For some application processes, the running state does not change between two adjacent checkpoints, yet the system still performs two backup operations, which increases data overhead and wastes resources.
  • Summary of the invention
  • the present invention provides a method for establishing a checkpoint and a checkpoint management system, which can reduce data overhead caused by establishing checkpoints and save system resources.
  • the present invention provides a method of establishing a checkpoint, including:
  • the step of setting the checkpoint trigger condition includes: monitoring the running status of each application process and determining the response output interval of each application process, where the response output interval refers to the time interval between two adjacent response outputs of the application process;
  • when the response output interval of the application process is greater than a preset period of establishing a checkpoint, using the response output of the application process as the trigger condition for establishing a checkpoint for the application process;
  • when the response output interval of the application process is less than the preset period of establishing a checkpoint, using the arrival of the preset time for establishing a checkpoint as the trigger condition for establishing a checkpoint for the application process;
  • the steps to monitor the application process include:
  • the present invention provides a system for establishing a checkpoint, comprising:
  • a determining unit configured to determine whether a checkpoint trigger condition currently needs to be set, and if yes, to trigger the operation of the trigger setting unit, and if not, to trigger the operation of the checkpoint establishing unit;
  • a trigger setting unit configured to set a checkpoint trigger condition for each application process
  • a checkpoint establishing unit is configured to establish a checkpoint for the application process when an application process satisfies the trigger condition.
  • the embodiment of the present invention discloses a method and system for establishing a checkpoint. By monitoring the running state of each application process, the response output interval of each application process can be determined. The response output interval of each application process is compared with the preset period of establishing a checkpoint, and different trigger conditions for establishing checkpoints are determined for different application processes, so that the system can reasonably arrange the checkpoint period of each application process and avoid establishing a checkpoint for an application process whose running state is the same as at its previous checkpoint.
  • FIG. 2 is a schematic diagram of an application scenario of a method for establishing a checkpoint according to the present invention
  • FIG. 3 is a schematic flowchart of recovering an application process state by using a checkpoint established by the present invention after a server system fails;
  • FIG. 4 is a schematic flow chart of another embodiment of a method for establishing a checkpoint according to the present disclosure
  • FIG. 5 is a schematic diagram of another application scenario of a method for establishing a checkpoint according to the present invention
  • FIG. 6 is a schematic structural diagram of an embodiment of a system for establishing a checkpoint according to the present invention.
  • the invention discloses a method and a system for establishing a checkpoint. According to the different response output intervals of different application processes, different trigger conditions for establishing checkpoints are determined for different application processes, and when an application process satisfies its trigger condition for establishing a checkpoint, a checkpoint is established for that application process.
  • Referring to FIG. 1, a schematic flowchart of an embodiment of a method for establishing a checkpoint according to the present invention is shown.
  • the method of this embodiment may include the following steps:
  • Step 101 It is judged whether the setting of the checkpoint trigger condition is currently required, and if yes, the step of setting the checkpoint trigger condition is performed, and the process proceeds to step 102. If not, the step of monitoring the application process is performed, and the process proceeds to step 105.
  • the specific setting of the checkpoint trigger condition may be determined according to factors such as the performance of the server system, the application of the server system, and the number of applications running by the server system.
  • the setting of the checkpoint trigger condition can be set when the application is initialized in the server system, and after the initialization is completed, the trigger condition of each application process remains unchanged.
  • each application in the server system may involve multiple application processes during its operation (also called an application process group), and for the same application process the running speed, the interval between two adjacent received requests, and the interval between two adjacent response outputs may change greatly over time. It is therefore also possible to preset a period for setting the checkpoint trigger condition and to set the checkpoint trigger condition periodically. Specifically, it may be determined whether the preset time for setting the checkpoint trigger condition has currently been reached, and if so, the setting of the checkpoint trigger condition is performed. For example, the checkpoint trigger condition can be set every two hours. Of course, it is also possible to monitor the running status of the application process in real time to determine the response output interval of the application process.
  • When the checkpoint trigger condition needs to be set, the operation of setting the checkpoint trigger condition is performed, that is, steps 102, 103, and 104 are performed.
  • Step 102 Monitor the running status of each application process, and determine the response output interval of each application process.
  • the response output interval refers to the time interval between two adjacent outputs of the application process.
  • the determination of the response output interval also takes into account factors such as the average response time of the application process and the frequency of requests received by the application process.
  • the running status of each application process is monitored, including the frequency at which different application processes receive requests, the response time of each application process, the time interval between two adjacent request inputs, and the time interval between two adjacent response outputs of the application process.
  • For example, application process 1 may receive a request every 15 ms with a response time of 5 ms after receiving the request, so its response output interval may be 15 ms; application process 2 receives a request every 80 ms with a response time of 25 ms, so its response output interval may be 80 ms.
  • Suppose the period for establishing checkpoints is the same for the two application processes in the above example, for instance a checkpoint is established for both every 20 ms. Each application process must be frozen while its checkpoint is established, which interrupts its execution, and frequently establishing checkpoints for both application processes consumes a large amount of system resources. For application process 2, the interval between received requests is much longer than 20 ms, so its running state does not change between two adjacent checkpoints, yet the server system still establishes checkpoints for it, wasting resources.
  • Application process 2 may also be interrupted because it is frozen, which increases the interruption time of the application and in turn affects system performance. Moreover, if the server system uses one checkpoint period for both application processes, then when power is suddenly lost, the running-state information of the application processes between the last checkpoint and the power-off time may be lost.
  • By contrast, the present invention monitors the running state of the application processes to determine the response output interval of each, distinguishes the differences between application processes, and determines different trigger conditions for establishing checkpoints for different application processes.
  • Step 103 When the response output interval of the application process is greater than a preset period of establishing a checkpoint, the response output of the application process is used as the trigger condition for establishing a checkpoint for the application process.
  • Step 104 When the response output interval of the application process is less than the preset period of establishing the checkpoint, the arrival of the preset time for establishing a checkpoint is used as the trigger condition for establishing a checkpoint for the application process.
  • the preset period of establishing the checkpoint means that the system presets the times at which checkpoints are established; the interval between two adjacent preset checkpoint times is the preset period of establishing the checkpoint, and the period can be determined based on system performance and reliability. Specifically, the preset period may be determined by considering the response time and the response output interval of each application process in the system; the length of the preset period is not specifically limited.
  • the response output interval of the application process is compared with the preset period of establishing the checkpoint to set the trigger condition for establishing a checkpoint for each application process. For example, suppose the preset period of establishing the checkpoint is 30 ms, the response output interval of application process 1 is 15 ms, that of application process 2 is 28 ms, and that of application process 3 is 60 ms. For application process 1 and application process 2, the trigger condition for establishing a checkpoint is reaching the preset time for establishing the checkpoint; for application process 3, the trigger condition is that the application process produces a response output.
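  • The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Trigger` and `choose_trigger` are hypothetical, and the text does not say what happens when the interval exactly equals the preset period, so the sketch arbitrarily treats that case like the shorter-interval case.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical labels for the two trigger conditions described in the text."""
    ON_RESPONSE_OUTPUT = "on_response_output"  # checkpoint when the process emits a response
    ON_PRESET_TIME = "on_preset_time"          # checkpoint when the preset time is reached


def choose_trigger(response_output_interval_ms: float, preset_period_ms: float) -> Trigger:
    """Pick a trigger condition by comparing the interval with the preset period."""
    if response_output_interval_ms > preset_period_ms:
        return Trigger.ON_RESPONSE_OUTPUT
    # Assumption: an interval equal to the preset period is treated as "less than".
    return Trigger.ON_PRESET_TIME


# The worked example from the text: preset period 30 ms, intervals 15, 28 and 60 ms.
intervals_ms = {"process_1": 15, "process_2": 28, "process_3": 60}
triggers = {name: choose_trigger(interval, 30) for name, interval in intervals_ms.items()}
```

  • With these inputs, processes 1 and 2 are assigned the preset-time trigger and process 3 is assigned the response-output trigger, matching the example.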
  • Step 105 Determine, for an application process, whether its trigger condition for establishing a checkpoint is currently met, and if yes, establish a checkpoint for the application process.
  • The step of monitoring the application process may then be performed: for an application process, it is determined, according to the trigger condition set for that application process, whether the application process currently satisfies the trigger condition for establishing a checkpoint, and if so, a checkpoint is established for the application process.
  • When a checkpoint is established, the application process for which the checkpoint is to be established is frozen, ensuring that its running state does not change while the checkpoint is being established. The application process and its related threads are then traversed by depth-first search, the running state of the application process is recorded, and a backup file corresponding to that running state is generated.
  • the backup file generated when the checkpoint is established may also be referred to as a hot backup file, and the hot backup file may include an open file list of the application process, a network state, a stack segment pointer, and the like.
  • the response output interval of each application process can be determined and compared with a preset period of establishing a checkpoint, so that different trigger conditions for establishing checkpoints are determined for different application processes. When the response output interval of an application process is short, a checkpoint can be established for it each time the preset checkpoint time is reached; when the response output interval of an application process is long, the period at which checkpoints are established for it increases accordingly. The system can thus reasonably arrange the checkpoint period of each application process and avoid establishing a checkpoint for an application process whose running state is the same as at its previous checkpoint, which reduces the data overhead caused by establishing checkpoints and avoids wasting system resources.
  • Because different application processes have checkpoints established at different times, and an application process is frozen only while its checkpoint is being established, an application process with a long response output interval has a checkpoint operation triggered only when it produces a response output, thereby reducing the interruption time of the application process.
  • when establishing a checkpoint for an application process, the application process needs to be frozen. After the checkpoint is established and the corresponding backup file is generated, the application process needs to be unfrozen and its running state resumed, so that the application process can again receive requests from the client.
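  • The freeze, traverse, record, and unfreeze sequence can be illustrated with a small in-memory model. This is a sketch, not the patented implementation: the `Task` class, its `frozen` flag, and the JSON backup format are all assumptions made for illustration, and a real system would freeze and inspect operating-system processes rather than Python objects.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in for an application process or one of its related threads."""
    name: str
    state: dict
    children: List["Task"] = field(default_factory=list)
    frozen: bool = False


def traverse_depth_first(task: Task):
    """Yield the task, then each related child task, in depth-first order."""
    yield task
    for child in task.children:
        yield from traverse_depth_first(child)


def establish_checkpoint(process: Task) -> str:
    """Freeze the process, record its running state, and return the backup contents."""
    process.frozen = True  # the running state must not change while it is recorded
    try:
        snapshot = [{"name": t.name, "state": t.state} for t in traverse_depth_first(process)]
        return json.dumps(snapshot)  # assumed backup-file format
    finally:
        process.frozen = False  # unfreeze so the process can receive requests again
```

  • The `try`/`finally` is a design choice of the sketch: it guarantees the process is unfrozen even if recording the state fails.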
  • the application processes running in the server system can be divided into two categories.
  • One is a memory application.
  • the data state of the application changes with the data request input and the process state of the application process.
  • the other type is a non-memory application.
  • the request input and response output of the application process do not affect the data state of the application. For example, in some information query programs, when a user query request is received, the requested information is provided to the user, but the data state of the application itself does not change.
  • For such an application process, checkpoints may not be established, or checkpoints may be set manually.
  • the line segments in the figure indicate the frequency at which each application process receives requests and produces response outputs (i.e., the response output interval), and it is assumed that the response output interval of the same application process is unchanged in different time periods;
  • the dotted line in the figure indicates the time point at which the system presets the checkpoint, and the interval between the two broken lines is a preset period of establishing the checkpoint.
  • the application process 1, the application process 2, and the application process 3 are all memory applications; the application process 4 is a non-memory application.
  • Because the application process 4 is a non-memory application, checkpoints may not be established for it, thereby reducing system resource consumption.
  • the response output interval of application process 1 is the shortest among the three memory application processes, and the response output interval of application process 3 is the longest. The response output intervals of application process 1 and application process 2 are less than the preset period of establishing the checkpoint, and the response output interval of application process 3 is much larger than the preset period of establishing the checkpoint.
  • the preset time for establishing the checkpoint is used as the trigger condition for establishing the checkpoint by the application process 1 and the application process 2, and the checkpoint is established for the application process 1 and the application process 2 periodically at the preset time of establishing the checkpoint.
  • the rectangle-like graph in the figure represents the operation of establishing a checkpoint
  • the response output of the application process 3 is used as the trigger condition for the application process 3 to establish a checkpoint
  • a checkpoint is established for the application process 3 when the application process 3 generates a response output.
  • Step 301 Read a list of application processes to be restored, and sequentially read information of the application process to be restored.
  • the application process list to be restored lists the application processes to be restored in order, and then reads the information of each application process in order to restore the running status of each application process one by one.
  • Step 302 Obtain a backup file generated at a recent checkpoint corresponding to an application process.
  • Step 303 Restore the running state of the application process to the running state of the backup file creation time according to the obtained backup file.
  • Step 304 Start the application process, and continue to read the list of application processes to be restored. While there are still application processes to be restored, return to step 302, until all the application processes in the list have been restored to the running state at the time their most recent checkpoint was established.
  • the application process can continue to perform process operations.
  • the system continues to resume subsequent application processes until all application processes are running.
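  • The recovery flow can be sketched as a loop over the list of application processes to be restored. The data layout is an assumption made for illustration: `backups` maps a hypothetical process name to a list of `(checkpoint_time, state)` pairs, and "starting" a process is modeled simply as placing its restored state into the returned dictionary.

```python
def recover_all(processes_to_restore, backups):
    """Restore each listed process from its most recent checkpoint backup, in list order."""
    running = {}
    for name in processes_to_restore:
        # Obtain the backup generated at the most recent checkpoint of this process.
        _, state = max(backups[name], key=lambda entry: entry[0])
        # Restore the running state recorded in that backup and mark the process as started.
        running[name] = dict(state)
    return running


backups = {
    "process_a": [(1, {"x": 1}), (5, {"x": 2})],
    "process_b": [(3, {"y": 0})],
}
restored = recover_all(["process_a", "process_b"], backups)
```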
  • each application process can be logged while monitoring the running status of each application process.
  • the log record mainly reflects the modification of the data state of the application process. For example, the log record can record the data requests input to the application process, the data responses, and what modifications the application process makes to the data in the file list according to the received requests.
  • Step 401 Monitor the running status of each application process, and perform log recording on each application process.
  • the application process is monitored. When an application process receives a data request, the data request status, the data modification status, and the response data output of the application process are recorded to generate a log record.
  • Step 402 Determine whether the setting of the checkpoint trigger condition is currently required. If yes, execute the step of setting the checkpoint trigger condition, and proceed to step 403. If no, execute the step of monitoring the application process, and proceed to step 406.
  • Step 403 Determine a response output interval of each application process according to the log record.
  • the response output interval refers to the time interval between two adjacent outputs of the application process.
  • the response output interval here has the same meaning as in the previous embodiment, and details are not described herein again. The difference is that the response output interval of each application process can also be obtained from the log information of that application process.
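  • One way to derive a response output interval from log records is to take the gaps between adjacent response output timestamps. The text does not say how several gaps are combined into one interval, so averaging them is an assumption of this sketch, as is the function name `response_output_interval`.

```python
from statistics import mean


def response_output_interval(output_times_ms):
    """Average gap between adjacent response outputs, or None with fewer than two outputs."""
    if len(output_times_ms) < 2:
        return None
    ordered = sorted(output_times_ms)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps)  # assumption: the mean gap is used as the interval
```

  • For instance, logged response outputs at 0, 15, 30, and 45 ms give an interval of 15 ms.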
  • Step 404 When the response output interval of the application process is greater than a preset period of establishing a checkpoint, the response output of the application process is used as a trigger condition for establishing a checkpoint of the application process.
  • Step 405 When the response output interval of the application process is less than the preset period of establishing the checkpoint, the arrival of the preset time for establishing a checkpoint is used as the trigger condition for establishing a checkpoint for the application process.
  • Step 406 Determine, for an application process, whether a trigger condition for establishing a checkpoint is currently met, and if yes, establish a checkpoint for the application process.
  • the process of logging the application process may be to record the information about the data state involved in the running of the application process.
  • the log record of the application process may be updated at regular intervals.
  • For example, after a checkpoint is established, the log record between the last checkpoint of the application process and the current checkpoint is deleted, and the log record of the application process is updated.
  • the application process may also update its log record every time a response output is generated.
  • there are other ways of logging the application process, as long as, when the system fails, the data request status, the data modification status, and the like of the application process in the period from the most recent checkpoint to the time of the failure have been recorded.
  • the backup file generated at the latest checkpoint corresponding to the application process, together with the corresponding log records of the application process, may then be used to restore the application process to the running state at the time of the failure.
  • the application process may first be restored, by using the backup file generated at the latest checkpoint, to the process state at the time the backup file was created, and then restored, by using the log record corresponding to the application process, to the running state at the time of the failure. That is, after the application state is restored to the state at the checkpoint establishment time by using the backup file, the data state information recorded in the corresponding log record may be further used to restore the application process to its running state at the time the system failure occurred.
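  • Combining a checkpoint backup with log replay can be sketched as follows. The model is deliberately simplified and hypothetical: the data state is a dictionary, each log entry is a `(key, value)` modification, and the log is cleared whenever a checkpoint is established, which is one of the log update options the text mentions.

```python
import copy


class LoggedProcess:
    """Hypothetical application process whose data state is a dictionary."""

    def __init__(self):
        self.data = {}
        self.backup = None  # state saved at the most recent checkpoint
        self.log = []       # modifications made since the most recent checkpoint

    def handle_request(self, key, value):
        """Apply a data modification and record it in the log."""
        self.data[key] = value
        self.log.append((key, value))

    def establish_checkpoint(self):
        """Save the current state and delete log records from before the checkpoint."""
        self.backup = copy.deepcopy(self.data)
        self.log.clear()


def recover(backup, log):
    """Restore the checkpoint state, then replay the log up to the time of failure."""
    data = copy.deepcopy(backup) if backup is not None else {}
    for key, value in log:
        data[key] = value
    return data
```

  • If a process handles one request, establishes a checkpoint, and then handles two more requests before a failure, `recover` rebuilds the state from the backup and replays only the two logged modifications.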
  • the response output interval of the application process 1 is smaller than the preset period of establishing the checkpoint.
  • the application process 1 establishes a checkpoint to generate a corresponding backup file.
  • the application process 1 is logged.
  • the circle labeled 5 in the figure represents the log record generated during the running of the application process 1. After a checkpoint is established for the application process 1, the log record generated before the checkpoint is deleted, the log record of the application process 1 is updated, and logging of the application process 1 continues.
  • the circle marked with 6 in the figure represents the log record of the updated application process 1.
  • the running state of the application process 1 can be restored, by using the backup file numbered 6, to the running state at the time that backup file was generated, and the log record numbered 6 can then be used to restore the data state of the application process to the data state closest to the downtime.
  • the application process 2 is similar to the process of establishing a checkpoint by the application process 1, and will not be described here.
  • the response output interval of the application process 3 is greater than the preset period of establishing the checkpoint.
  • a checkpoint is established for the application process 3.
  • the rectangle-like shape marked with the number 5 in the figure represents the backup file created for the application process 3 at the checkpoint. During the running of the application process 3, the application process 3 is logged, and the circle labeled 4 in the figure represents the log record generated by the application process 3.
  • the backup file of the application process 3 can be obtained in order to restore the running state of the application process 3. Because the running state of the application process 3 did not change between the time its checkpoint was established and the time the downtime occurred, there is no corresponding log record, and the backup file generated for the application process 3 at the checkpoint alone is used to restore the running state of the application process 3.
  • FIG. 6 is a schematic structural diagram of an embodiment of a system for establishing a checkpoint according to the present invention.
  • the system of this embodiment includes:
  • the determining unit 610 is configured to determine whether a checkpoint trigger condition is currently required to be set, and if yes, perform an operation of the trigger setting unit, and if not, perform an operation of the checkpoint establishing unit.
  • the trigger setting unit 620 is configured to set a checkpoint trigger condition for each application process.
  • a checkpoint establishing unit 630 is configured to establish a checkpoint for the application process when an application process satisfies the trigger condition.
  • the trigger setting unit 620 includes: a status monitoring unit 621, a response interval determining unit 622, a first trigger unit 623, and a second trigger unit 624.
  • the status monitoring unit 621 is configured to monitor the running status of each application process.
  • the response interval determining unit 622 is configured to determine a response output interval of each application process according to an operating state of the application process, where the response output interval refers to a time interval between two response outputs of the application process;
  • the first triggering unit 623 is configured to: when a response output interval of the application process is greater than a preset period of establishing a checkpoint, use a response output of the application process as a trigger condition for establishing a checkpoint of the application process;
  • the second triggering unit 624 is configured to: when the response output interval of the application process is less than a preset period of establishing a checkpoint, use the arrival of the preset time for establishing a checkpoint as the trigger condition for establishing a checkpoint for the application process.
  • the specific setting of the checkpoint triggering condition may be determined according to factors such as the performance of the server system, the application of the server system, and the number of applications running by the server system.
  • the setting of the checkpoint triggering condition may be performed when the system is initialized.
  • the determining unit 610 may include:
  • the first determining unit is configured to determine whether the application is being initialized, and if so, to trigger the operation of the trigger setting unit, and if not, to trigger the operation of the checkpoint establishing unit.
  • the system can also set a period, and then periodically set the checkpoint trigger condition.
  • the determining unit 610 can include: a second determining unit, configured to determine whether the current time is the preset time for setting the checkpoint trigger condition, and if so, to trigger the operation of the trigger setting unit; if not, to trigger the operation of the checkpoint establishing unit.
  • the running state of the application process can also be monitored in real time, thereby determining the response output interval of each application process, comparing the response output interval of each application process with the preset period of establishing the checkpoint, and then setting different trigger conditions for different application processes.
  • the checkpoint establishing unit is specifically configured to freeze the application process for which the checkpoint is to be established, record the running state of that application process, and generate a backup file corresponding to that running state.
  • After the checkpoint establishing unit establishes a checkpoint for the application process, the application process needs to be unfrozen to restore its ability to receive requests and to resume its running state.
  • the status monitoring unit can monitor the running status of each application in real time and send information about the running status of each application process to the response interval determining unit, so that the response interval determining unit 622 can determine the response output interval of each application process. The first trigger unit and the second trigger unit then determine, for each application process, a trigger condition for establishing a checkpoint, and trigger the operation of the checkpoint establishing unit to establish checkpoints for the different application processes.
  • the application type determining unit is configured to determine whether the process state and the data state change during the running of the application process.
  • the checkpoint establishment triggering unit is configured to perform an operation of the checkpoint establishing unit when the application type determining unit determines that the process state and the data state of the application process change during the running process.
  • the application type determining unit may also determine whether the process state and the data state of an application process change according to the running state information of each application acquired by the monitoring unit. If the running of an application process does not affect the process state and the data state, there is no need to establish a checkpoint for the application process. In that case, the operations of the response interval determining unit, the first trigger unit, and the second trigger unit may also be skipped.
  • the checkpoint management system can complete the checkpoint establishment, delete the checkpoint, and restore the running state of the application process by using the backup file generated by the checkpoint.
  • when the server system fails, the backup file generated at the latest checkpoint of the application process is needed to restore that process's running state.
  • the corresponding checkpoint management system further includes:
  • the process recovery unit is configured to restore the running state of the application process according to the backup file generated at the checkpoint after the system fails.
  • the checkpoint management system further includes: a logging unit, configured to log each application process while it runs.
  • the log record mainly reflects the modification of the data state of the application process.
  • the log record can record the data requests input to an application process, its data responses, and the modifications the process makes to the data in the file list according to the requests it receives.
  • the response interval determining unit may also determine the response output interval of the application process based on the log record.
  • the process recovery unit includes: a process recovery subunit, configured to restore the application process to the process running state of the backup file generation time by using the backup file generated at the latest checkpoint, and use the log record Restore the application process to the running state at the time of the failure.
  • the log records before that moment can be deleted, and the application process's log is recorded anew after the checkpoint.
  • the logging unit includes: a log record update unit, configured to update the application's log record after the checkpoint establishing unit establishes a checkpoint.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, a software module executed by a processor, or a combination of both.
  • the software module can reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Abstract

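The present invention discloses a method and system for establishing checkpoints. The method includes: monitoring the running state of each application process and determining the response output interval of each application process; when the response output interval of an application process is greater than a preset checkpoint establishment period, using the response output of that application process as the trigger condition for establishing a checkpoint for it; when the response output interval of an application process is less than the preset checkpoint establishment period, using the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it; and, if checkpoint trigger conditions do not currently need to be set and an application process currently satisfies its trigger condition, establishing a checkpoint for that application process. The method reduces the data overhead of establishing checkpoints and saves system resources.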

Description

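Method and system for establishing checkpoints
Technical Field
The present invention relates to the field of application backup technologies, and in particular to a method and system for establishing checkpoints.
Background
With the development of computer and communication technologies, the reliability requirements on server systems keep rising. To avoid losing data about the running state of applications because of a server crash (downtime) or operator error, the system needs hot backup.
Hot backup means recording the running state of the applications in the system into backup files and saving them while the system is working normally. Hot backups are generally generated at a fixed period for the applications in the system, producing backup files. When the system fails and is restarted, the backup files can be used to restore the system to its state at the time the checkpoint was established.
When checkpoints are established for a server system, the system generally establishes checkpoints for all applications at one fixed period. For some application processes, however, the running state does not change between two adjacent checkpoints, yet the system still performs two backup operations. This increases data overhead and wastes resources.
Summary of the Invention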
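In view of this, the present invention provides a method for establishing checkpoints and a checkpoint management system that reduce the data overhead of establishing checkpoints and save system resources.
To achieve the above object, the present invention provides a method for establishing checkpoints, including:
determining whether checkpoint trigger conditions currently need to be set; if so, performing the following step of setting checkpoint trigger conditions; if not, performing the following step of monitoring application processes;
the step of setting checkpoint trigger conditions includes:
monitoring the running state of each application process and determining the response output interval of each application process, where the response output interval is the time interval between two adjacent response outputs of an application process;
when the response output interval of an application process is greater than a preset checkpoint establishment period, using the response output of that application process as the trigger condition for establishing a checkpoint for it;
when the response output interval of an application process is less than the preset checkpoint establishment period, using the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it;
the step of monitoring application processes includes:
for an application process, determining whether the trigger condition for establishing a checkpoint is currently satisfied, and if so, establishing a checkpoint for that application process.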
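In another aspect, the present invention provides a system for establishing checkpoints, including:
a judging unit, configured to determine whether checkpoint trigger conditions currently need to be set; if so, the operation of the trigger setting unit is performed; if not, the operation of the checkpoint establishing unit is performed;
a trigger setting unit, configured to set a checkpoint trigger condition for each application process;
a checkpoint establishing unit, configured to establish a checkpoint for an application process when that process satisfies its trigger condition.
As can be seen from the above technical solutions, the embodiments of the present invention disclose a method and system for establishing checkpoints. By monitoring the running state of each application process, the response output interval of each process can be determined and compared with the preset checkpoint establishment period, so that different trigger conditions are determined for different application processes. The system can thus reasonably schedule the period at which each application process is checkpointed and avoid the situation where, when a checkpoint is established, the running state of the process is the same as at the previous checkpoint. This reduces the data overhead of establishing checkpoints and avoids wasting system resources.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an embodiment of a method for establishing checkpoints disclosed by the present invention;
FIG. 2 is a schematic diagram of an application scenario of a method for establishing checkpoints according to the present invention;
FIG. 3 is a schematic flowchart of restoring the running state of application processes, after the server system fails, from the checkpoints established by the present invention;
FIG. 4 is a schematic flowchart of another embodiment of a method for establishing checkpoints disclosed by the present invention;
FIG. 5 is a schematic diagram of another application scenario of a method for establishing checkpoints according to the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a system for establishing checkpoints disclosed by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.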
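The present invention discloses a method and system for establishing checkpoints. Different trigger conditions for establishing checkpoints are determined for different application processes according to their response output intervals, and a checkpoint is established for an application process when that process satisfies its trigger condition.
FIG. 1 shows a schematic flowchart of an embodiment of a method for establishing checkpoints according to the present invention. The method of this embodiment may include the following steps:
Step 101: Determine whether checkpoint trigger conditions currently need to be set. If so, perform the step of setting checkpoint trigger conditions and go to step 102; if not, perform the step of monitoring application processes and go to step 105.
When exactly the trigger conditions are set can be decided according to factors such as the performance of the server system, its application scenario, and the number of applications it runs. The trigger conditions may be set when the applications in the server system are initialized, in which case the trigger condition of each application process remains unchanged after initialization.
The running of each application in the server system may involve multiple application processes (also called an application process group), and for the same application process the running speed, the interval between two adjacent received requests, and the interval between two adjacent response outputs may change considerably over time. A period for setting checkpoint trigger conditions may therefore also be preset, and the setting performed periodically. Specifically, it can be determined whether the current moment is a preset moment for setting checkpoint trigger conditions, and if so, the setting operation is performed. For example, trigger conditions may be set once every two hours. Alternatively, the running state of the application processes may be monitored in real time to determine their response output intervals.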
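The decision in step 101 can be sketched as follows. This is only an illustration under stated assumptions: the class and field names (`TriggerSetupPolicy`, `reset_period_s`, `last_setup_s`) are hypothetical and do not come from the patent, which leaves the mechanism open. The sketch covers the two variants described above: setting trigger conditions once at initialization, or re-setting them at a preset period such as every two hours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerSetupPolicy:
    """Decides whether checkpoint trigger conditions need to be (re)set.

    All names here are hypothetical, chosen for illustration only.
    """
    # Period between two setting operations; None means "only at initialization".
    reset_period_s: Optional[float] = 2 * 60 * 60
    # Time of the last setting operation; None until the first one has run.
    last_setup_s: Optional[float] = None

    def needs_setup(self, now_s: float) -> bool:
        if self.last_setup_s is None:
            return True   # initialization: conditions have never been set
        if self.reset_period_s is None:
            return False  # set once at initialization, then left unchanged
        return now_s - self.last_setup_s >= self.reset_period_s

    def mark_done(self, now_s: float) -> None:
        self.last_setup_s = now_s
```

A caller would check `needs_setup` on each pass: if it returns true, the setting steps run and `mark_done` is called; otherwise the flow goes straight to monitoring the application processes.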
When the condition for setting checkpoint trigger conditions is met, the setting operation can be carried out by performing steps 102, 103, and 104.
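Step 102: Monitor the running state of each application process and determine the response output interval of each application process.
The response output interval is the time interval between two adjacent outputs of an application process. Its determination also takes into account factors such as the average response time of the application process and the frequency of the requests it receives.
By monitoring the running state of each application process, the following can be obtained for each process: the frequency at which it receives requests, its response time, the interval between two adjacent request inputs, and the interval between two adjacent outputs.
Different application processes differ in request frequency and response time, so the interval between two adjacent outputs also differs. For example, application process 1 may receive a request every 15 ms and respond 5 ms after receiving it, so its response output interval may be 15 ms; application process 2 receives a request every 80 ms with a response time of 25 ms, so its response output interval may be 80 ms.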
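As a rough illustration of step 102, the response output interval can be estimated from the timestamps of a process's observed response outputs. The patent does not prescribe a formula; averaging the gaps between consecutive outputs is an assumption made here, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def response_output_interval(output_times_ms: Sequence[float]) -> float:
    """Estimate the interval between adjacent response outputs.

    Assumption: the interval is taken as the average gap between
    consecutive output timestamps (given in ascending order).
    """
    if len(output_times_ms) < 2:
        raise ValueError("need at least two response outputs")
    gaps = [later - earlier
            for earlier, later in zip(output_times_ms, output_times_ms[1:])]
    return mean(gaps)


# Application process 1: a request every 15 ms, answered 5 ms later.
process_1_outputs = [5, 20, 35, 50]
# Application process 2: a request every 80 ms, answered 25 ms later.
process_2_outputs = [25, 105, 185]
```

With these timestamps the estimate is 15 ms for application process 1 and 80 ms for application process 2, matching the figures in the example above.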
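Note that in the prior art the two application processes in the above example are checkpointed at the same period, for example every 20 ms. When a checkpoint is established, both processes must be frozen and their execution interrupted, and frequently checkpointing both consumes a large amount of system resources. For application process 2, the interval between received requests is much greater than 20 ms, so its running state does not change between two adjacent checkpoints, yet the server system still establishes a checkpoint for it, wasting resources. Moreover, application process 2 may be frozen and its execution interrupted while the checkpoint is being established, which increases the interruption time of the application and degrades system performance. If instead the period at which the server system checkpoints these two processes is lengthened, then on a sudden power failure more of the running-state information accumulated between the last checkpoint and the moment of power failure may be lost.
By monitoring the running state of the application processes, the present invention determines the response output intervals of the different application processes, distinguishes the differences among them, and then determines different trigger conditions for establishing checkpoints for different application processes.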
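Step 103: When the response output interval of an application process is greater than the preset checkpoint establishment period, use the response output of that application process as the trigger condition for establishing a checkpoint for it.
Step 104: When the response output interval of an application process is less than the preset checkpoint establishment period, use the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it.
The preset checkpoint establishment period means that the system presets the moments at which checkpoints are established; the interval between two adjacent preset moments is one preset checkpoint establishment period. The period can be determined according to system performance and reliability requirements. Specifically, the response times and response output intervals of the application processes in the system may be considered when determining it; the length of the preset period is not specifically limited.
After the response output interval of each application process is obtained, it is compared with the preset checkpoint establishment period to set each process's own trigger condition. For example, suppose the preset checkpoint establishment period is 30 ms, the response output interval of application process 1 is 15 ms, that of application process 2 is 28 ms, and that of application process 3 is 60 ms. For application processes 1 and 2, the trigger condition is the arrival of a preset checkpoint establishment moment; for application process 3, the trigger condition is that the process produces a response output.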
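The comparison described above can be sketched as a small selection function. The names `Trigger` and `choose_trigger` are hypothetical. The text only covers the "greater than" and "less than" cases; treating an interval exactly equal to the period like the "less than" case is an assumption of this sketch.

```python
from enum import Enum


class Trigger(Enum):
    ON_RESPONSE_OUTPUT = "checkpoint when the process produces a response output"
    AT_PRESET_TIME = "checkpoint when a preset checkpoint moment arrives"


def choose_trigger(interval_ms: float, period_ms: float) -> Trigger:
    """Pick the trigger condition for one application process."""
    if interval_ms > period_ms:
        return Trigger.ON_RESPONSE_OUTPUT
    # Assumption: an interval equal to the period is handled like a shorter one.
    return Trigger.AT_PRESET_TIME


PRESET_PERIOD_MS = 30
triggers = {
    name: choose_trigger(interval, PRESET_PERIOD_MS)
    for name, interval in {"process 1": 15, "process 2": 28, "process 3": 60}.items()
}
```

For the 30 ms example, application processes 1 and 2 get `Trigger.AT_PRESET_TIME` and application process 3 gets `Trigger.ON_RESPONSE_OUTPUT`.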
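Step 105: For an application process, determine whether the trigger condition for establishing a checkpoint is currently satisfied, and if so, establish a checkpoint for that application process.
After the trigger conditions for the different application processes have been determined, the step of monitoring application processes can be performed: for a given application process, it is determined, according to the trigger condition set for it, whether the process currently satisfies that condition, and if so, a checkpoint is established for it.
When an application process satisfies its trigger condition, the process for which the checkpoint is to be established is frozen, which ensures that its running state does not change while the checkpoint is being established. The process and its related threads are traversed in depth-first order, the running state of the process is recorded, and a backup file corresponding to that running state is generated. The backup file generated when a checkpoint is established may also be called a hot backup file; it may include the process's list of open files, network state, stack segment pointer, and so on.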
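The freeze, traverse, record, and unfreeze sequence can be illustrated with a purely in-memory model. This is a sketch, not the patent's implementation: a real system would freeze and inspect operating-system processes, whereas here a `Task` object with a `frozen` flag stands in for a process or thread, and the "backup file" is a JSON string. All names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Task:
    """Simulated process or thread; `children` models related threads."""
    name: str
    state: Dict[str, object]
    children: List["Task"] = field(default_factory=list)
    frozen: bool = False


def walk_depth_first(task: Task) -> Iterator[Task]:
    """Visit a task, then each of its children, depth first."""
    yield task
    for child in task.children:
        yield from walk_depth_first(child)


def establish_checkpoint(root: Task) -> str:
    """Freeze the task tree, record its state, and unfreeze it again."""
    tasks = list(walk_depth_first(root))
    for task in tasks:
        task.frozen = True  # state must not change while it is recorded
    try:
        snapshot = [{"name": t.name, "state": t.state} for t in tasks]
        return json.dumps(snapshot, sort_keys=True)  # stands in for the backup file
    finally:
        for task in tasks:
            task.frozen = False  # unfreeze so the process can serve requests again
```

The `finally` clause reflects the requirement that the process is unfrozen once the checkpoint is complete, even if recording fails.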
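In this embodiment, by monitoring the running state of each application process, the response output interval of each process can be determined and compared with the preset checkpoint establishment period, so that different trigger conditions are determined for different application processes. When the response output interval of an application process is short, a checkpoint can be established for it when a preset checkpoint establishment moment arrives; when the response output interval of an application process is long, the period at which it is checkpointed grows accordingly. The system can thus reasonably schedule the period at which each application process is checkpointed and avoid the situation where, when a checkpoint is established, the running state of the process is the same as at the previous checkpoint. This reduces the data overhead of establishing checkpoints and avoids wasting system resources.
Meanwhile, when different application processes have different trigger conditions, they are checkpointed at different times, and a process is frozen only while its checkpoint is being established. Therefore, if an application process has a long response time and a long output interval, the checkpoint operation is triggered only when the process produces a response output, which reduces the interruption time of that process.
Note that an application process must be frozen while a checkpoint is established for it. After the checkpoint is complete and the corresponding backup file has been generated, the process must be unfrozen to restore its running state, so that it can receive requests from clients.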
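Application processes running in a server system fall into two classes. One class is memory-type applications: while a memory-type application process runs, the application's data state changes with the data requests input to the process and the responses it outputs, so both the data state and the process state change. The other class is non-memory-type applications: while such a process runs, its request inputs and response outputs do not affect the application's data state. For example, some information query programs, on receiving a user's query request, provide the requested information to the user, but the data state of the application itself does not change. For non-memory-type application processes, since the application state and the application's related data do not change with the process's inputs and outputs, checkpoints need not be established for such processes, or checkpoints may be set manually. Therefore, before a checkpoint is established, it may also be determined, according to the running state of the application process, whether the process state and data state change while the process runs, and if so, the checkpoint establishing operation is performed. In other words, only for application processes whose process state and data state change during running are trigger conditions set and checkpoints established.
For ease of understanding, the method for establishing checkpoints described in the above embodiment is described in detail below using a specific application scenario. FIG. 2 is a schematic diagram of establishing checkpoints for different application processes in the present invention. The two ends of each line segment in the figure represent the moments at which the application process receives a request and outputs the response, and the spacing of the segments reflects the frequency at which the process receives requests (that is, the response output interval). The response output interval of the same application process is assumed to be constant over different time periods. The dashed lines in the figure indicate the system's preset checkpoint establishment moments, and the interval between two dashed lines is one preset checkpoint establishment period. Application processes 1, 2, and 3 are memory-type applications; application process 4 is a non-memory-type application.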
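As FIG. 2 shows, because application process 4 is a non-memory-type application, no checkpoint needs to be established for it, which reduces system resource consumption. Among application processes 1, 2, and 3, application process 1 has the shortest response output interval and application process 3 the longest; the response output intervals of application processes 1 and 2 are shorter than the preset checkpoint establishment period, while that of application process 3 is much longer than the preset period. Therefore, the preset checkpoint establishment moments are used as the trigger condition for application processes 1 and 2, and checkpoints are established for them periodically at those moments (the roughly rectangular shapes in the figure represent checkpoint establishing operations). The response output of application process 3 is used as its trigger condition, and a checkpoint is established for application process 3 whenever it produces a response output.
Note that after a failure such as a crash or operator error, once the server system returns to normal, each application in the system can be restored, from the backup files generated at the most recent checkpoints, to the running state its application processes had when those checkpoints were established.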
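To make the saving in the FIG. 2 scenario concrete, the sketch below counts checkpoints over a fixed window. The numbers are illustrative assumptions, not values read from the figure: a 30 ms preset period, a 120 ms window, and constant response output intervals of 15, 28, and 60 ms, with the first response output assumed to occur one interval after time zero. The function name `checkpoint_schedule` is hypothetical.

```python
from typing import List


def checkpoint_schedule(interval_ms: int, period_ms: int, horizon_ms: int,
                        memory_type: bool = True) -> List[int]:
    """Return the checkpoint times of one process within [0, horizon_ms]."""
    if not memory_type:
        return []  # non-memory-type process: no checkpoint is established
    # Long interval: checkpoint on each response output.
    # Short interval: checkpoint at each preset moment.
    step = interval_ms if interval_ms > period_ms else period_ms
    return list(range(step, horizon_ms + 1, step))


PERIOD_MS, HORIZON_MS = 30, 120
adaptive = [
    checkpoint_schedule(15, PERIOD_MS, HORIZON_MS),                     # process 1
    checkpoint_schedule(28, PERIOD_MS, HORIZON_MS),                     # process 2
    checkpoint_schedule(60, PERIOD_MS, HORIZON_MS),                     # process 3
    checkpoint_schedule(40, PERIOD_MS, HORIZON_MS, memory_type=False),  # process 4
]
adaptive_total = sum(len(times) for times in adaptive)
fixed_total = 4 * len(range(PERIOD_MS, HORIZON_MS + 1, PERIOD_MS))
```

Under these assumptions, checkpointing all four processes at every preset moment takes 16 checkpoints in the window, while the per-process trigger conditions take 10.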
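To describe clearly how the backup files created at checkpoints are used to restore the running state of application processes after the server system fails, FIG. 3 shows a schematic flowchart of restoring the running state of the application processes in the system from the backup files generated at checkpoints after the server system fails and restarts. The flow includes:
Step 301: Read the list of application processes to be restored, and read the information of each application process to be restored in turn.
The list gives the application processes to be restored in order. The information of each process is read in turn so that their running states can be restored one by one.
Step 302: Obtain the backup file generated at the most recent checkpoint of an application process.
Step 303: According to the obtained backup file, restore the running state of the application process to its running state at the time the backup file was created.
When the running state of an application process needs to be restored, the backup file generated at its most recent checkpoint must first be obtained. Once the backup file is obtained, the running state of the process can be restored from the information recorded in it.
Step 304: Start the application process and continue reading the list of application processes to be restored. While processes remain to be restored, continue with step 302, until the running states of all application processes in the list have been restored to those at their respective most recent checkpoints.
After an application process is restored, it can continue executing. The system goes on to restore the subsequent application processes until the running states of all application processes are restored.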
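The recovery flow of FIG. 3 can be sketched as a loop over the list of processes to be restored. The data layout is an assumption made for illustration: each process name maps to a list of `(checkpoint time, recorded state)` pairs standing in for backup files, and the name `restore_all` is hypothetical.

```python
from typing import Dict, List, Tuple

Backup = Tuple[float, dict]  # (checkpoint time, recorded running state)


def restore_all(pending: List[str],
                backups: Dict[str, List[Backup]]) -> Dict[str, dict]:
    """Restore each listed process from its most recent backup, in list order."""
    restored: Dict[str, dict] = {}
    for name in pending:                       # read the list in order
        history = backups.get(name)
        if not history:
            continue                           # no checkpoint was ever established
        _, state = max(history, key=lambda item: item[0])  # most recent checkpoint
        restored[name] = dict(state)           # restore, then the process may run on
    return restored
```

Skipping a process that has no backup (for example a non-memory-type process that was never checkpointed) is a design choice of this sketch; the patent does not say how that case is handled.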
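After a server system failure, for a given application process there is still a gap between the running state recorded at the most recent checkpoint and the running state at the moment of the failure, so part of the data of the running process is still lost. To further reduce the data loss caused by system failures, each application process can be logged while its running state is monitored. The log record mainly reflects what modifications were made to the data state of the application process. For example, it can record the data requests input to the process, its data responses, and the modifications the process made to the data in the file list according to the requests it received.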
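FIG. 4 shows a schematic flowchart of a method for establishing checkpoints provided by another embodiment of the present invention. This embodiment includes:
Step 401: Monitor the running state of each application process and log each application process.
The application processes are monitored. After an application process receives a data request, its data request state, data modification state, response data output, and so on are recorded to generate log records.
Step 402: Determine whether checkpoint trigger conditions currently need to be set. If so, perform the step of setting checkpoint trigger conditions and go to step 403; if not, perform the step of monitoring application processes and go to step 406.
Step 403: Determine the response output interval of each application process from the log records.
The response output interval is the time interval between two adjacent outputs of an application process. It has the same meaning as in the previous embodiment and is not described again here. The difference is that the response output interval of each application process can also be obtained from the information in its log records.
Step 404: When the response output interval of an application process is greater than the preset checkpoint establishment period, use the response output of that application process as the trigger condition for establishing a checkpoint for it.
Step 405: When the response output interval of an application process is less than the preset checkpoint establishment period, use the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it.
Step 406: For an application process, determine whether the trigger condition for establishing a checkpoint is currently satisfied, and if so, establish a checkpoint for that application process.
The operations of steps 404, 405, and 406 are the same as the corresponding steps of the previous embodiment (setting the two kinds of trigger condition and establishing the checkpoint) and are not described again here.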
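Note that logging an application process may mean recording all the data-state information involved while the process runs; to save system memory, the log record of a process may instead be updated at regular intervals. When a checkpoint has been established for an application process, the log records between that process's previous checkpoint and the current checkpoint can be deleted, which updates the process's log record. Alternatively, the log record can be updated each time the application process produces a response output. Other ways of logging application processes are also possible, as long as the log records of the process's data request state, data modification state, and so on are retained for the period from the most recent checkpoint to the moment of failure.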
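One of the log update policies described above, deleting the entries already covered by a checkpoint, can be sketched as follows. The class name `ProcessLog` and its methods are hypothetical, and treating an entry stamped exactly at the checkpoint moment as covered by the checkpoint is an assumption.

```python
from typing import List, Tuple

LogEntry = Tuple[float, str]  # (time in ms, description of the data change)


class ProcessLog:
    """In-memory log of one application process's data-state changes."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def record(self, time_ms: float, change: str) -> None:
        self._entries.append((time_ms, change))

    def truncate_at_checkpoint(self, checkpoint_ms: float) -> None:
        """Drop entries already captured by the checkpoint's backup file."""
        # Assumption: an entry at exactly checkpoint_ms is part of the checkpoint.
        self._entries = [e for e in self._entries if e[0] > checkpoint_ms]

    def entries(self) -> List[LogEntry]:
        return list(self._entries)
```

After truncation the log holds only what happened since the most recent checkpoint, which is exactly the span that must survive for recovery.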
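Because this embodiment adds logging of the application processes, after the server system fails and restarts, an application process can be restored to its running state at the moment of the failure by using the backup file generated at its most recent checkpoint together with its log record. Specifically, the backup file generated at the most recent checkpoint is used to restore the process to its state at the time the backup file was created, and the process's log record is then used to restore it to its running state at the moment of the failure. In other words, after the backup file has been used to restore the application state to that of the checkpoint, the data-state information recorded in the process's log can be used to bring the process forward to its running state at the moment of the system failure.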
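The two-stage recovery, backup file first and log replay second, can be sketched like this. The data model is an assumption chosen for illustration: the running state is a dictionary of integer values, and each log entry is a `(time, key, new value)` triple recording one modification. The name `recover` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

LogEntry = Tuple[float, str, int]  # (time in ms, key modified, new value)


def recover(backup_state: Dict[str, int], checkpoint_ms: float,
            log: Iterable[LogEntry]) -> Dict[str, int]:
    """Restore the checkpoint state, then replay later log entries in time order."""
    state = dict(backup_state)          # stage 1: state at the checkpoint
    for time_ms, key, value in sorted(log):
        if time_ms > checkpoint_ms:     # stage 2: only changes after the checkpoint
            state[key] = value
    return state


recovered = recover(
    {"balance": 100},
    checkpoint_ms=60.0,
    log=[(70.0, "balance", 80), (50.0, "balance", 120), (65.0, "orders", 1)],
)
```

Here the entry at 50 ms is ignored because the checkpoint at 60 ms already reflects it, and the entries at 65 ms and 70 ms are replayed, giving `{"balance": 80, "orders": 1}`.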
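For ease of understanding, the method for establishing checkpoints described in the above embodiment, and the process by which application processes recover their running state after a server system failure, are described in detail below using a specific application scenario. In FIG. 5, each line segment has the same meaning as in FIG. 2. The dashed lines adjacent to time period 1, time period 2, and time period 4 indicate the moments at which checkpoints are established; time periods 1, 2, and 3 each last one preset checkpoint establishment period; the remaining dashed lines indicate the moment at which the system fails and the moment after it recovers. In FIG. 5, application processes 1, 2, and 3 are memory-type applications; application process 4 is a non-memory-type application.
As the figure shows, the response output interval of application process 1 is shorter than the preset checkpoint establishment period, so when a preset checkpoint establishment moment arrives, a checkpoint is established for application process 1 and the corresponding backup file is generated. Application process 1 is also logged while it runs. In the figure, the circle labeled 5 represents the log record made while application process 1 runs. After a checkpoint is established for application process 1, the log records generated before that checkpoint are deleted, the log record of application process 1 is updated, and logging of application process 1 continues; the circle labeled 6 represents the updated log record of application process 1.
When the server system crashes and application process 1 is interrupted, the backup file labeled 6 can be used to restore the running state of application process 1 to its running state at the time that backup file was generated. The log record labeled 6, generated after it, is then used to restore the data state of the process to the state closest to the moment of the crash.
Checkpoints are established for application process 2 in a way similar to application process 1, which is not described again here.
The response output interval of application process 3 is longer than the preset checkpoint establishment period, so a checkpoint is established for application process 3 when it produces a response output. In the figure, the roughly rectangular shape labeled 5 represents the backup file created for application process 3 at the checkpoint. Application process 3 is also logged while it runs; the circle labeled 4 in the figure represents the log record generated for application process 3.
When the server system crashes, the backup file of application process 3 can be obtained to restore its running state. Because the running state of the process did not change between the establishment of its checkpoint and the crash, there is no corresponding log record, and the backup file generated at the checkpoint alone is enough to restore the running state of application process 3.
Because application process 4 is a non-memory-type application, checkpoints or log records need not be made for it; the specifics can be set as needed. In FIG. 5, application process 4 is logged.
FIG. 6 is a schematic structural diagram of an embodiment of a system for establishing checkpoints according to the present invention. The system of this embodiment includes: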
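A judging unit 610, configured to determine whether checkpoint trigger conditions currently need to be set; if so, the operation of the trigger setting unit is performed; if not, the operation of the checkpoint establishing unit is performed.
A trigger setting unit 620, configured to set a checkpoint trigger condition for each application process.
A checkpoint establishing unit 630, configured to establish a checkpoint for an application process when that process satisfies its trigger condition.
Specifically, the trigger setting unit 620 includes a status monitoring unit 621, a response interval determining unit 622, a first trigger unit 623, and a second trigger unit 624.
The status monitoring unit 621 is configured to monitor the running state of each application process.
The response interval determining unit 622 is configured to determine the response output interval of each application process according to its running state, where the response output interval is the time interval between two adjacent response outputs of an application process.
The first trigger unit 623 is configured to, when the response output interval of an application process is greater than the preset checkpoint establishment period, use the response output of that application process as the trigger condition for establishing a checkpoint for it.
The second trigger unit 624 is configured to, when the response output interval of an application process is less than the preset checkpoint establishment period, use the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it.
Note that when exactly the trigger conditions are set can be decided according to factors such as the performance of the server system, its application scenario, and the number of applications it runs. The trigger conditions may be set at system initialization; correspondingly, the judging unit 610 may include:
a first judging unit, configured to determine whether an application is being initialized; if so, the operation of the trigger setting unit is performed; if not, the operation of the checkpoint establishing unit is performed.
Alternatively, the system may set a period and then set the checkpoint trigger conditions periodically; correspondingly, the judging unit 610 may include: a second judging unit, configured to determine whether the current moment is a preset moment for setting checkpoint trigger conditions; if so, the operation of the trigger setting unit is performed; if not, the operation of the checkpoint establishing unit is performed.
The running state of the application processes may also be monitored in real time to determine the response output interval of each process, which is then compared with the preset checkpoint establishment period so that different trigger conditions are set for different application processes.
When a checkpoint is established for an application process, the process must be frozen. The checkpoint establishing unit is specifically configured to freeze the application process for which the checkpoint is to be established, record the execution state of that process, and generate a backup file corresponding to that execution state. After the checkpoint establishing unit has established the checkpoint, the process must be unfrozen to restore its ability to receive requests and to resume its running state.
In this embodiment, the status monitoring unit can monitor the running state of each application in real time and send the running-state information of each application process to the response interval determining unit 622, so that unit 622 can determine the response output interval of each application process. The first trigger unit and the second trigger unit then determine the trigger condition for establishing a checkpoint for each application process and trigger the checkpoint establishing unit to establish checkpoints for the different application processes.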
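Application processes can be divided into memory-type applications and non-memory-type applications. Because a non-memory-type application process does not affect the files and data state it involves between receiving a request input and producing a response output, checkpoints need not be established for such processes. To further reduce system overhead, the system of this embodiment may also include:
an application type judging unit, configured to determine whether the process state and data state change while an application process runs;
a checkpoint establishment triggering unit, configured to perform the operation of the checkpoint establishing unit when the application type judging unit determines that the process state and data state of an application process change while it runs.
The application type judging unit may also determine whether the process state and data state of an application process change according to the running-state information of each application acquired by the status monitoring unit. If running an application process does not affect its process state or data state, no checkpoint needs to be established for it, and the operations of the response interval determining unit, the first trigger unit, and the second trigger unit can be skipped.
The checkpoint management system can establish checkpoints, delete checkpoints, and restore the running state of application processes from the backup files generated at checkpoints. After the server system fails, the backup file generated at the most recent checkpoint of an application process is needed to restore that process's running state. Correspondingly, the checkpoint management system further includes:
a process recovery unit, configured to restore the running state of an application process, after the system fails, according to the backup file generated at the checkpoint.
Further, to preserve as far as possible the integrity of the data state involved in an application process after the server system fails, the checkpoint management system also includes: a logging unit, configured to log each application process while it runs.
The log record mainly reflects what modifications were made to the data state of an application process. For example, it can record the data requests input to a process, its data responses, and the modifications the process made to the data in the file list according to the requests it received. When application processes are logged, the response interval determining unit can also determine the response output interval of a process from its log record.
Corresponding to the logging unit, the process recovery unit includes: a process recovery subunit, configured to use the backup file generated at the most recent checkpoint to restore the application process to its running state at the time the backup file was generated, and to use the log record to restore the process to its running state at the moment of the failure.
To reduce memory usage, if a checkpoint is established for an application process at some moment, the log records before that moment can be deleted and the process's log recorded anew after the checkpoint. Correspondingly, the logging unit includes: a log record update unit, configured to update the application's log record after the checkpoint establishing unit establishes a checkpoint.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be understood by cross-reference. Because the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is brief; see the description of the method for the relevant details.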
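A person skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of the examples have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.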

Claims

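1. A method for establishing checkpoints, characterized by including:
determining whether checkpoint trigger conditions currently need to be set; if so, performing the following step of setting checkpoint trigger conditions; if not, performing the following step of monitoring application processes;
the step of setting checkpoint trigger conditions includes:
monitoring the running state of each application process and determining the response output interval of each application process, where the response output interval is the time interval between two adjacent response outputs of an application process;
when the response output interval of an application process is greater than a preset checkpoint establishment period, using the response output of that application process as the trigger condition for establishing a checkpoint for it;
when the response output interval of an application process is less than the preset checkpoint establishment period, using the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it;
the step of monitoring application processes includes:
for an application process, determining whether the trigger condition for establishing a checkpoint is currently satisfied, and if so, establishing a checkpoint for that application process.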
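2. The method according to claim 1, characterized in that determining whether checkpoint trigger conditions currently need to be set includes:
determining whether an application is being initialized.
3. The method according to claim 1, characterized in that determining whether checkpoint trigger conditions currently need to be set includes:
determining whether the current moment is a preset moment for setting checkpoint trigger conditions.
4. The method according to any one of claims 1 to 3, characterized by further including, before a checkpoint is established for an application process:
determining, according to the running state of the application process, whether the process state and data state change while the application process runs, and if so, performing the operation of establishing the checkpoint.
5. The method according to claim 1, characterized in that establishing a checkpoint includes:
freezing the application process for which the checkpoint is to be established, recording the running state of that application process, and generating a backup file corresponding to that running state.
6. The method according to claim 1, characterized by further including, after the checkpoint is established:
after the system fails, restoring the running state of the application process according to the backup file generated at the most recent checkpoint.
7. The method according to claim 6, characterized in that each application process is logged while the running state of each application process is monitored;
restoring the running state of the application process according to the backup file generated at the most recent checkpoint includes:
using the backup file to restore the application process to its running state at the time the backup file was created, and using the log record to restore the application process to its running state at the moment of the failure.
8. The method according to claim 7, characterized by further including, after the checkpoint is established: updating the log record of the application process.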
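9. A system for establishing checkpoints, characterized by including:
a judging unit, configured to determine whether checkpoint trigger conditions currently need to be set; if so, the operation of the trigger setting unit is performed; if not, the operation of the checkpoint establishing unit is performed;
a trigger setting unit, configured to set a checkpoint trigger condition for each application process;
a checkpoint establishing unit, configured to establish a checkpoint for an application process when that process satisfies its trigger condition.
10. The system according to claim 9, characterized in that the trigger setting unit includes:
a status monitoring unit, configured to monitor the running state of each application process;
a response interval determining unit, configured to determine the response output interval of each application process according to its running state, where the response output interval is the time interval between two adjacent response outputs of an application process;
a first trigger unit, configured to, when the response output interval of an application process is greater than a preset checkpoint establishment period, use the response output of that application process as the trigger condition for establishing a checkpoint for it;
a second trigger unit, configured to, when the response output interval of an application process is less than the preset checkpoint establishment period, use the arrival of a preset checkpoint establishment moment as the trigger condition for establishing a checkpoint for it.
11. The system according to claim 9 or 10, characterized in that the judging unit includes:
a first judging unit, configured to determine whether an application is being initialized; if so, the operation of the trigger setting unit is performed; if not, the following operation of monitoring application processes is performed.
12. The system according to claim 9 or 10, characterized in that the judging unit includes:
a second judging unit, configured to determine whether the current moment is a preset moment for setting checkpoint trigger conditions; if so, the operation of the trigger setting unit is performed; if not, the operation of the checkpoint establishing unit is performed.
13. The system according to claim 9 or 10, characterized by further including:
an application type judging unit, configured to determine whether the process state and data state change while an application process runs;
a checkpoint establishment triggering unit, configured to perform the operation of the checkpoint establishing unit when the application type judging unit determines that the process state and data state of an application process change while it runs.
14. The system according to claim 9 or 10, characterized in that the checkpoint establishing unit is specifically configured to freeze the application process for which the checkpoint is to be established, record the running state of that application process, and generate a backup file corresponding to that running state.
15. The system according to claim 9 or 10, characterized by further including:
a process recovery unit, configured to restore the running state of an application process, after the system fails, according to the backup file generated at the checkpoint.
16. The system according to claim 15, characterized by further including:
a logging unit, configured to log each application process while it runs;
the process recovery unit includes:
a process recovery subunit, configured to use the backup file to restore the application process to its running state at the time the backup file was generated, and to use the log record to restore the application process to its running state at the moment of the failure.
17. The system according to claim 16, characterized in that the logging unit further includes: a log record update unit, configured to update the application's log record after the checkpoint establishing unit establishes a checkpoint.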

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/079180 WO2012149719A1 (zh) 2011-08-31 2011-08-31 一种建立检查点的方法和系统
CN201180001571.1A CN102369514B (zh) 2011-08-31 2011-08-31 一种建立检查点的方法和系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/079180 WO2012149719A1 (zh) 2011-08-31 2011-08-31 一种建立检查点的方法和系统

Publications (1)

Publication Number Publication Date
WO2012149719A1 true WO2012149719A1 (zh) 2012-11-08

Family

ID=45761448

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079180 WO2012149719A1 (zh) 2011-08-31 2011-08-31 一种建立检查点的方法和系统

Country Status (2)

Country Link
CN (1) CN102369514B (zh)
WO (1) WO2012149719A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9202047B2 (en) 2012-05-14 2015-12-01 Qualcomm Incorporated System, apparatus, and method for adaptive observation of mobile device behavior
US10089582B2 (en) 2013-01-02 2018-10-02 Qualcomm Incorporated Using normalized confidence values for classifying mobile device behaviors
CN103197982B (zh) * 2013-03-28 2016-03-09 哈尔滨工程大学 一种任务局部最优检查点间隔搜索方法
CN103259845B (zh) * 2013-04-12 2016-03-30 赵利林 基于网络中断的数据备份任务的改进方法
CN106708656B (zh) * 2015-07-30 2020-05-22 北京国双科技有限公司 用户操作的恢复方法和装置
CN106656557A (zh) * 2016-10-31 2017-05-10 网易(杭州)网络有限公司 业务状态处理方法和装置
CN111124720B (zh) * 2019-12-26 2021-05-04 江南大学 一种自适应的检查点间隔动态设置方法
CN113515430A (zh) * 2021-09-14 2021-10-19 国汽智控(北京)科技有限公司 进程的状态监控方法、装置和设备
CN114564361B (zh) * 2022-03-03 2024-05-07 合众新能源汽车股份有限公司 用于智能驾驶平台的应用管理方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145946A (zh) * 2007-09-17 2008-03-19 中兴通讯股份有限公司 一种基于消息日志的容错集群系统和方法
CN101216792A (zh) * 2008-01-14 2008-07-09 中兴通讯股份有限公司 实时操作系统的任务管理方法、装置及实时操作系统
US20090076990A1 (en) * 2007-09-18 2009-03-19 Mickey Iqbal Method and system for automatically controlling in-process software distributions
US20100088494A1 (en) * 2008-10-02 2010-04-08 International Business Machines Corporation Total cost based checkpoint selection

Also Published As

Publication number Publication date
CN102369514B (zh) 2013-09-11
CN102369514A (zh) 2012-03-07

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001571.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11864738

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11864738

Country of ref document: EP

Kind code of ref document: A1