CN110618864A - Interrupt task recovery method and device - Google Patents


Publication number
CN110618864A
Authority
CN
China
Prior art keywords
target task
target
task
state
process corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910888051.5A
Other languages
Chinese (zh)
Inventor
堵新政
张毅然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910888051.5A
Publication of CN110618864A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an interrupted task recovery method and device. The method comprises: monitoring the process and the state of a target task at a first predetermined time period; determining, according to the process and the state of the target task, whether the target task has been abnormally interrupted; restarting the target task at a second predetermined time period in the case that the target task has been abnormally interrupted; and, in the case that the target task is restarted successfully within a predetermined number of restarts, creating a second target process corresponding to the target task and maintaining the mapping relation between the target task and the second target process. This addresses the problem in the related art that an abnormally interrupted task cannot be recovered in time, which leaves computing resources idle and prevents the business data processing flow from proceeding: the interrupted task is recovered in time, computing resources are not left idle, and the business data processing flow can proceed normally.

Description

Interrupt task recovery method and device
Technical Field
The invention relates to the field of computers, in particular to an interrupt task recovery method and device.
Background
With the development of computer network technology and the increase of data processing capacity, distributed data processing systems are widely used.
The distributed data processing system comprises a plurality of task nodes, and the plurality of task nodes can process data simultaneously, so that the data processing efficiency of the system is greatly improved.
In the big data management process, computing tasks of different magnitudes are generated at each stage, such as acquisition tasks, conversion tasks, cleaning tasks, fusion tasks and the like. These tasks use the computing resources (CPU, memory and the like) of the cluster machines and divide the work among themselves to process the business data as a whole. Some tasks are independent and some depend on others. If a task is interrupted for an abnormal reason and cannot be recovered in time, computing resources sit idle and are wasted, and the normal operation of the business process may even be affected. This is especially true for dependent tasks: once a preceding task is interrupted, none of the subsequent tasks can run. Relying on manual recovery is time-consuming and labor-intensive, and makes it difficult to guarantee that tasks are recovered in time.
No solution has yet been proposed for the problem in the related art that an abnormally interrupted task cannot be recovered in time, which leaves computing resources idle and prevents the business data processing flow from proceeding.
Disclosure of Invention
The embodiment of the invention provides an interrupted task recovery method and device, which are used for at least solving the problems that in the related technology, task abnormal interruption cannot be timely recovered, so that computing resources are idle, and a business data processing flow cannot be carried out.
According to an embodiment of the present invention, there is provided an interrupt task recovery method including:
monitoring the process and the state of a target task at a first predetermined time period;
determining whether the target task is abnormally interrupted according to the process and the state of the target task;
restarting the target task at a second predetermined time period in the case that the target task is abnormally interrupted;
and, in the case that the target task is restarted successfully within a predetermined number of restarts, creating a second target process corresponding to the target task, and maintaining the mapping relation between the target task and the second target process.
Optionally, determining whether the target task is abnormally interrupted according to the process and the state of the target task includes:
detecting whether a first target process corresponding to the target task exists;
determining that the target task is abnormally interrupted when the detection result indicates that the first target process corresponding to the target task does not exist;
determining that the target task is abnormally interrupted when the detection result indicates that the first target process corresponding to the target task exists and the state of the target task is a failure state;
and determining that the target task is not abnormally interrupted when the first target process corresponding to the target task exists and the state of the target task is a running state.
Optionally, before restarting the target task at a second predetermined time period, the method further comprises:
and under the condition that the first target process corresponding to the target task exists, killing the first target process.
Optionally, after restarting the target task at a second predetermined time period, the method further comprises:
and if the state of the target task is a failure state, converting the state of the target task from the failure state to the running state.
Optionally, before monitoring the process and the state of the target task at the first predetermined time period, the method further comprises:
receiving a setting instruction for setting the second predetermined time period and the predetermined number of restarts;
and setting the second predetermined time period and the predetermined number of restarts according to the setting instruction.
Optionally, before monitoring the process and the state of the target task at the first predetermined time period, the method further comprises:
detecting that the target task has started;
creating the first target process corresponding to the target task, and converting the state of the target task into the running state;
and maintaining the mapping relation between the target task and the first target process.
According to another embodiment of the present invention, there is also provided an interrupted task resuming apparatus including:
the monitoring module is used for monitoring the process and the state of the target task at a first predetermined time period;
the determining module is used for determining whether the target task is abnormally interrupted according to the process and the state of the target task;
the restarting module is used for restarting the target task at a second predetermined time period in the case that the target task is abnormally interrupted;
and the first creating module is used for creating a second target process corresponding to the target task and maintaining the mapping relation between the target task and the second target process in the case that the target task is restarted successfully within the predetermined number of restarts.
Optionally, the determining module includes:
the detection submodule is used for detecting whether a first target process corresponding to the target task exists or not;
the first determining submodule is used for determining that the target task is abnormally interrupted when the detection result is that the first target process corresponding to the target task does not exist;
the second determining submodule is used for determining that the target task is abnormally interrupted when the detection result is that the first target process corresponding to the target task exists and the state of the target task is a failure state;
and a third determining submodule, configured to determine that no abnormal interrupt occurs in the target task when the first target process corresponding to the target task exists and the state of the target task is a running state.
Optionally, the apparatus further comprises:
and the killing module is used for killing the first target process under the condition that the first target process corresponding to the target task exists.
Optionally, the apparatus further comprises:
and the conversion module is used for converting the state of the target task from the failure state to the running state if the state of the target task is the failure state.
Optionally, the apparatus further comprises:
the receiving module is used for receiving a setting instruction for setting the second preset time period and the preset restarting times;
and the setting module is used for setting the second preset time period and the preset restarting times according to the setting instruction.
Optionally, the apparatus further comprises:
the detection module is used for detecting the starting of the target task;
the second creating module is used for creating the first target process corresponding to the target task and converting the state of the target task into the running state;
and the maintenance module is used for maintaining the mapping relation between the target task and the first target process.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, the process and the state of the target task are monitored at a first predetermined time period; whether the target task is abnormally interrupted is determined according to the process and the state of the target task; the target task is restarted at a second predetermined time period in the case that it is abnormally interrupted; and, in the case that the target task is restarted successfully within the predetermined number of restarts, a second target process corresponding to the target task is created and the mapping relation between the target task and the second target process is maintained. This addresses the problem in the related art that an abnormally interrupted task cannot be recovered in time, which leaves computing resources idle and prevents the business data processing flow from proceeding: the interrupted task is recovered in time, computing resources are not left idle, and the business data processing flow can proceed normally.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a mobile terminal according to an embodiment of the present invention;
FIG. 2 is a flowchart of an interrupted task recovery method according to an embodiment of the present invention;
FIG. 3 is a flowchart of an automatic recovery mechanism for abnormal task interruption according to an embodiment of the invention;
FIG. 4 is a block diagram of an interrupted task recovery device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of the mobile terminal according to an interrupt task recovery method in an embodiment of the present invention, as shown in fig. 1, a mobile terminal 10 may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, and optionally, the mobile terminal may further include a transmission device 106 for a communication function and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as a computer program corresponding to the interrupted task recovery method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
Based on the above mobile terminal, this embodiment provides an interrupted task recovery method. FIG. 2 is a flowchart of the interrupted task recovery method according to the embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
Step S202, monitoring the process and the state of a target task at a first predetermined time period;
Step S204, determining whether the target task is abnormally interrupted according to the process and the state of the target task;
In the embodiment of the present invention, abnormal task interruptions can be further divided into two kinds: interruptions caused by the process of the task node being interrupted or dying, and interruptions caused by the task node going down. Put another way, a task may fail because of a network exception or other causes, or it may be interrupted because the service hangs or the machine fails.
In order to detect the abnormal interruption of the data processing task, the central server starts a detection thread on each task node for detecting the abnormal interruption of the task node. The detection thread detects the task node process every first preset time period (such as 1 minute).
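A periodic detection thread of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `start_detection_thread`, the `check` callback and the use of Python's `threading` module are not from the patent, which does not specify how the thread is implemented.

```python
import threading


def start_detection_thread(check, period_seconds, stop_event):
    """Call `check()` once per `period_seconds` until `stop_event` is set.

    `check` stands in for whatever inspects the task node process; it is a
    hypothetical callback, not something defined by the source text.
    """
    def loop():
        # Event.wait() returns False on timeout and True once the event is set,
        # so the loop runs one check per period and exits promptly on stop.
        while not stop_event.wait(period_seconds):
            check()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

With the 1-minute example from the text, the call would be `start_detection_thread(check, 60.0, threading.Event())`; setting the event stops the thread.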
Step S206, restarting the target task at a second predetermined time period in the case that the target task is abnormally interrupted;
Step S208, in the case that the target task is restarted successfully within the predetermined number of restarts, creating a second target process corresponding to the target task, and maintaining the mapping relation between the target task and the second target process.
Through steps S202 to S208, the process and the state of the target task are monitored at a first predetermined time period; whether the target task is abnormally interrupted is determined according to the process and the state of the target task; the target task is restarted at a second predetermined time period in the case that it is abnormally interrupted; and, in the case that the target task is restarted successfully within the predetermined number of restarts, a second target process corresponding to the target task is created and the mapping relation between the target task and the second target process is maintained. This addresses the problem in the related art that an abnormally interrupted task cannot be recovered in time, which leaves computing resources idle and prevents the business data processing flow from proceeding: the interrupted task is recovered in time, computing resources are not left idle, and the business data processing flow can proceed normally.
In an embodiment of the present invention, step S204 may specifically include:
detecting whether a first target process corresponding to the target task exists;
determining that the target task is abnormally interrupted when the detection result indicates that the first target process corresponding to the target task does not exist;
determining that the target task is abnormally interrupted when the detection result indicates that the first target process corresponding to the target task exists and the state of the target task is a failure state;
and determining that the target task is not abnormally interrupted when the first target process corresponding to the target task exists and the state of the target task is a running state.
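The decision in step S204 can be sketched as a small predicate. The sketch follows the two interruption cases this description gives for the recovery mechanism: a task in the failure state is interrupted whether or not its process still exists, and a task still recorded as running is interrupted when its process no longer exists. The names `TaskState` and `is_abnormally_interrupted` are illustrative assumptions, not identifiers from the patent.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "Running"
    FAILED = "Failed"


def is_abnormally_interrupted(process_exists: bool, state: TaskState) -> bool:
    """Return True if the task should be treated as abnormally interrupted."""
    if state is TaskState.FAILED:
        # Failure state: interrupted, with or without a leftover process.
        return True
    # Running state: interrupted only if the mapped process has disappeared.
    return not process_exists
```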
In the embodiment of the invention, before the target task is restarted at the second predetermined time period, the first target process is killed if the first target process corresponding to the target task still exists. In particular, the process may be killed through its process handle.
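As a minimal sketch of killing a process through its handle, the example below assumes the handle is a Python `subprocess.Popen` object. The patent does not say what kind of handle is used, so both that choice and the helper name `kill_if_alive` are assumptions.

```python
import subprocess
import sys


def kill_if_alive(handle: subprocess.Popen) -> bool:
    """Kill the process behind `handle` if it is still running.

    Returns True if a kill was issued, False if the process had already exited.
    """
    if handle.poll() is None:  # None means the process has not exited yet
        handle.kill()
        handle.wait()          # reap the process so it does not linger as a zombie
        return True
    return False


# Usage: start a long-running child, then kill it through its handle.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
was_killed = kill_if_alive(child)
```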
In the embodiment of the present invention, after the target task is restarted at the second predetermined time period, if the state of the target task is the failure state, the state of the target task is converted from the failure state to the running state. The second predetermined time period may be set in advance, for example to 0.5 minutes.
In the embodiment of the invention, before the process and the state of the target task are monitored at the first predetermined time period, a setting instruction for setting the second predetermined time period and the predetermined number of restarts is received, and the second predetermined time period and the predetermined number of restarts are set according to the setting instruction.
In the embodiment of the invention, before the process and the state of the target task are monitored at the first predetermined time period, the start of the target task is detected; the first target process corresponding to the target task is created and the state of the target task is converted into the running state; and the mapping relation between the target task and the first target process is maintained.
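The bookkeeping described here (a task-to-process mapping plus a recorded state) can be sketched with two dictionaries. The class name `TaskRegistry` and its method names are hypothetical; the patent describes only the mapping relation and the state change, not a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRegistry:
    """Hypothetical in-memory record of task-to-process mappings and task states."""
    process_of: dict = field(default_factory=dict)  # task id -> process id
    state_of: dict = field(default_factory=dict)    # task id -> "Running" or "Failed"

    def on_task_started(self, task_id: str, process_id: str) -> None:
        # Maintain (or overwrite) the mapping and mark the task as running.
        self.process_of[task_id] = process_id
        self.state_of[task_id] = "Running"

    def on_task_failed(self, task_id: str) -> None:
        # Only the state changes; the stale mapping is left for the monitor to inspect.
        self.state_of[task_id] = "Failed"
```

Calling `on_task_started` again after a successful restart replaces the old process in the mapping, which is one way to read "maintaining the mapping relation between the target task and the second target process".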
In the automatic recovery mechanism for abnormal task interruption, a process is generated after a task is started and the task state is changed to the running state. An abnormal task interruption takes one of two forms. In the first, the task is interrupted by a network exception or other causes; the task state is the failure state, and the task process may or may not still exist. In the second, the task is interrupted because the service hangs or the machine fails; the task state is unchanged and still shows the running state, but the process no longer exists. Automatic recovery for both situations comprises the following steps:
1) the user sets the maximum number of retries N, namely the maximum number of times a rerun of the task may be attempted after an abnormal interruption, and the retry period P, namely how often a rerun is attempted; if the task still cannot be started when the maximum number of retries N is reached, manual intervention is needed;
2) after the task is started, the mapping relation between the task and its process is maintained, and the running state of the task is recorded;
3) the task monitoring program monitors the process and the state of each task in real time and judges whether any task has been abnormally interrupted;
4) for a task that has failed because of a network exception or other causes, the monitoring program first checks whether the process of the task exists and kills the process if it does, then reruns the task; if the rerun fails, it keeps retrying at the retry period P until the maximum number of retries N is reached; if the restart succeeds, the mapping relation between the process and the task is updated and the task state is changed to the running state;
5) for a task interrupted because the service hangs or the machine fails, the task is rerun; if the rerun fails, the monitoring program keeps retrying at the retry period P until the maximum number of retries N is reached; if the restart succeeds, the mapping relation between the process and the task is updated;
6) for recovery operations that need manual intervention, operation and maintenance personnel record recovery methods in the system and maintain a mapping relation between diagnosis types and recovery methods; when the monitoring program has reached the maximum number of retries and the task still cannot be started, the diagnosis type is determined from the exception log; if that diagnosis type exists in the system, the corresponding recovery method is used to make a further recovery attempt; if it does not exist, the exception is treated as a new diagnosis type and operation and maintenance personnel are notified by mail to intervene;
7) the task monitoring program notifies operation and maintenance personnel by mail of each task's exception and of the result of the repair.
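The fallback in step 6) can be sketched as a lookup from diagnosis type to recovery method. Everything concrete here is an assumption made for illustration: the patent does not say how a diagnosis type is derived from the exception log, so the sketch uses simple keyword matching, and the names `diagnose`, `recover_after_retries_exhausted`, `patterns`, `recovery_methods` and `notify` are hypothetical.

```python
def diagnose(exception_log, patterns):
    """Return the first diagnosis type whose keyword occurs in the log, else None."""
    for keyword, diagnosis_type in patterns.items():
        if keyword in exception_log:
            return diagnosis_type
    return None


def recover_after_retries_exhausted(exception_log, patterns, recovery_methods, notify):
    """Try a recorded recovery method; otherwise notify operations staff.

    `recovery_methods` maps a diagnosis type to a callable returning True on success.
    `notify` stands in for the mail notification mentioned in the text.
    """
    diagnosis_type = diagnose(exception_log, patterns)
    method = recovery_methods.get(diagnosis_type)
    if method is None:
        # Unknown diagnosis type: treat it as new and hand over to a person.
        notify("new diagnosis type, manual intervention needed: " + exception_log)
        return False
    return method()
```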
The specific implementation of the scheme comprises the following steps:
1) suppose the task is Task-1; after the task is started, Process-1 is generated, the task state is changed to the running state (Running), the mapping relation between the task and the process (Task-1, Process-1) is maintained, and the task state is recorded as Running;
2) the maximum number of retries N is set to 10 and the retry period P is set to 1 minute (1 min);
3) suppose Task-1 fails because of a network failure; the task state becomes the failure state (Failed) and Process-1 exits at the same time, so the mapping relation (Task-1, Process-1) no longer exists;
4) the monitoring program detects that Task-1 has failed and checks whether the process of the task exists;
5) since the process does not exist, Task-1 is restarted; because the network has not yet recovered, the restart is retried every 1 min; after 5 restart attempts Task-1 starts successfully, a new Process-2 is generated, the mapping relation (Task-1, Process-2) between Task-1 and the new Process-2 is maintained, and the task state is changed from Failed to Running.
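The Task-1 scenario can be replayed in a few lines. This is a sketch under stated assumptions: the launcher is faked with a fixed sequence of outcomes, it reads "after 5 restart attempts" as meaning the fifth attempt is the one that succeeds, and the sleep function is injected so the example does not actually wait. The name `restart_with_retries` is illustrative.

```python
def restart_with_retries(launch, max_retries, period_seconds, sleep):
    """Call `launch()` up to `max_retries` times, waiting `period_seconds` between tries.

    Returns (process_id, attempts_used); process_id is None if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        process_id = launch()
        if process_id is not None:
            return process_id, attempt
        if attempt < max_retries:
            sleep(period_seconds)
    return None, max_retries


# Scenario from the text: N = 10, P = 1 min, network back in time for attempt 5.
outcomes = iter([None, None, None, None, "Process-2"])
mapping = {}                      # (Task-1, Process-1) is already gone
state = {"Task-1": "Failed"}
waits = []                        # records each wait instead of sleeping

process_id, attempts = restart_with_retries(lambda: next(outcomes), 10, 60, waits.append)
if process_id is not None:
    mapping["Task-1"] = process_id
    state["Task-1"] = "Running"
```

Under these assumptions the run uses 5 attempts with four 60-second waits between them, and ends with the mapping (Task-1, Process-2) and the state Running.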
The following examples illustrate the present invention in detail.
FIG. 3 is a flowchart of the automatic recovery mechanism for abnormal task interruption according to an embodiment of the present invention. As shown in FIG. 3, the implementation flow of the embodiment is as follows:
step 301, the user sets the maximum number of attempts N and the attempt period P, and the flow goes to step 302;
step 302, maintaining the mapping relation between tasks and processes, synchronizing the task states, and going to step 303;
step 303, the monitoring program monitors the process and the state of each task in real time, and the flow goes to step 304;
step 304, detecting whether an abnormally interrupted task exists; if not, going to step 303 to continue monitoring, and if so, going to step 305;
step 305, traversing the interrupted tasks to obtain the task state, and going to step 306;
step 306, judging whether the state is Running or Failed; if Running, going to step 307, and if Failed, going to step 308;
step 307, restarting the task, and going to step 311;
step 308, detecting the task process, and going to step 309;
step 309, the monitoring program detects whether the process of the interrupted task exists; if so, going to step 310, otherwise going to step 307;
step 310, killing the invalid old task process, and going to step 307;
step 311, judging whether the restart succeeded; if so, going to step 302 to re-maintain the mapping relation between the new process and the task and synchronize the task state; otherwise, continuing to step 312;
step 312, judging whether the maximum number of attempts N has been reached; if not, going to step 307, and if so, going to step 313;
step 313, the task still cannot be started after the maximum number of attempts N, and manual intervention is required to find the specific cause.
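The branch structure of steps 305 to 313 can be sketched as a single function. The function name `recover_task` and all of its parameters are hypothetical. The flow as described does not show where the wait of period P occurs, so placing it between failed attempts is an assumption.

```python
def recover_task(task, state_of, process_of, process_alive, kill, launch,
                 max_attempts, period_seconds, sleep):
    """Return True if the task was restarted, False if manual intervention is needed."""
    # Steps 305-306: read the recorded state of the interrupted task.
    if state_of[task] == "Failed":
        # Steps 308-310: a failed task may have left an invalid old process behind.
        old_process = process_of.get(task)
        if old_process is not None and process_alive(old_process):
            kill(old_process)
    # Steps 307, 311, 312: restart, check the result, retry up to N times.
    for attempt in range(1, max_attempts + 1):
        new_process = launch(task)
        if new_process is not None:
            # Back to step 302: re-maintain the mapping and synchronize the state.
            process_of[task] = new_process
            state_of[task] = "Running"
            return True
        if attempt < max_attempts:
            sleep(period_seconds)  # assumed placement of the wait for period P
    # Step 313: still not started after N attempts.
    return False
```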
By the automatic recovery mechanism for task abnormal interruption, the task can be recovered in time when the task is abnormally interrupted, so that the idle and waste of cluster machine resources are avoided, the normal operation of a business data processing flow is ensured, the workload and the working time of manual intervention are reduced, and the labor cost is greatly saved.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
The embodiment of the present invention further provides an interrupted task recovery apparatus, which is used to implement the foregoing embodiment and preferred embodiments; what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a block diagram of an interrupted task recovery device according to an embodiment of the present invention. As shown in FIG. 4, the device includes:
a monitoring module 42, configured to monitor the process and the state of the target task at a first predetermined time period;
a determining module 44, configured to determine whether the target task is abnormally interrupted according to the process and the state of the target task;
a restart module 46, configured to restart the target task at a second predetermined time period when the target task is abnormally interrupted;
and a first creating module 48, configured to, if the target task is restarted successfully within the predetermined number of restarts, create a second target process corresponding to the target task and maintain a mapping relationship between the target task and the second target process.
Optionally, the determining module 44 includes:
the detection submodule is used for detecting whether a first target process corresponding to the target task exists or not;
the first determining submodule is used for determining that the target task is abnormally interrupted when the detection result is that the first target process corresponding to the target task does not exist;
the second determining submodule is used for determining that the target task is abnormally interrupted when the detection result is that the first target process corresponding to the target task exists and the state of the target task is a failure state;
and a third determining submodule, configured to determine that no abnormal interrupt occurs in the target task when the first target process corresponding to the target task exists and the state of the target task is a running state.
Optionally, the apparatus further comprises:
a killing module, configured to kill the first target process when the first target process corresponding to the target task exists.
Optionally, the apparatus further comprises:
a conversion module, configured to convert the state of the target task from the failure state to the running state if the state of the target task is the failure state.
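The killing and conversion modules bracket the restart: a stale first target process is removed before the restart, and a failed task is marked as running afterwards. The sketch below illustrates this ordering under stated assumptions; `TaskRecord`, `prepare_restart`, `finish_restart`, and the injected `process_exists` and `kill` callbacks are hypothetical names, and states are modeled as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskRecord:
    task_id: str
    pid: Optional[int]  # process currently mapped to the task, if any
    state: str          # "running" or "failed"


def prepare_restart(
    record: TaskRecord,
    process_exists: Callable[[int], bool],
    kill: Callable[[int], None],
) -> None:
    """Before restarting, kill the first target process if it still exists."""
    if record.pid is not None and process_exists(record.pid):
        kill(record.pid)
    record.pid = None  # the old mapping is no longer valid


def finish_restart(record: TaskRecord, new_pid: int) -> None:
    """After a successful restart, map the new process and fix the state."""
    record.pid = new_pid
    if record.state == "failed":
        record.state = "running"  # failure state -> running state
```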
Optionally, the apparatus further comprises:
a receiving module, configured to receive a setting instruction for setting the second predetermined time period and the predetermined number of restarts;
and a setting module, configured to set the second predetermined time period and the predetermined number of restarts according to the setting instruction.
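One way to picture the receiving and setting modules is as a function that validates a setting instruction and returns an immutable settings object. The source does not specify the format of the setting instruction, so the dictionary keys `restart_period_s` and `max_restarts`, and the names `RestartSettings` and `apply_setting_instruction`, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RestartSettings:
    restart_period_s: float  # the second predetermined time period, in seconds
    max_restarts: int        # the predetermined number of restarts


def apply_setting_instruction(instruction: Mapping[str, Any]) -> RestartSettings:
    """Validate a (hypothetical) setting instruction and build the settings."""
    period = float(instruction["restart_period_s"])
    count = int(instruction["max_restarts"])
    if period <= 0:
        raise ValueError("restart period must be positive")
    if count < 1:
        raise ValueError("at least one restart attempt is required")
    return RestartSettings(restart_period_s=period, max_restarts=count)
```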
Optionally, the apparatus further comprises:
a detection module, configured to detect that the target task has started;
a second creating module, configured to create the first target process corresponding to the target task and convert the state of the target task into the running state;
and a maintenance module, configured to maintain the mapping relationship between the target task and the first target process.
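The start-up path handled by these three modules can be sketched as follows: when a task start is detected, a process is created, the task is marked as running, and the task-to-process mapping is recorded. The table layout (task id mapped to a `(pid, state)` tuple), the function `on_task_started`, and the default `_spawn` helper are assumptions for illustration; the source does not say how the mapping is stored.

```python
import subprocess
from typing import Callable, Dict, Sequence, Tuple


def _spawn(argv: Sequence[str]) -> int:
    """Default launcher: start an OS process and return its process id."""
    # A real supervisor would likely keep the Popen handle to wait on it later.
    return subprocess.Popen(list(argv)).pid


def on_task_started(
    task_id: str,
    argv: Sequence[str],
    table: Dict[str, Tuple[int, str]],
    spawn: Callable[[Sequence[str]], int] = _spawn,
) -> int:
    """Create the first target process and record the mapping and state."""
    pid = spawn(argv)
    table[task_id] = (pid, "running")  # task -> (first target process, state)
    return pid
```

The `spawn` parameter is injectable so that the bookkeeping can be checked without launching a real process.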
It should be noted that the above modules may be implemented in software or hardware. In the latter case, they may be implemented in, but are not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S11, monitoring the process and the state of a target task at a first predetermined time period;
S12, determining, according to the process and the state of the target task, whether the target task is abnormally interrupted;
S13, restarting the target task at a second predetermined time period when the target task is abnormally interrupted;
S14, if the target task is restarted successfully within a predetermined number of restarts, creating a second target process corresponding to the target task, and maintaining a mapping relationship between the target task and the second target process.
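The periodic monitoring in the first step, and the way it hands off to the later steps, can be sketched as a polling loop. This is a simplified illustration only: `monitor`, `is_interrupted`, and `recover` are hypothetical names, the interruption check and the recovery are injected as callbacks, and the loop runs for a fixed number of rounds so that it terminates.

```python
import time
from typing import Callable, List, Sequence


def monitor(
    task_ids: Sequence[str],
    is_interrupted: Callable[[str], bool],  # interruption check for one task
    recover: Callable[[str], bool],         # restart and remap; True on success
    monitor_period_s: float,                # the first predetermined time period
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll every task once per monitoring period and recover interrupted ones."""
    recovered: List[str] = []
    for round_no in range(rounds):
        for task_id in task_ids:
            if is_interrupted(task_id) and recover(task_id):
                recovered.append(task_id)
        if round_no < rounds - 1:
            sleep(monitor_period_s)  # wait until the next monitoring round
    return recovered
```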
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S11, monitoring the process and the state of a target task at a first predetermined time period;
S12, determining, according to the process and the state of the target task, whether the target task is abnormally interrupted;
S13, restarting the target task at a second predetermined time period when the target task is abnormally interrupted;
S14, if the target task is restarted successfully within a predetermined number of restarts, creating a second target process corresponding to the target task, and maintaining a mapping relationship between the target task and the second target process.
Optionally, for specific examples of this embodiment, refer to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and changes to it. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An interrupt task recovery method, comprising:
monitoring the process and the state of a target task at a first predetermined time period;
determining, according to the process and the state of the target task, whether the target task is abnormally interrupted;
restarting the target task at a second predetermined time period when the target task is abnormally interrupted;
and if the target task is restarted successfully within a predetermined number of restarts, creating a second target process corresponding to the target task, and maintaining a mapping relationship between the target task and the second target process.
2. The method of claim 1, wherein determining, according to the process and the state of the target task, whether the target task is abnormally interrupted comprises:
detecting whether a first target process corresponding to the target task exists;
determining that the target task is abnormally interrupted when the detection result indicates that the first target process corresponding to the target task does not exist;
determining that the target task is abnormally interrupted when the detection result indicates that the first target process corresponding to the target task exists and the state of the target task is a failure state;
and determining that the target task is not abnormally interrupted when the first target process corresponding to the target task exists and the state of the target task is a running state.
3. The method of claim 2, wherein before restarting the target task at the second predetermined time period, the method further comprises:
killing the first target process when the first target process corresponding to the target task exists.
4. The method of claim 2, wherein after restarting the target task at the second predetermined time period, the method further comprises:
if the state of the target task is the failure state, converting the state of the target task from the failure state to the running state.
5. The method of claim 1, wherein before monitoring the process and the state of the target task at the first predetermined time period, the method further comprises:
receiving a setting instruction for setting the second predetermined time period and the predetermined number of restarts;
and setting the second predetermined time period and the predetermined number of restarts according to the setting instruction.
6. The method of any one of claims 1 to 5, wherein before monitoring the process and the state of the target task at the first predetermined time period, the method further comprises:
detecting that the target task has started;
creating the first target process corresponding to the target task, and converting the state of the target task into the running state;
and maintaining the mapping relationship between the target task and the first target process.
7. An interrupt task recovery apparatus, comprising:
a monitoring module, configured to monitor the process and the state of a target task at a first predetermined time period;
a determining module, configured to determine, according to the process and the state of the target task, whether the target task is abnormally interrupted;
a restart module, configured to restart the target task at a second predetermined time period when the target task is abnormally interrupted;
and a first creating module, configured to create a second target process corresponding to the target task, and maintain a mapping relationship between the target task and the second target process, if the target task is restarted successfully within a predetermined number of restarts.
8. The apparatus of claim 7, wherein the determining module comprises:
a detection submodule, configured to detect whether a first target process corresponding to the target task exists;
a first determining submodule, configured to determine that the target task is abnormally interrupted when the detection result is that the first target process corresponding to the target task does not exist;
a second determining submodule, configured to determine that the target task is abnormally interrupted when the detection result is that the first target process corresponding to the target task exists and the state of the target task is a failure state;
and a third determining submodule, configured to determine that the target task is not abnormally interrupted when the first target process corresponding to the target task exists and the state of the target task is a running state.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.
CN201910888051.5A 2019-09-19 2019-09-19 Interrupt task recovery method and device Pending CN110618864A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910888051.5A CN110618864A (en) 2019-09-19 2019-09-19 Interrupt task recovery method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910888051.5A CN110618864A (en) 2019-09-19 2019-09-19 Interrupt task recovery method and device

Publications (1)

Publication Number Publication Date
CN110618864A true CN110618864A (en) 2019-12-27

Family

ID=68923699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910888051.5A Pending CN110618864A (en) 2019-09-19 2019-09-19 Interrupt task recovery method and device

Country Status (1)

Country Link
CN (1) CN110618864A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484555A (en) * 2016-09-29 2017-03-08 广东欧珀移动通信有限公司 Abnormality detection and the method recovered and mobile terminal
CN107515796A (en) * 2017-07-31 2017-12-26 北京奇安信科技有限公司 A kind of unit exception monitor processing method and device
CN107967189A (en) * 2016-10-20 2018-04-27 南京途牛科技有限公司 Abnormal task retries method and device
CN108052430A (en) * 2017-11-30 2018-05-18 努比亚技术有限公司 Mobile terminal restarts localization method, mobile terminal and computer readable storage medium
CN109725998A (en) * 2018-12-26 2019-05-07 亚信科技(中国)有限公司 A kind of task retries method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111589107A (en) * 2020-05-14 2020-08-28 北京代码乾坤科技有限公司 Behavior prediction method and device of virtual model
CN111597032A (en) * 2020-05-26 2020-08-28 北京学之途网络科技有限公司 Task scheduling management method and device and electronic equipment
CN111597032B (en) * 2020-05-26 2024-03-26 北京明略昭辉科技有限公司 Task scheduling management method and device and electronic equipment
CN111737060A (en) * 2020-08-07 2020-10-02 北京金山云网络技术有限公司 Method and device for processing component exception and electronic equipment
CN112105044A (en) * 2020-09-22 2020-12-18 紫光展锐(重庆)科技有限公司 Resident state detection method, device and equipment
CN112105044B (en) * 2020-09-22 2022-08-02 紫光展锐(重庆)科技有限公司 Resident state detection method, device and equipment
CN113242437A (en) * 2021-04-01 2021-08-10 联通(广东)产业互联网有限公司 RTSP (real time streaming protocol) video plug-in-free playing method, system, device and storage medium
CN114356533A (en) * 2022-03-15 2022-04-15 北京仁科互动网络技术有限公司 Micro-service non-perception publishing system and method, electronic equipment and storage medium
CN114356533B (en) * 2022-03-15 2022-06-14 北京仁科互动网络技术有限公司 Micro-service non-perception issuing system and method, electronic equipment and storage medium
CN115794550A (en) * 2022-11-23 2023-03-14 广州汽车集团股份有限公司 Process management method, device, vehicle and storage medium
CN115794550B (en) * 2022-11-23 2024-04-02 广州汽车集团股份有限公司 Process management method, device, vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191227
