CN101093453A - Method for implementing checkpoint of Linux program at user level based on virtual kernel object

Publication number
CN101093453A
Authority
CN
China
Prior art keywords
thread
checkpoint
application
layer
routine
Prior art date
Legal status
Granted
Application number
CN 200710035438
Other languages
Chinese (zh)
Other versions
CN100465899C (en
Inventor
杨金民
张大方
黎文伟
Current Assignee
Hunan University
Original Assignee
Hunan University
Application filed by Hunan University
Priority to CNB2007100354383A
Publication of CN101093453A
Application granted
Publication of CN100465899C
Expired - Fee Related

Abstract

A user-level method for checkpointing Linux programs based on virtual kernel objects. A checkpoint layer is inserted between the application layer and the system API layer. The checkpoint layer consists of an application-thread checkpoint control layer and a layer that tracks and records kernel object state and maps object references. Calls from the application layer to the system API are redirected to the checkpoint layer. An execution position flag is maintained for each application thread, and the checkpoints of application threads are controlled so that they never fall inside the mapping layer, the system API layer, or the kernel layer.

Description

User-level implementation method for Linux program checkpoints based on virtual kernel objects
Technical field
The present invention relates to a method for capturing the process state of a Linux program and restoring that state so that the process can continue executing; specifically, it is a user-level implementation method for Linux program checkpoints based on virtual kernel objects.
Background technology
Process checkpointing can be implemented at the operating system level, the user level, or the application level, each with its own characteristics. A checkpoint implemented at the operating system level is transparent to user programs and can easily obtain the kernel data structures of a process, but it requires modifying the system kernel, its configurability and portability are poor, and its checkpoint overhead is high. A user-level checkpoint compiles the checkpoint functionality into a library that is linked with the application; it can be transparent to the application, is easy to configure, and has low overhead. Its implementation, however, depends on the operating system platform, so its portability across platforms is poor. The advantage of an application-level checkpoint is platform independence: it can be ported between different operating systems. Its weakness is that it places many restrictions on applications and works only for some of them.
Because user-level process checkpoints are transparent to applications, easy to configure, low in overhead, and practical, many checkpoint systems are implemented at the user level. For Unix, Windows, and Linux, currently the most widely used operating systems, there are existing methods for implementing process checkpoints at the user level.
Process state comprises user-space state and kernel-space state. User-space state resides in the process's user address space and the CPU registers, and can be accessed directly at the user level. Kernel-space state refers to the system kernel objects associated with the target process and their states. Kernel space is managed by the operating system and cannot be accessed directly at the user level. The existing way to obtain kernel-space state is to trace the process's calls to the system API and then infer the kernel-space state from the trace information.
The user-space state and the kernel-space state of a process are coupled: the user-space state contains references to objects in kernel space. At the user level, system kernel objects can only be accessed by calling the system API. When the user level creates (opens) a system kernel object through a system API call, the operating system returns an object reference. All subsequent access by the application to that kernel object is done by calling the corresponding system API with the object reference as the identifying parameter. On recovery, the system restarts and uses the checkpoint information to restore the process state so that the process continues executing from the checkpointed state. The existing method restores kernel-space state through system API calls using the checkpoint information. This method cannot achieve strictly consistent recovery, and the coupling between the restored user-space state and kernel-space state may be broken. Specifically, the object reference that the operating system returns during recovery is not necessarily the same as the object reference saved in the user-space state. This non-reproducibility of object references causes problems when the process resumes execution: after recovery from a failure, the application may use the pre-recovery object reference to access a kernel object created after recovery, and the resumed execution fails.
Existing solutions to this problem include the object reference replication method and the virtual object method. In the object reference replication method, the object reference duplication function is called repeatedly during recovery until the system returns an identical reference value. The problem is that an identical reference value may never be produced, in which case recovery fails. In the virtual object method, the application accesses kernel objects through virtual object references; the checkpoint system manages both the virtual and the real object references and converts a virtual reference into the real one whenever the application accesses a kernel object. When a checkpoint is taken, however, control over application threads at the user level is asynchronous, so the checkpoint position of an application thread is nondeterministic. When the checkpoint of an application thread falls inside system API calling code, system API code, or kernel code, the kernel object state recorded in the checkpoint may be inconsistent with the real kernel object state. The virtual object method therefore still carries a risk of failed recovery. Fig. 1 shows an example of a failed recovery. A program can be viewed as a sequence of computation units and system API calls. Suppose that in Fig. 1 the checkpoint of an application thread falls inside the code of the set-file-pointer system API. Restarting and resuming execution then has two problems: 1) the original file handle (h=1) is used to access the restored kernel file object, whose handle has become h=3 because strictly consistent recovery is not possible, so execution fails; 2) even if there were no reference problem after recovery, the file pointer is restored to p=5 from the checkpoint information, and after the set-file-pointer API call returns, p=8, whereas it should actually equal 5, which is a logic error.
Summary of the invention
In view of the above defects of the prior art, the present invention aims to provide a user-level implementation method for Linux program checkpoints based on virtual kernel objects. It changes the function call relationships during program execution so that the originally uncontrollable coupling becomes controllable, and it ensures that the checkpoint of an application thread never falls inside system API calling code, system API code, or kernel code. The invention is a method for capturing the process state of a Linux program and restoring it so that the process continues executing. At the user level, it supports the migration of Linux processes between machines and the correct checkpoint-based recovery of processes in a dynamic environment, in order to achieve load balancing, system fault tolerance, and efficient software debugging.
To achieve the above object, the technical solution of the present invention is a user-level implementation method for Linux program checkpoints based on virtual kernel objects. A checkpoint layer is inserted between the application layer and the system API layer. The checkpoint layer comprises an application-thread checkpoint control layer and a kernel object state tracking/recording and object reference mapping layer, and calls from the application layer to the system API are redirected to the checkpoint layer. An execution position flag is maintained for each application thread: the flag is set when the thread's execution moves from the application layer into the checkpoint layer, and reset when the thread's execution returns from the application-thread checkpoint control layer to the application layer. The checkpoints of application threads are controlled so that they do not fall inside the kernel object state tracking/recording and object reference mapping layer, the system API layer, or the kernel layer. The process checkpoint is implemented as follows:
When the process starts executing, it runs the checkpoint initialization routine, which performs system API interception (that is, redirection of system API calls), creation of the checkpoint thread, and reading of the checkpoint parameters. The process checkpoint procedure comprises the following three phases:
1) Bring all application threads into the checkpoint-ready state. The checkpoint thread sets the checkpoint flag and triggers the checkpoint interrupt signal. The execution of every application thread is interrupted and diverted to the interrupt routine. In the interrupt routine, an application thread that finds its own execution position flag in the reset state calls the checkpoint routine. An application thread whose execution position flag is in the set state returns from the interrupt routine, and its execution then moves into the application-thread checkpoint control layer, where it calls the checkpoint routine. In the checkpoint routine, the application thread releases all synchronization objects it holds, obtains its own thread context, enters the checkpoint-ready state, and waits for the process state snapshot to complete;
2) Take the process state snapshot. After all application threads have entered the checkpoint-ready state, the checkpoint thread takes the process state snapshot, which comprises the application thread contexts, all virtual kernel objects, the application thread stacks, the heap used by the application, the global variables, and the memory blocks allocated by the application;
3) Resume execution. After the process state snapshot is complete, the checkpoint thread triggers all application threads to resume: each application thread reacquires the synchronization objects it released in the first phase and then waits for notification from the checkpoint thread. Once all application threads have reacquired their synchronization objects, the checkpoint thread triggers them to continue: running threads resume normal execution, and waiting threads call the waiting API function again and re-enter the waiting state.
Checkpoint-based process recovery is implemented as follows:
When the process is restarted, the main thread first calls the checkpoint initialization routine and then checks whether this is a process recovery. If so, it executes the recovery routine. In the recovery routine, the main thread first reads the checkpoint to be restored, then restores all virtual kernel objects from the checkpoint information, creates the corresponding synchronization objects through system API calls and sets their state to that of the corresponding virtual kernel objects, and then creates the child threads, passing the memory address of each thread context as the parameter. The entry function of a child thread is not the original thread function but a function containing only a statement that calls the system API function siglongjmp(). As soon as a created child thread starts, it calls siglongjmp() with the passed parameter to restore its thread context. After creating the child threads, the main thread reads its own stack pointer value from the checkpoint and calls a recursive function to lower its stack pointer until it is below the value saved in the checkpoint. It then uses the checkpoint data to restore the user-space state of the process, including its own thread stack data (the memory from the saved stack pointer to the stack bottom), the child thread stack data, the heap, the global variables, and the memory blocks allocated by the application. Finally, the main thread calls siglongjmp() to restore its own thread context. After all application threads have restored their contexts, the checkpoint thread triggers them to resume. Each application thread reacquires the synchronization objects it originally held and then waits for notification from the checkpoint thread. Once all application threads have reacquired their synchronization objects, the checkpoint thread triggers them to continue: running threads resume normal execution, and waiting threads call the waiting API again and re-enter the waiting state. At this point, process recovery is complete.
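The context capture and restore that the recovery routine relies on can be illustrated with a minimal C sketch. It only shows the sigsetjmp()/siglongjmp() pairing inside a single live stack frame; restoring a context into a freshly created thread, as described above, additionally requires the stack contents to be put back first. The function name capture_and_resume is an assumption made for illustration and does not come from the patent.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>

/* Hypothetical helper: counts how often execution passes the save point.
 * It passes once when the context is captured and once more after
 * siglongjmp() jumps back to it. */
int capture_and_resume(void)
{
    sigjmp_buf ctx;             /* saved registers and signal mask */
    volatile int passes = 0;    /* volatile so the value survives the jump */

    if (sigsetjmp(ctx, 1) == 0) {
        /* First pass: the context has just been captured. */
        passes = passes + 1;
        siglongjmp(ctx, 1);     /* resume at the sigsetjmp() call site */
    }
    /* Second pass: sigsetjmp() returned nonzero after the jump. */
    passes = passes + 1;
    return passes;
}
```

The second argument of sigsetjmp() is nonzero here so that the signal mask is saved and restored together with the registers, which seems the safer choice when the capture happens inside a signal-driven checkpoint routine.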
The working principle of the present invention is described in detail as follows:
In the described method, a virtual kernel object layer (the checkpoint layer) is inserted between the application layer and the system API layer. Application code no longer accesses kernel objects by calling the system API directly; instead, its calls are redirected to the corresponding virtual kernel objects, which in turn call the corresponding system API to carry out the operation requested by the application layer. The system API layer is thus no longer visible to the application layer (that is, it is transparent). A virtual kernel object tracks and records the attribute values of its corresponding kernel object and maintains two object reference values: the virtual object reference value given to the application layer, and the real object reference value of the corresponding kernel object. The real object reference value is visible only to the virtual kernel object and not to the application layer. The storage of real object reference values in the user address space thereby becomes manageable, since the checkpoint system controls where they are stored. Given these virtual kernel objects, the checkpoints of application threads must be controlled, with two requirements: 1) the attribute values of each virtual kernel object must be fully consistent with those of the corresponding kernel object; 2) in the user-space state, a real object reference value may appear only in the member variables of the corresponding virtual kernel object and nowhere else (for example, not in a thread stack). It is entirely possible for a real object reference value to appear elsewhere: for example, when a virtual kernel object calls the system API, the value appears on the thread stack as a passed parameter.
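The two-reference bookkeeping described above can be sketched in C as a small table of virtual file objects. This is an illustrative sketch only: the structure vko_file, the table size, and the functions vko_bind, vko_real, vko_offset, and vko_rebind are hypothetical names, and no real system API is called. The usage mirrors the Fig. 1 scenario, in which the real handle changes from 1 to 3 across recovery while the reference seen by the application stays the same.

```c
#include <assert.h>
#include <stddef.h>

#define VKO_MAX 16   /* hypothetical table size */

/* One virtual kernel object for a file. The table index is the virtual
 * reference handed to the application; real_fd never leaves this layer. */
struct vko_file {
    int  in_use;
    int  real_fd;   /* reference returned by the kernel */
    long offset;    /* tracked attribute: current file position */
};

static struct vko_file vko_table[VKO_MAX];

/* Bind a real reference and return the virtual reference, or -1 if full. */
int vko_bind(int real_fd, long offset)
{
    for (int v = 0; v < VKO_MAX; v++) {
        if (!vko_table[v].in_use) {
            vko_table[v].in_use = 1;
            vko_table[v].real_fd = real_fd;
            vko_table[v].offset = offset;
            return v;
        }
    }
    return -1;
}

/* Translate a virtual reference into the real one, or -1 if invalid. */
int vko_real(int v)
{
    if (v < 0 || v >= VKO_MAX || !vko_table[v].in_use)
        return -1;
    return vko_table[v].real_fd;
}

/* Read the tracked file position, or -1 if the reference is invalid. */
long vko_offset(int v)
{
    return vko_real(v) < 0 ? -1L : vko_table[v].offset;
}

/* On recovery the kernel may hand out a different real reference;
 * only the mapping changes, the virtual reference stays stable. */
int vko_rebind(int v, int new_real_fd)
{
    if (vko_real(v) < 0)
        return -1;
    vko_table[v].real_fd = new_real_fd;
    return 0;
}
```

Because the application only ever stores the table index, its saved user-space state remains valid after recovery even though the kernel returned a different handle.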
In the method of the invention, the checkpoint layer is divided into two layers: the application-thread checkpoint control layer, and the kernel object state tracking/recording and object reference mapping layer. The code controlled by the application-thread checkpoint control layer executes atomically with respect to checkpoints, that is, a thread checkpoint never falls inside it, which satisfies the two control requirements above.
The specific control method is as follows. When the program starts executing, the main thread first calls the checkpoint initialization routine, which performs system API interception (redirection of system API calls), creation of the checkpoint thread, reading of the checkpoint parameters, and similar operations. The purpose of system API interception is to insert the checkpoint layer, also called the virtual kernel object layer, between the application layer and the system API layer. The role of the checkpoint thread is to cooperate with the application threads to carry out process checkpointing and process recovery.
System API interception is prior art, and the definitions of the virtual kernel object classes can likewise be determined by reference to the system API. The key point is the control of application thread checkpoints, which works as follows:
The first operation of the application-thread checkpoint control layer is to declare that the calling thread is entering a system call (setting the thread's execution position flag), and its last operation is to declare that the calling thread is returning from the system call (resetting the flag). When a checkpoint is to be taken, the checkpoint thread sets the checkpoint flag and triggers the checkpoint interrupt signal. At that moment, application threads fall into two classes: 1) those whose execution position flag is in the reset state; 2) those whose execution position flag is in the set state. Every application thread belongs to exactly one class. A thread whose execution position flag is in the reset state is necessarily a running thread. A thread whose flag is in the set state may be either a running thread or a waiting thread. After the checkpoint thread triggers the checkpoint interrupt signal, the execution of every application thread is interrupted and diverted to the interrupt routine. In the interrupt routine, an application thread first checks its own execution position flag. If the flag is in the reset state, the interruption point is in the application layer and the two requirements of checkpoint control are satisfied, so the thread calls the checkpoint routine. If the flag is in the set state, the interruption point is not in the application layer and the two requirements may not be satisfied, so the thread cannot call the checkpoint routine; it simply returns from the interrupt routine and continues executing. For a running application thread, execution soon moves into the application-thread checkpoint control layer, where it calls the checkpoint routine. In the checkpoint routine, a running thread releases the synchronization objects it holds, obtains its own thread context, enters the checkpoint-ready state, and waits for the process state snapshot to complete. A waiting application thread remains waiting after returning from the interrupt routine. However, once the running threads have released their synchronization objects in the checkpoint routine, the waiting application threads are woken and become running; their execution likewise moves into the application-thread checkpoint control layer, where they call the checkpoint routine, release the synchronization objects they hold, and enter the checkpoint-ready state. This strategy therefore guarantees that every application thread calls the checkpoint routine and that no checkpoint falls inside the kernel object state tracking/recording and object reference mapping layer, the system API layer, or the kernel layer, which satisfies the two requirements of thread checkpoint control.
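The flag test performed in the interrupt routine can be sketched in C with a POSIX signal handler. This is a single-threaded illustration under several assumptions: SIGUSR1 stands in for the checkpoint interrupt signal, checkpoint_routine is a stub that only counts its calls and clears the checkpoint flag (in the described method the real routine parks the thread until the snapshot is done), and all identifiers are hypothetical. The sketch also assumes that reading a thread-local flag inside a signal handler is safe on the target platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t ckpt_flag;                  /* checkpoint flag */
static _Thread_local volatile sig_atomic_t pos_flag;     /* execution position flag */
static volatile sig_atomic_t routine_calls;              /* stub bookkeeping */

/* Stub for the checkpoint routine: count the call, clear the request. */
static void checkpoint_routine(void)
{
    routine_calls++;
    ckpt_flag = 0;
}

/* Interrupt routine: checkpoint only if interrupted in the application layer. */
static void on_ckpt_signal(int signo)
{
    (void)signo;
    if (ckpt_flag && !pos_flag)
        checkpoint_routine();
}

/* Control layer entry: set the position flag, then honor a pending request. */
static void ctl_enter(void) { pos_flag = 1; if (ckpt_flag) checkpoint_routine(); }

/* Control layer exit: honor a pending request, then reset the position flag. */
static void ctl_leave(void) { if (ckpt_flag) checkpoint_routine(); pos_flag = 0; }

static void install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_ckpt_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

/* Case 1: the signal arrives while the thread is in the application layer. */
int interrupted_in_app_layer(void)
{
    install_handler();
    routine_calls = 0;
    ckpt_flag = 1;
    raise(SIGUSR1);              /* handler runs before raise() returns */
    return routine_calls;
}

/* Case 2: the signal arrives inside the checkpoint layer and is deferred. */
int interrupted_in_ckpt_layer(int *calls_before_leave)
{
    install_handler();
    routine_calls = 0;
    ckpt_flag = 0;
    ctl_enter();
    ckpt_flag = 1;
    raise(SIGUSR1);              /* handler sees pos_flag set and returns */
    *calls_before_leave = routine_calls;
    ctl_leave();                 /* the deferred checkpoint happens here */
    return routine_calls;
}
```

In the first case the checkpoint routine runs from the handler; in the second it runs only when the thread leaves the control layer, which is the deferral the paragraph above describes.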
To take the checkpoint, all application threads released the synchronization objects they held, and waiting application threads became running. After the checkpoint is taken, the original state must be restored. Resumption therefore has two stages: 1) every application thread reacquires the synchronization objects it held at the moment the checkpoint flag was set; 2) running threads resume normal execution, and waiting threads call the waiting API again and re-enter the waiting state. The second stage may begin only after all application threads have completed the first, which guarantees that the synchronization relationships between application threads are unchanged.
In summary, the present invention is a method for capturing the process state of a Linux program and restoring it so that the process continues executing. At the user level, it supports the migration of Linux processes between machines and the correct checkpoint-based recovery of processes in a dynamic environment, in order to achieve load balancing, system fault tolerance, and efficient software debugging.
Description of drawings
Fig. 1 is a schematic diagram of an existing checkpoint implementation method;
Fig. 2 is a schematic diagram of the decomposition of the process state coupling and the consistency control strategy of the present invention;
Fig. 3 is a schematic diagram of the checkpoint control strategy for application threads in the present invention;
Fig. 4 is a sequence diagram of the checkpoint thread and the application threads cooperating to take a process checkpoint in the embodiment;
Fig. 5 is a sequence diagram of the process resuming execution from the checkpointed state in the embodiment.
The interrupt routine ①, the application-thread checkpoint control routines ② and ③, and the checkpoint routines ④ and ⑤ in Fig. 3 are as follows:
Interrupt routine (see ① in Fig. 3):
If the checkpoint flag is in the set state and the calling thread's execution position flag is in the reset state, call checkpoint routine ④;
Application-thread checkpoint control routine ② (see ② in Fig. 3):
Set the calling thread's execution position flag;
If the checkpoint flag is in the set state, call checkpoint routine ④;
Application-thread checkpoint control routine ③ (see ③ in Fig. 3):
If the checkpoint flag is in the set state, call checkpoint routine ⑤;
Reset the calling thread's execution position flag;
Checkpoint routine ④ (see ④ in Fig. 3):
(1) the calling thread releases the synchronization objects it holds;
(2) the calling thread obtains its thread context;
(3) the calling thread announces that it has entered the checkpoint-ready state;
(4) the calling thread waits for the process state snapshot to finish;
(5) the calling thread reacquires the synchronization objects it released in (1);
(6) the calling thread announces that it has entered the checkpoint-complete state;
(7) the calling thread waits for the resume-execution notification;
For the interception of synchronization system APIs such as pthread_mutex_lock that put a thread into the waiting state, the checkpoint routine is ⑤ (see ⑤ in Fig. 3):
(1) the calling thread releases the synchronization object it has just acquired on return from the synchronization API;
(2) the calling thread releases the synchronization objects it already held;
(3) the calling thread obtains its thread context;
(4) the calling thread announces that it has entered the checkpoint-ready state;
(5) the calling thread waits for the process state snapshot to finish;
(6) the calling thread reacquires the synchronization objects it released in (2);
(7) the calling thread announces that it has entered the checkpoint-complete state;
(8) the calling thread waits for the resume-execution notification;
(9) the calling thread calls the corresponding synchronization API to re-enter the waiting state;
For the interception of asynchronous APIs, checkpoint routine ⑤ (see ⑤ in Fig. 3) is identical to checkpoint routine ④ above.
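The seven steps of checkpoint routine ④ can be sketched in C with POSIX threads. The sketch assumes that a single pthread_barrier_t, waited on three times by every application thread and by the checkpoint thread, plays the role of the announce and wait steps; the snapshot itself is replaced by a counter. The names run_checkpoint_demo, sync_point, and the other identifiers are hypothetical, and the program must be built with POSIX thread support.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <setjmp.h>
#include <stdlib.h>

#define APP_THREADS 3

static pthread_barrier_t sync_point;    /* shared by app threads + checkpoint thread */
static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static sigjmp_buf contexts[APP_THREADS];
static int resumed[APP_THREADS];
static int snapshots_taken;

/* Steps (1)-(7) of checkpoint routine 4 for one application thread. */
static void checkpoint_routine(int id, pthread_mutex_t *held)
{
    if (held) pthread_mutex_unlock(held);   /* (1) release held objects */
    (void)sigsetjmp(contexts[id], 1);       /* (2) capture thread context */
    pthread_barrier_wait(&sync_point);      /* (3) announce checkpoint-ready */
    pthread_barrier_wait(&sync_point);      /* (4) wait for the snapshot */
    if (held) pthread_mutex_lock(held);     /* (5) reacquire what was released */
    pthread_barrier_wait(&sync_point);      /* (6)+(7) announce done, await resume */
}

static void *app_thread(void *arg)
{
    int id = *(int *)arg;
    pthread_mutex_t *held = (id == 0) ? &m1 : NULL;  /* only thread 0 holds M1 */

    if (held) pthread_mutex_lock(held);
    checkpoint_routine(id, held);
    if (held) pthread_mutex_unlock(held);
    resumed[id] = 1;                        /* thread continued after the checkpoint */
    return NULL;
}

/* The caller acts as the checkpoint thread. Returns the number of
 * application threads that resumed, or -1 if no single snapshot was taken. */
int run_checkpoint_demo(void)
{
    pthread_t tid[APP_THREADS];
    int ids[APP_THREADS];
    int i, count = 0;

    snapshots_taken = 0;
    if (pthread_barrier_init(&sync_point, NULL, APP_THREADS + 1) != 0)
        abort();
    for (i = 0; i < APP_THREADS; i++) {
        ids[i] = i;
        resumed[i] = 0;
        if (pthread_create(&tid[i], NULL, app_thread, &ids[i]) != 0)
            abort();
    }
    pthread_barrier_wait(&sync_point);      /* all threads are checkpoint-ready */
    snapshots_taken++;                      /* stand-in for saving process state */
    pthread_barrier_wait(&sync_point);      /* snapshot finished */
    pthread_barrier_wait(&sync_point);      /* all objects reacquired: resume */
    for (i = 0; i < APP_THREADS; i++) {
        pthread_join(tid[i], NULL);
        count += resumed[i];
    }
    pthread_barrier_destroy(&sync_point);
    return (snapshots_taken == 1) ? count : -1;
}
```

Between the first and second barrier every application thread is parked with no synchronization object held, which is the window in which the snapshot can be taken consistently.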
Embodiment
Refer to Fig. 4. In this embodiment, at the checkpoint moment the process has three threads, T1, T2, and T3, and two synchronization objects, the mutexes M1 and M2. Thread T1 holds M1 and is running. Thread T2 holds M2; it has called the system API pthread_mutex_lock() and is waiting for M1. Thread T3 has called the system API pthread_mutex_lock() and is waiting for M2. At the checkpoint moment, the checkpoint thread sets the checkpoint flag and calls signal() to send the checkpoint interrupt signal to threads T1, T2, and T3. T1, T2, and T3 are interrupted and diverted to the checkpoint interrupt routine. In the checkpoint interrupt routine (see ① in Fig. 3), T1 finds the checkpoint flag in the set state and its own execution position flag in the reset state, so it calls checkpoint routine ④ (see ④ in Fig. 3). In checkpoint routine ④, T1 releases M1, calls the system API sigsetjmp() to obtain its thread context, and then calls the system API barrier() to wait for the process checkpoint to complete. Thread T2, in the checkpoint interrupt routine, finds its own execution position flag in the set state, so it returns directly and remains waiting. After T1 releases M1, T2 is woken. When T2's execution reaches checkpoint control routine ③ (see ③ in Fig. 3), it finds the checkpoint flag in the set state and calls checkpoint routine ⑤ (see ⑤ in Fig. 3). In checkpoint routine ⑤, T2 first releases M1, which it has just acquired, then releases M2, which it already held, then calls sigsetjmp() to obtain its thread context, and then calls barrier() to wait for the process checkpoint to complete. T3 behaves like T2: after T2 releases M2, T3 leaves the waiting state and, in checkpoint routine ⑤, releases M2, which it has just acquired, calls sigsetjmp() to obtain its thread context, and then calls barrier() to wait for the process checkpoint to complete. After all application threads (T1, T2, T3) have called barrier(), the checkpoint thread is woken. Knowing that all application threads have entered the checkpoint-ready state, it captures the process state snapshot (the data in all application thread stacks, the heap, the global data area, the memory blocks allocated by the application, all virtual kernel objects, and the thread context of each application thread) and saves it to reliable storage. After the snapshot is complete, the checkpoint thread calls barrier() to resume the execution of T1, T2, and T3. Thread T1 reacquires M1 and then calls barrier() to synchronize. Thread T2 reacquires M2 and likewise calls barrier() to synchronize. After all application threads have reacquired their synchronization objects, they resume execution together. Thread T1 continues executing, thread T2 calls the system API pthread_mutex_lock() to wait for M1 again, and thread T3 calls the system API pthread_mutex_lock() to wait for M2 again. All threads are thus restored to their state before the checkpoint.
Fig. 5 shows the execution of the program when the process is recovered from the above checkpoint. On recovery, the program is restarted in recovery mode. After intercepting the system API and creating the checkpoint thread during checkpoint initialization, the main thread T1 reads the checkpoint to be restored and executes the recovery routine.
In the recovery routine, the main thread restores all virtual kernel objects from the checkpoint information, creates the synchronization objects (mutexes M1 and M2), and then creates the child threads T2 and T3, passing the memory address of each thread context as the parameter. The entry function of a child thread is not the original thread function but a function containing only a statement that calls the system API siglongjmp(). As soon as a created child thread starts, it calls siglongjmp() with the passed parameter to restore its thread context. Once their contexts are restored, T2 and T3 continue in checkpoint routine ⑤ (see ⑤ in Fig. 3), execute barrier() to synchronize, and wait for recovery to complete.
The main thread reads its own stack pointer value from the checkpoint and then calls a recursive function to lower its stack pointer. Inside the recursive function, it checks the memory address of a local variable to determine whether the stack pointer is now below the stack pointer value saved in the checkpoint. If it is, the thread's current stack is larger than the stack in the checkpoint and can hold the checkpointed stack data. Using the checkpoint information, the main thread restores all process state (its own thread stack data, that is, the memory from the saved stack pointer to the stack bottom; the child thread stack data; the heap; the global variables; and the memory blocks allocated by the application). It then calls siglongjmp() to restore its own thread context. Once its context is restored, the main thread continues in checkpoint routine ④ (see ④ in Fig. 3), executes barrier() to synchronize, and waits for recovery.
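The recursive lowering of the stack pointer can be sketched in C as follows. The sketch assumes a stack that grows toward lower addresses (as on x86 and ARM Linux) and that comparing addresses through uintptr_t is meaningful on the platform. The function name descend_below, the padding size, and the depth limit are hypothetical; a real recovery routine would copy the saved stack data and call siglongjmp() at the point where this sketch simply returns.

```c
#include <stdint.h>

#define FRAME_PAD 512    /* bytes consumed per recursion level (assumed) */
#define MAX_DEPTH 4096   /* safety limit so the sketch cannot recurse forever */

/* Recurse until a local variable lies below saved_sp. Returns the depth at
 * which that happened, or -1 if MAX_DEPTH was reached first. */
int descend_below(uintptr_t saved_sp, int depth)
{
    volatile char pad[FRAME_PAD];   /* forces each level to use stack space */
    int deeper;

    pad[0] = 0;
    if ((uintptr_t)pad < saved_sp)
        return depth;               /* current frame is below the saved stack */
    if (depth >= MAX_DEPTH)
        return -1;
    deeper = descend_below(saved_sp, depth + 1);
    pad[0] = (char)deeper;          /* volatile write after the call keeps this
                                       frame alive and blocks tail-call folding */
    return deeper;
}
```

Checking the address of a local variable rather than reading the stack pointer register keeps the sketch in portable C, at the cost of being approximate by up to one frame.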
Once all threads have called barrier(), they continue together and carry out the third phase of the checkpoint procedure. The main thread T1 reacquires M1, and T2 reacquires M2. After the ownership relationships are fully restored, the threads resume execution together: T1 resumes normal execution, while T2 and T3 call the system API pthread_mutex_lock() and re-enter the waiting state. Process state recovery is complete.

Claims (1)

1. A user-level method for implementing checkpoints of a Linux program based on virtual kernel objects, characterized in that a checkpoint layer is inserted between the application layer and the system API layer, wherein the checkpoint layer comprises an application thread checkpoint control layer and a kernel object state tracking, recording, and object reference mapping layer; calls from the application layer to the system API are redirected to the checkpoint layer; an execution position flag is provided for each application thread: the flag is set when the thread's execution moves from the application layer into the checkpoint layer, and is reset when the thread's execution returns from the application thread checkpoint control layer to the application layer; the checkpoint of an application thread is controlled so that it does not fall within the kernel object state tracking, recording, and object reference mapping layer, the system API layer, or the kernel layer; and the process checkpoint is implemented as follows:
When the process starts, a checkpoint initialization routine is executed, which performs system API interception, creates the checkpoint thread, and reads the checkpoint parameters. The process checkpoint procedure comprises the following three phases:
1) All application threads are brought into the checkpoint ready state: the checkpoint thread sets the checkpoint flag and triggers the checkpoint interrupt signal; the execution of every application thread is interrupted and diverted to the interrupt routine; in the interrupt routine, an application thread that finds its own execution position flag in the reset state calls the checkpoint routine; an application thread whose execution position flag is in the set state returns from the interrupt routine and calls the checkpoint routine when its execution moves into the application thread checkpoint control layer; in the checkpoint routine, the application thread releases all synchronization objects it holds, obtains its own thread context, then enters the checkpoint ready state and waits for the process state snapshot to complete;
2) The process state snapshot is taken: after all application threads have entered the checkpoint ready state, the checkpoint thread takes the process state snapshot, which comprises the application thread contexts, all virtual kernel objects, the application thread stacks, the heap used by the application, the global variables, and the memory data blocks used by the application;
3) Execution is resumed: after the process state snapshot is complete, the checkpoint thread triggers all application threads to begin recovery; each application thread reacquires the synchronization objects it released in the first phase and then waits for notification from the checkpoint thread; after all application threads have reacquired their synchronization objects, the checkpoint thread triggers all application threads to resume execution; threads that were running resume execution, and threads that were waiting call the waiting API function again and re-enter the waiting state.
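The execution position flag of phase 1 can be illustrated with a short C sketch, assuming SIGUSR1 serves as the checkpoint interrupt signal. All names (`in_ckpt_layer`, `ckpt_pending`, `ckpt_count`, `wrapped_call`, `demo_api`) are hypothetical, and `checkpoint_routine` only counts invocations instead of releasing synchronization objects and saving a context. One deliberate deviation: the sketch resets the flag just before testing for a deferred checkpoint, so that a signal arriving in between is handled immediately by the handler rather than lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Per-thread state; in_ckpt_layer plays the role of the execution position flag. */
static _Thread_local volatile sig_atomic_t in_ckpt_layer;
static _Thread_local volatile sig_atomic_t ckpt_pending;
static _Thread_local volatile sig_atomic_t ckpt_count;

/* Placeholder for the real checkpoint routine. */
static void checkpoint_routine(void)
{
    ckpt_pending = 0;
    ckpt_count++;
}

/* Interrupt routine: checkpoint now if the thread is in the application
 * layer, otherwise defer until it is back in the control layer. */
static void ckpt_signal_handler(int signo)
{
    (void)signo;
    if (in_ckpt_layer)
        ckpt_pending = 1;
    else
        checkpoint_routine();
}

static int install_ckpt_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ckpt_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Stand-in for a redirected system API call. */
static int wrapped_call(int (*api)(int), int arg)
{
    in_ckpt_layer = 1;                   /* entering the checkpoint layer */
    int rc = api(arg);                   /* mapping layer / system API / kernel */
    in_ckpt_layer = 0;                   /* back in the control layer */
    if (ckpt_pending)
        checkpoint_routine();            /* take the deferred checkpoint */
    return rc;
}

/* Demo API: is interrupted by the checkpoint signal and reports how many
 * checkpoints had been taken at that moment. */
static int demo_api(int unused)
{
    (void)unused;
    raise(SIGUSR1);
    return (int)ckpt_count;
}
```

With the handler installed, a signal raised in application code triggers the checkpoint routine at once, while a signal raised inside `wrapped_call` is only recorded and the checkpoint is taken after the call returns, so the saved context never lies below the control layer.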
CNB2007100354383A 2007-07-25 2007-07-25 Method for implementing checkpoint of Linux program at user level based on virtual kernel object Expired - Fee Related CN100465899C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007100354383A CN100465899C (en) 2007-07-25 2007-07-25 Method for implementing checkpoint of Linux program at user level based on virtual kernel object

Publications (2)

Publication Number Publication Date
CN101093453A true CN101093453A (en) 2007-12-26
CN100465899C CN100465899C (en) 2009-03-04

Family

ID=38991728

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007100354383A Expired - Fee Related CN100465899C (en) 2007-07-25 2007-07-25 Method for implementing checkpoint of Linux program at user level based on virtual kernel object

Country Status (1)

Country Link
CN (1) CN100465899C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017070861A1 (en) * 2015-10-28 2017-05-04 华为技术有限公司 Interrupt response method, apparatus and base station

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6766471B2 (en) * 2000-12-28 2004-07-20 International Business Machines Corporation User-level checkpoint and restart for groups of processes
CN1280726C (en) * 2002-10-18 2006-10-18 上海贝尔有限公司 Virtual machine for embedded systemic software development
KR20050025387A (en) * 2003-09-06 2005-03-14 한국전자통신연구원 Optical tranceiver for reducing crosstalk
US7516361B2 (en) * 2005-06-27 2009-04-07 Sun Microsystems, Inc. Method for automatic checkpoint of system and application software

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101383690B (en) * 2008-10-27 2011-06-01 西安交通大学 Grid synchronization method for fault tolerant computer system based on socket
CN102388370A (en) * 2009-06-19 2012-03-21 核心科技有限公司 Computer process management
CN102262584A (en) * 2010-05-24 2011-11-30 北大方正集团有限公司 Method and device for checking program operation error
CN102262584B (en) * 2010-05-24 2014-03-12 北大方正集团有限公司 Method and device for checking program operation error
CN103514395A (en) * 2012-06-20 2014-01-15 阿里巴巴集团控股有限公司 Plug-in right control method and system
CN103514395B (en) * 2012-06-20 2016-09-28 阿里巴巴集团控股有限公司 Plug-in right control method and system
CN105531668B (en) * 2013-08-08 2019-04-23 英派尔科技开发有限公司 Method, transference apparatus and the computer-readable medium of migration for executive process
CN105531668A (en) * 2013-08-08 2016-04-27 英派尔科技开发有限公司 Migration of executing processes
CN103473133A (en) * 2013-09-25 2013-12-25 浪潮电子信息产业股份有限公司 High availability system-oriented redundant process synchronization method
CN106164866A (en) * 2014-04-08 2016-11-23 微软技术许可有限责任公司 The efficient migration of client-side WEB state
CN106164866B (en) * 2014-04-08 2020-01-10 微软技术许可有限责任公司 Efficient migration of client-side WEB state
CN107045605A (en) * 2016-02-05 2017-08-15 中兴通讯股份有限公司 A kind of real-time metrics method and device
CN107547566A (en) * 2017-09-29 2018-01-05 新华三信息安全技术有限公司 A kind of method and device of processing business message
CN107547566B (en) * 2017-09-29 2020-11-20 新华三信息安全技术有限公司 Method and device for processing service message
CN110737501A (en) * 2018-07-18 2020-01-31 中标软件有限公司 Method and system for realizing functions of check point and recovery point in Docker container

Also Published As

Publication number Publication date
CN100465899C (en) 2009-03-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090304