EP0750261B1 - Method and apparatus for a checkpoint based communication processing system - Google Patents

Method and apparatus for a checkpoint based communication processing system

Info

Publication number
EP0750261B1
EP0750261B1 EP96304472A EP96304472A EP0750261B1 EP 0750261 B1 EP0750261 B1 EP 0750261B1 EP 96304472 A EP96304472 A EP 96304472A EP 96304472 A EP96304472 A EP 96304472A EP 0750261 B1 EP0750261 B1 EP 0750261B1
Authority
EP
European Patent Office
Prior art keywords
packet
transfer
communication
checkpoint
attribute information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP96304472A
Other languages
German (de)
English (en)
Other versions
EP0750261A1 (fr
Inventor
Hideaki c/o Intell. Prop. Div. Hirayama
Makoto c/o Intell. Prop. Div. Honda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Publication of EP0750261A1
Application granted
Publication of EP0750261B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 - Error detection or correction of the data by redundancy in operation
    • G06F11/1402 - Saving, restoring, recovering or retrying

Definitions

  • the present invention relates to a method and apparatus for a checkpoint based communication processing system suitable for use in a fault tolerant computer system.
  • the present invention relates to a method and apparatus for achieving a higher-speed communication processing system suitable for use in a checkpoint based fault tolerant computer system.
  • the present invention relates to a method and apparatus for making a checkpoint based communication processing system more efficient while maintaining the fault tolerance of the system.
  • the present invention relates to a method and apparatus for a checkpoint based communication processing system which transfers information using a plurality of communication packets while taking checkpoints to guard against a failure.
  • a checkpoint based fault tolerant computer system executes a program while taking a checkpoint periodically, so that it can recover from a failure which appears between two particular checkpoints.
  • FIG. 7 is a timing chart for explaining the transfer operation of the checkpoint based communication processing system.
  • the reason for delaying the transfer of the communication packets is to guard against a failure appearing during the execution between checkpoints. The generated communication packets can be cancelled if a failure is detected before the execution reaches the next checkpoint. If the communication packets have already been transferred before the next checkpoint is taken, they cannot be cancelled when a failure is detected before that checkpoint.
  • a checkpoint based communication processing system can achieve a higher efficiency while maintaining the fault tolerance of the system by processing attribute information affixed to a plurality of communication packets or transfer packets, so that packets are transferred promptly and the number of packets waiting for the completion of execution between two particular checkpoints is minimized.
  • a sequence number of a taken checkpoint or relative position information is used as the attribute information.
  • the system judges from the attribute information affixed to a packet whether the packet should be transferred yet.
  • a first embodiment of the checkpoint based communication processing system in accordance with the present common inventive concept uses the sequence number of a checkpoint which has been taken during an execution as attribute information for judging whether or not the communication packet to which it is affixed should be transferred yet.
  • a second embodiment of the checkpoint based communication processing system uses relative position information among a plurality of transfer packets generated from a communication packet as attribute information for judging which transfer packets should be transferred yet.
  • the first checkpoint based communication processing system embodying the common inventive concept, for executing an operation and taking a checkpoint periodically in order to facilitate recovery from a failure which is detected in the operation between two checkpoints, includes:
  • a first method, embodying the common inventive concept, for accomplishing a checkpoint based communication processing system in which execution proceeds with the taking of a checkpoint periodically to facilitate recovery from a failure detected between two checkpoints, includes the steps of:
  • the second communication processing system embodying the common inventive concept, for executing an operation and taking a checkpoint periodically in order to facilitate recovery from a failure which is detected in the operation between two checkpoints, comprises:
  • the last position among the plurality of generated transfer packets is usually used as the particular position.
  • a second method, embodying the common inventive concept, for accomplishing a checkpoint based communication processing system for executing a process and taking a checkpoint periodically in order to facilitate recovery from a failure which is detected in an execution between two checkpoints, includes:
  • Fig. 1 is a schematic diagram of one preferable embodiment of a checkpoint based communication system according to the present invention.
  • the checkpoint based communication system 2 includes a control means 21, an incrementing means 22, a generating means 23, an affixing means 24, and a comparing means 25.
  • the control means 21 (hereinafter referred to as a control module) controls the overall operation of the checkpoint based communication system 2.
  • the incrementing means 22 (hereinafter referred to as a sequence number incrementing module) increments and holds a checkpoint sequence number each time a checkpoint is taken, checkpoints being taken periodically at a certain time interval, so that each checkpoint can be identified.
  • the generating means 23 (hereinafter referred to as a packet generating module) generates a plurality of communication packets in accordance with a request to send issued by an application 1 to the system 2.
  • the communication packet means a packet used in a transport layer, such as an IP packet.
  • the affixing means 24 (hereinafter referred to as an attribute information affixing module) affixes to the communication packet, as attribute information, the checkpoint sequence number held in the sequence number incrementing module 22 at that time.
  • for transferring a communication packet, the comparing means 25 (hereinafter referred to as a comparing and judging module) compares the attribute information affixed to the communication packet with the sequence number of the checkpoint most recently taken at the time of the transfer.
  • Fig. 2 is a timing chart for explaining the operation.
  • the control module in the communication processing system instructs the packet generating module to generate a plurality of communication packets 4a, 4b and 4c at the point (2).
  • the number of communication packets shown is only an example for the purpose of explanation and does not represent the actual number of communication packets generated.
  • a checkpoint (3) is taken just after generation of the communication packet 4a.
  • the control module provides an instruction signal to the comparing module for checking whether or not the attribute information affixed to the communication packet coincides with the sequence number of the checkpoint maintained in the incrementing module.
  • since the attribute information affixed to the communication packet 4a, which was generated before the checkpoint (3), does not coincide with the current sequence number, the control module sends an instruction to transfer the communication packet 4a promptly through a driver.
  • the attribute information affixed to the respective communication packets 4b and 4c coincides with the sequence number of the checkpoint in the incrementing module. Accordingly, the transfer of the communication packets 4b and 4c from the system is delayed until the next checkpoint has been taken.
  • the first embodiment of the present invention is characterized in that the number of generated communication packets kept waiting is decreased by comparing the attribute information affixed to each communication packet with the sequence number in the incrementing module.
  • Fig. 3 is a flow chart for explaining a method for the first embodiment.
  • the control module sends an instruction to generate a plurality of communication packets.
  • the control module instructs the affixing module to affix, as attribute information, the sequence number maintained in the incrementing module at the time when the communication packets were generated.
  • the control module instructs the comparing module to check whether or not the affixed attribute information coincides with the sequence number of the checkpoint maintained in the incrementing module at the time the communication packets are actually to be transferred.
  • the control module sends an instruction to transfer the communication packet having the attribute information promptly from the system through a driver, as shown at the fifth step A5.
  • Fig. 4 is a block diagram of another preferred embodiment of a checkpoint based communication processing system according to the present invention.
  • the checkpoint based communication system 3 includes a control module 31 and a communication packet generating module 32 which are substantially the same as the control module 21 and the communication packet generating module 23 respectively shown in Fig. 1.
  • the communication system 3 further includes a transfer packet generating module 33, an attribute information affixing module 34 and a relative position comparing module 35.
  • the control module 31 instructs the communication packet generating module 32 to generate a plurality of communication packets.
  • the communication packets are used in a transport layer, for example an IP packet used in an IP layer.
  • the actual transfer is performed by a data link layer.
  • the control module 31 instructs the transfer packet generating module 33 to generate a plurality of transfer packets by dividing a communication packet in accordance with a particular communication protocol, so that the size of each transfer packet fits the transfer device or network medium.
  • the control module 31 instructs the attribute information affixing module 34 to affix relative position information to the generated transfer packets.
  • the relative position information indicates whether the transfer packet is the last one. Namely, when a plurality of transfer packets are generated by dividing a communication packet, only the last transfer packet is identified among them.
  • Fig. 5 shows an example in which three transfer packets 5a, 5b and 5c are generated from a communication packet 4a and attribute information LP is affixed to the last transfer packet 5c.
  • the control module 31 instructs the relative position comparing module 35 to check from the affixed attribute information whether or not the transfer packet is the last one.
  • the transfer packets other than the last one, like the transfer packets 5a and 5b shown in Fig. 5, are promptly transferred onto a transmission line.
  • only the last transfer packet 5c is delayed until the next checkpoint has been taken.
  • a receiving side system reconstructs the communication packet from the received transfer packets.
  • the receiving side sends a request for resending of the communication packet after detecting a certain timeout.
  • Fig. 6 is a flow chart for explaining the above mentioned operation.
  • the control module in the communication processing system instructs the communication packet generating module to generate a plurality of communication packets, as shown at the first step B1 in Fig. 6. Each communication packet is then divided into a plurality of transfer packets at the second step B2. Then the control module instructs the attribute information affixing module to affix the information for identifying whether or not the transfer packet is the last one at the third step B3.
  • the control module instructs the comparing module to recognize whether or not the transfer packet is the last one at the fourth step B4. If it is not, the process goes to the sixth step B6 to transfer the transfer packet promptly. If it is, the process goes to the fifth step B5 to hold the transfer packet until the next checkpoint has been taken.
  • the first embodiment of the checkpoint based communication processing system controls the processing by maintaining the checkpoint sequence number.
  • the second embodiment controls the processing by judging a relative position of the transfer packet.
  • an Ethernet packet generated in an Ethernet driver corresponds to the transfer packet.
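
The first embodiment described above (tag each communication packet with the checkpoint sequence number at generation time, then transfer it only once that number no longer coincides with the current one) can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `CommPacket`, `CheckpointedSender`, `flush` and `cancel_uncommitted` are hypothetical, the driver is modelled as a plain list, and failure handling is reduced to discarding the packets generated since the last checkpoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommPacket:
    payload: bytes
    seq_at_generation: int  # attribute information: sequence number when generated


class CheckpointedSender:
    """Hypothetical model of the first embodiment (modules 21 to 25)."""

    def __init__(self) -> None:
        self.seq = 0                         # held by the sequence number incrementing module
        self.pending: List[CommPacket] = []  # generated but not yet transferred
        self.sent: List[CommPacket] = []     # stands in for the driver (assumption)

    def generate(self, payload: bytes) -> CommPacket:
        # Generate a packet and affix the current sequence number to it.
        pkt = CommPacket(payload, self.seq)
        self.pending.append(pkt)
        return pkt

    def take_checkpoint(self) -> None:
        # Each checkpoint taken increments the sequence number.
        self.seq += 1

    def flush(self) -> List[CommPacket]:
        # Transfer packets whose attribute no longer coincides with the current
        # sequence number; packets generated since the last checkpoint keep waiting.
        ready = [p for p in self.pending if p.seq_at_generation != self.seq]
        self.pending = [p for p in self.pending if p.seq_at_generation == self.seq]
        self.sent.extend(ready)
        return ready

    def cancel_uncommitted(self) -> List[CommPacket]:
        # On a failure, packets generated since the last checkpoint have not been
        # transferred yet, so they can still be cancelled.
        cancelled = [p for p in self.pending if p.seq_at_generation == self.seq]
        self.pending = [p for p in self.pending if p.seq_at_generation != self.seq]
        return cancelled
```

Mirroring Fig. 2: if 4a is generated, a checkpoint is taken, and 4b and 4c are then generated, a call to `flush()` transfers only 4a, while 4b and 4c are released by the first `flush()` after the next checkpoint.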

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Retry When Errors Occur (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Maintenance And Management Of Digital Transmission (AREA)

Claims (5)

  1. A checkpoint based communication processing system for executing a process and periodically taking a checkpoint in order to facilitate recovery from a failure which is detected in the process between two checkpoints, the system comprising means for controlling the whole communication processing and means for generating a plurality of communication packets for the communication processing under the supervision of the control means, the system being characterized in that it further comprises:
    means for incrementing a sequence number for a checkpoint each time a checkpoint has been taken during the process;
    means for affixing attribute information to the respective communication packets, comprising the sequence number held in the incrementing means at the time the communication packet is generated; and
    means for judging whether or not the communication packet should be transferred yet by comparing the affixed attribute information with the sequence number held at that time in the incrementing means.
  2. A communication processing system for executing a process and periodically taking a checkpoint in order to facilitate recovery from a failure which is detected in the process between two checkpoints, comprising means for controlling the whole communication processing and means for generating a plurality of communication packets under the supervision of the control means, the communication processing system being characterized in that it further comprises:
    means for generating a plurality of transfer packets from the respective communication packet by dividing the communication packet according to a particular communication protocol;
    means for affixing attribute information to the respective transfer packet, the attribute information indicating a relative position of a transfer packet among the plurality of transfer packets, so as to specify the last position among the transfer packets; and
    means for judging whether or not the transfer packet should be transferred yet if the affixed attribute information does not indicate the specified position.
  3. A method for implementing a checkpoint based communication processing system, in which the process proceeds with the periodic taking of a checkpoint to facilitate recovery from a failure detected between two checkpoints, the method being characterized in that it comprises the steps of:
    incrementing a sequence number identifying the respective checkpoint each time a checkpoint has been taken;
    affixing attribute information to the respective generated communication packet by applying the sequence number existing at the time the communication packet is generated;
    comparing the affixed attribute information of a communication packet with the current sequence number; and
    transferring the communication packet only if the affixed attribute information does not coincide with the current sequence number, and delaying the transfer of the communication packet if the affixed attribute information coincides with the current sequence number, until the time when the attribute information no longer matches.
  4. A method for implementing a checkpoint based communication processing system, for executing a process and periodically taking a checkpoint in order to facilitate recovery from a failure which is detected in a process between two checkpoints, the method being characterized in that the steps consist of:
    generating a plurality of communication packets to be transferred under the supervision of a control device;
    generating a plurality of transfer packets from the respective communication packet by dividing a communication packet according to a particular communication protocol so as to fit them to a size suited to a transfer medium;
    affixing relative position attribute information to the respective generated transfer packet in order to identify which transfer packet is the last among the generated transfer packets; and
    transferring the transfer packet if the attribute information affixed to that transfer packet does not indicate the last position, but ordering the transfer packet to be delayed until a next checkpoint is taken if the affixed attribute information indicates the last position.
  5. A method according to claim 4, characterized in that the method further comprises the steps of:
    judging the affixed attribute information of a transfer packet at a time appropriate for an actual transfer of the transfer packet; and
    transferring the transfer packet if the affixed attribute information is judged to be at a position other than the last position, but ordering the transfer to wait until a next checkpoint is newly counted if the affixed attribute information is judged to be at the last position.
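
The method of claims 4 and 5 (divide a communication packet into transfer packets, mark the last one, send the others promptly and hold the last one until the next checkpoint) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the claimed implementation: `TransferPacket`, `fragment`, `LastPacketGate` and the `mtu` parameter are hypothetical names, and splitting the payload into fixed-size byte chunks merely stands in for the "particular communication protocol".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransferPacket:
    data: bytes
    is_last: bool  # relative position attribute ("LP" marks the last packet)


def fragment(comm_packet: bytes, mtu: int) -> List[TransferPacket]:
    """Divide one communication packet into transfer packets of at most mtu bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    # An empty payload still yields one (empty) transfer packet, which is the last one.
    chunks = [comm_packet[i:i + mtu] for i in range(0, len(comm_packet), mtu)] or [b""]
    return [TransferPacket(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]


class LastPacketGate:
    """Hypothetical sender: only the last transfer packet waits for a checkpoint."""

    def __init__(self) -> None:
        self.held: List[TransferPacket] = []  # last packets awaiting the next checkpoint
        self.wire: List[TransferPacket] = []  # stands in for the transmission line

    def submit(self, packets: List[TransferPacket]) -> None:
        for p in packets:
            # Packets not marked as last are transferred promptly; the last one is delayed.
            (self.held if p.is_last else self.wire).append(p)

    def on_checkpoint(self) -> List[TransferPacket]:
        # Once the next checkpoint has been taken, release the held last packets.
        released, self.held = self.held, []
        self.wire.extend(released)
        return released
```

Because the receiving side cannot reconstruct the communication packet without the last transfer packet, holding back only that packet is enough to keep the whole communication packet cancellable until the checkpoint, while the other transfer packets use the line without waiting.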
EP96304472A 1995-06-19 1996-06-17 Method and apparatus for a checkpoint based communication processing system Expired - Lifetime EP0750261B1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP151734/95 1995-06-19
JP7151734A JP2878988B2 (ja) 1995-06-19 1995-06-19 Checkpoint communication processing system
JP15173495 1995-06-19

Publications (2)

Publication Number Publication Date
EP0750261A1 (fr) 1996-12-27
EP0750261B1 (fr) 2002-05-08

Family

ID=15525131

Family Applications (1)

Application Number Title Priority Date Filing Date
EP96304472A Expired - Lifetime EP0750261B1 (fr) 1995-06-19 1996-06-17 Method and apparatus for a checkpoint based communication processing system

Country Status (4)

Country Link
US (1) US5832201A (fr)
EP (1) EP0750261B1 (fr)
JP (1) JP2878988B2 (fr)
DE (1) DE69621078T2 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185702B1 (en) * 1997-01-24 2001-02-06 Kabushiki Kaisha Toshiba Method and system for process state management using checkpoints
US5875291A (en) * 1997-04-11 1999-02-23 Tandem Computers Incorporated Method and apparatus for checking transactions in a computer system
JP3711433B2 (ja) * 1998-05-06 2005-11-02 Seiko Epson Corp Print control method and system, and recording medium
KR20010037622A (ko) 1999-10-19 2001-05-15 정선종 Independent checkpointing method using memory checkpoints in a distributed system
US7249193B1 (en) * 2001-08-28 2007-07-24 Emc Corporation SRDF assist
JP4819644B2 (ja) * 2006-10-12 2011-11-24 Hitachi, Ltd. Information processing system, information processing method, and information processing apparatus
US9251002B2 (en) 2013-01-15 2016-02-02 Stratus Technologies Bermuda Ltd. System and method for writing checkpointing data
EP3090345B1 (fr) 2013-12-30 2017-11-08 Stratus Technologies Bermuda Ltd. Method of delaying checkpoints by inspecting network packets
WO2015102875A1 (fr) 2013-12-30 2015-07-09 Stratus Technologies Bermuda Ltd. Checkpointing systems and methods using data forwarding
EP3090344B1 (fr) 2013-12-30 2018-07-18 Stratus Technologies Bermuda Ltd. Dynamic checkpointing systems and methods

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4590554A (en) * 1982-11-23 1986-05-20 Parallel Computers Systems, Inc. Backup fault tolerant computer system
US5043866A (en) * 1988-04-08 1991-08-27 International Business Machines Corporation Soft checkpointing system using log sequence numbers derived from stored data pages and log records for database recovery
EP0465019B1 (fr) * 1990-06-29 1997-05-14 Oracle Corporation Procédé et appareil de gestion d'identificateurs d'état pour reprise efficace
US5555371A (en) * 1992-12-17 1996-09-10 International Business Machines Corporation Data backup copying with delayed directory updating and reduced numbers of DASD accesses at a back up site using a log structured array data storage
US5455946A (en) * 1993-05-21 1995-10-03 International Business Machines Corporation Method and means for archiving modifiable pages in a log based transaction management system
US5418940A (en) * 1993-08-04 1995-05-23 International Business Machines Corporation Method and means for detecting partial page writes and avoiding initializing new pages on DASD in a transaction management system environment

Also Published As

Publication number Publication date
JP2878988B2 (ja) 1999-04-05
EP0750261A1 (fr) 1996-12-27
DE69621078T2 (de) 2002-10-31
DE69621078D1 (de) 2002-06-13
US5832201A (en) 1998-11-03
JPH098869A (ja) 1997-01-10

Similar Documents

Publication Publication Date Title
EP0750261B1 (fr) Method and apparatus for a checkpoint based communication processing system
US8179923B2 (en) System and method for transmitting real-time-critical and non-real-time-critical data in a distributed industrial automation system
JP3982353B2 (ja) Fault tolerant computer apparatus, resynchronization method therefor, and resynchronization program
US7496787B2 (en) Systems and methods for checkpointing
US20070028144A1 (en) Systems and methods for checkpointing
US7124319B2 (en) Delay compensation for synchronous processing sets
CA2339783A1 (fr) Fault tolerant computer system
US20100138579A1 (en) Network adaptor optimization and interrupt reduction
US7418558B2 (en) Information processing system, system control apparatus, and system control method
JPS62150948A (ja) Bus fault location detection system
JP3789271B2 (ja) Network load test method
JPH1049461A (ja) Checkpoint communication processing system and checkpoint communication processing method
US20030229733A1 (en) DMA chaining method, apparatus and system
JP3245552B2 (ja) Transfer control system
JP2002135280A (ja) Transmission method for a digital protective relay system
JP2000115258A (ja) Automatic communication error recovery system
JP2642734B2 (ja) Data processing apparatus
JPH06152624A (ja) Token passing data transmission method
JP2000172308A (ja) Production plan scheduling method
JPS6116651A (ja) Multilink communication processing system
CN113420038A (zh) Redundant data transmission method and apparatus for an industrial control system
JP2002032258A (ja) Job operation system
JPH088935A (ja) Expansion unit interface
JPS59200365A (ja) Control information transfer system
EP0522759B1 (fr) Circuit for reliable switching over of signal processing circuits

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19960715

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 20010808

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 69621078

Country of ref document: DE

Date of ref document: 20020613

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20030211

REG Reference to a national code

Ref country code: GB

Ref legal event code: 746

Effective date: 20070317

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20080626

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20080617

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20080618

Year of fee payment: 13

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20090617

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20100226

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090617

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100101