JP2024128281A

JP2024128281A - Debugging device, debugging system, debugging method, and debugging program

Info

Publication number: JP2024128281A
Application number: JP2023037183A
Authority: JP
Inventors: 成輝前田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2024-09-24

Abstract

[Problem] To make it possible, in a fault-tolerant server, to perform debugging on one system while business operations continue on the other system, which is not being debugged. [Solution] The debugging device comprises: a setting unit configured to read, into a debug area of the memory of each of the systems operating in synchronization, a portion of the program to be debugged that contains the instruction at which a breakpoint is set, to rewrite that instruction with a breakpoint instruction in the system used for debugging among the synchronously operating systems, and to suppress page-out of the debug area; and an exception processing unit configured to release the synchronization of the synchronously operating systems when a breakpoint exception occurs. [Selected Figure] FIG. 1

Description

The present invention relates to a debugging device, a debugging system, a debugging method, and a debugging program.

In general, when a user debugs, breakpoints must be set in the business application (AP), the OS, drivers, or the like, and the business AP or the entire system must be stopped at those breakpoints in order to obtain debugging information such as the server's registers, stack, and memory data.

On the other hand, fault-tolerant (FT) servers are configured to prevent the entire system from stopping; that is, they continue processing normally even if some of their components fail. A fault-tolerant server is a highly available server whose hardware components are duplicated and which is equipped with a mechanism that keeps the system from stopping even when a failure occurs.

For availability, an FT server has a duplicated system. The memory contents of both systems are made identical, and lockstep technology synchronizes the processors of the two systems clock by clock for redundancy, so that both systems perform exactly the same operations. Consequently, when a failure occurs, the affected CPU subsystem can be logically isolated and operation can continue on the healthy system.

However, with ordinary means it is not possible to set a breakpoint on only one of the two systems of an FT server, so when a breakpoint is set for debugging, both systems must be stopped. In other words, because a duplicated FT server uses lockstep technology that synchronizes the processors and memory of both systems clock by clock for redundancy, it cannot be operated so that only one system stops at a breakpoint, even after the two systems are separated. Therefore, as with a general-purpose server, setting a breakpoint for debugging has required stopping the business AP or the entire system.

Related technologies include: prohibiting swap-out when files are relocated (Patent Document 1); checking the CPU information of both systems with a single external debugger to investigate causes such as loss of synchronization in an FT server (Patent Document 2); releasing the synchronization of an FT server to split it into two systems, continuing business operations on one while taking a backup on the other, so that a backup is taken without affecting business operations (Patent Document 3); monitoring breakpoints set in a program and stopping program execution when a predetermined condition is met (Patent Document 4); and intentionally and temporarily releasing the synchronization of an FT server to split it into two independently operating systems (Patent Document 5).

Patent Document 1: JP S62-245445 A
Patent Document 2: Japanese Patent No. 5477725
Patent Document 3: JP 2013-206052 A
Patent Document 4: JP 2013-206061 A
Patent Document 5: JP 2022-036778 A

Since both systems of a duplicated FT server perform the same operations, it should in principle be sufficient to debug only one of them, and business operations should be able to continue on the system that is not being debugged. However, for the reasons described above, such operation has not been possible, and this has been a problem.

The present invention provides a debugging device, a debugging system, a debugging method, and a debugging program.

A first aspect of the present invention is a debugging device comprising: a setting unit configured to read, into a debug area of the memory of each of systems operating in synchronization with each other, a portion of a program to be debugged that contains an instruction at which a breakpoint is set, to rewrite the instruction with a breakpoint instruction in the system used for debugging among the systems operating in synchronization with each other, and to suppress page-out of the debug area; and an exception processing unit configured to release the synchronization of the systems operating in synchronization with each other when a breakpoint exception occurs.

A second aspect of the present invention is a debugging system comprising a fault-tolerant server and the debugging device described above.

A third aspect of the present invention is a debugging method comprising the steps of: reading, into a debug area of the memory of each of systems operating in synchronization with each other, a portion of a program to be debugged that contains an instruction at which a breakpoint is set, rewriting the instruction with a breakpoint instruction in the system used for debugging among the systems operating in synchronization with each other, and suppressing page-out of the debug area; and releasing the synchronization of the systems operating in synchronization with each other when a breakpoint exception occurs.

A fourth aspect of the present invention is a debugging program that causes a computer to function as: a setting unit configured to read, into a debug area of the memory of each of systems operating in synchronization with each other, a portion of a program to be debugged that contains an instruction at which a breakpoint is set, to rewrite the instruction with a breakpoint instruction in the system used for debugging among the systems operating in synchronization with each other, and to suppress page-out of the debug area; and an exception processing unit configured to release the synchronization of the systems operating in synchronization with each other when a breakpoint exception occurs.

According to the present invention, the user can keep the business AP running on one system of the FT server while obtaining the necessary debugging information from the other system, which is stopped at a breakpoint.

FIG. 1 is a diagram showing the configuration of a debugging device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the synchronized and desynchronized states of an FT server according to an embodiment of the present invention.
FIG. 3 is a flowchart of breakpoint setting according to an embodiment of the present invention.
FIG. 4 is a diagram showing the memory state (1) of system 0 and system 1 at the steps of FIG. 3.
FIG. 5 is a diagram showing the memory state (2) of system 0 and system 1 at the steps of FIG. 3.
FIG. 6 is a diagram showing the memory state (3) of system 0 and system 1 at the steps of FIG. 3.
FIG. 7 is a diagram showing the memory state (4) of system 0 and system 1 at the steps of FIG. 3.
FIG. 8 is a diagram showing the memory state (5) of system 0 and system 1 at the steps of FIG. 3.
FIG. 9 is a flowchart of the process from performing debugging to recovery in an FT server according to an embodiment of the present invention.
FIG. 10 is a diagram showing the synchronized and desynchronized states of an FT server according to another embodiment of the present invention.

First, the basic configuration and operation of an embodiment of the present invention will be described. The user sets a breakpoint. A breakpoint is set by rewriting the instruction at the relevant memory location into an instruction that raises a breakpoint exception. Debugging is triggered when the FT server executes this instruction and a breakpoint exception occurs. At this point, the synchronization of the FT server can be released, splitting it into two independently operating systems (system 0 and system 1). While synchronization is released, system 0 and system 1 can each be operated for a different purpose.
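The rewrite-and-trap mechanism described above can be illustrated with a minimal Python sketch. It assumes a byte-addressed memory and uses 0xCC (the x86 INT3 opcode) as the breakpoint instruction purely for illustration; the patent does not name an instruction set, and `BreakpointTable`, `execute`, and `BreakpointException` are hypothetical names.

```python
# Minimal sketch of a software breakpoint: save the original instruction byte,
# overwrite it with a trap opcode, and raise an exception when it is executed.
# 0xCC (x86 INT3) is an illustrative assumption, not taken from the patent.
from typing import Dict

BREAKPOINT_OPCODE = 0xCC


class BreakpointException(Exception):
    """Raised when the (simulated) CPU executes the breakpoint opcode."""


class BreakpointTable:
    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.saved: Dict[int, int] = {}  # address -> original instruction byte

    def set(self, addr: int) -> None:
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = BREAKPOINT_OPCODE

    def clear(self, addr: int) -> None:
        # Restore the original instruction byte
        self.memory[addr] = self.saved.pop(addr)


def execute(memory: bytearray, pc: int) -> int:
    """Simulate fetching one instruction; the trap opcode raises an exception."""
    if memory[pc] == BREAKPOINT_OPCODE:
        raise BreakpointException(f"breakpoint exception at {pc:#x}")
    return memory[pc]
```

For example, after `BreakpointTable(mem).set(1)`, calling `execute(mem, 1)` raises `BreakpointException`; in the FT server, this exception is the point at which synchronization is released.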

In one embodiment, the above debugging method can be implemented using a debugging device. The debugging device may implement its functions using hardware and software; for example, the functions may be implemented by reading a program from a disk of the server and having the server's processor execute it. Alternatively, the functions of the debugging device may be implemented as a circuit or device within the server. The debugging device may be controlled with commands from a terminal.

FIG. 1 is a diagram showing the configuration of a debugging device according to an embodiment of the present invention. The debugging device 100 includes a setting unit 110 and an exception processing unit 120. The setting unit 110 is configured to read, into a debug area of the memory of each of the systems operating synchronously in a fault-tolerant server, a portion of the program to be debugged that contains the instruction at which a breakpoint is set; to rewrite that instruction with a breakpoint instruction in the system used for debugging among the synchronously operating systems; and to suppress page-out of the debug area. The exception processing unit 120 is configured to release the synchronization of the synchronously operating systems in the fault-tolerant server when a breakpoint exception occurs.

In this way, by exploiting the FT server's ability to be desynchronized and split into two independently operating systems, the business AP can keep running on one system of the FT server while, on the other system, the business AP stops at the instruction where the breakpoint is set, allowing the user to obtain debugging information such as register, stack, and memory data.

A more detailed, concrete example of the configuration and operation of the FT server in an embodiment of the present invention is described below.

FIG. 2 is a diagram showing the synchronized and desynchronized states of the FT server 10 according to an embodiment of the present invention. The left side of FIG. 2 shows the synchronized state, and the right side shows the desynchronized state after synchronization is released.

The FT server 10 has two central processing unit (CPU) modules, two memories (not shown), two FT controllers, and two input/output (IO) modules. Each IO module has a network interface card (NIC) and a disk (storage device). The FT server 10 connects to the network via the NICs. The CPU modules and IO modules have debug ports: debug port 0, debug port 1, debug port 0', and debug port 1' are provided on CPU module 0, CPU module 1, IO module 0, and IO module 1, respectively, and can be reached from the debug terminal 12 via the changeover switch 11.

In the synchronized state shown on the left side of FIG. 2, one OS is running. Under the control of this OS, CPU module 0 and CPU module 1 are synchronized, as are FT controller 0 and FT controller 1, and IO module 0 (NIC 0, disk 0) and IO module 1 (NIC 1, disk 1) are duplicated.

In the desynchronized state shown on the right side of FIG. 2, the OS runs separately as OS#1 and OS#2, and the FT server 10, which operated as a single system in the synchronized state, is split into two systems: system 0 (CPU module 0, the memory for system 0, FT controller 0, and IO module 0) and system 1 (CPU module 1, the memory for system 1, FT controller 1, and IO module 1). The duplication of IO module 0 (NIC 0, disk 0) and IO module 1 (NIC 1, disk 1) is maintained; that is, both IO modules are used in a duplicated state by system 0 and system 1.

To perform debugging after synchronization is released, the debug terminal 12 is connected, using the changeover switch 11, to the debug port (debug port 1) of the system to be debugged (here, system 1). The necessary debugging information, such as register information, stack information, and memory data, is obtained with debug commands. Because the FT server 10 has been desynchronized and split into two independently operating systems, neither the OS nor the running processes are aware that only one system is in operation. In other words, the user can extract data for analysis and investigation out of view of the OS and the running processes. When the user finishes debugging, system 0, which has continued to run for business processing, becomes the master, system 1, which was used for debugging, is incorporated, and the synchronized state is restored.

Next, the flow of the transition from the synchronized state to the desynchronized state described above, and of the return to the synchronized state, will be described in detail with reference to FIG. 3. FIG. 3 is a flowchart of breakpoint setting according to an embodiment of the present invention.

FIG. 3 is outlined as follows. The user sets a breakpoint that will trigger debugging. A breakpoint is set by rewriting the instruction at the relevant memory location into an instruction that raises a breakpoint exception. As shown in FIG. 3, when the debugging method according to an embodiment of the present invention is implemented on the FT server 10, only the memory of one system is rewritten when the instruction in memory is rewritten. In addition, when memory contents are written to disk by page-out, the memories of the two systems are checked for consistency, so page-out must be suppressed for pages whose contents differ.
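Why page-out must be suppressed can be shown with a toy model. The sketch assumes, for illustration only, that page-out compares the copies of a page held by system 0 and system 1 and fails on a mismatch, and that locked pages are simply excluded from eviction (as `mlock` does on Linux). `MismatchError`, `page_out_candidates`, and `page_out` are hypothetical names.

```python
# Toy model of the page-out consistency check: a page is written to disk only
# if both systems hold identical copies, and locked pages are never evicted.
from typing import Dict, List, Set


class MismatchError(Exception):
    """Raised when the two systems' copies of a page differ at page-out."""


def page_out_candidates(resident: List[int], locked: Set[int]) -> List[int]:
    # Locked pages (e.g. the debug area) are excluded from eviction
    return [p for p in resident if p not in locked]


def page_out(page: int, mem0: Dict[int, bytes], mem1: Dict[int, bytes],
             disk: Dict[int, bytes]) -> None:
    if mem0[page] != mem1[page]:
        raise MismatchError(f"page {page} differs between system 0 and system 1")
    disk[page] = mem0.pop(page)  # write the (identical) contents once
    del mem1[page]
```

With page 0 patched only on system 1 and locked, `page_out_candidates([0, 1], {0})` yields only page 1, so the mismatched page is never handed to `page_out` and no error is raised.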

The specific procedure shown in FIG. 3 is as follows. In step S11, the page containing the instruction at which the breakpoint is to be set is paged in. For example, a program portion of a predetermined unit that contains the target instruction in the business AP to be debugged is read from the disk into memory. The instruction at which the breakpoint is to be set is specified, for example, by the user through the debug terminal 12. The remaining processing in steps S11 to S15 may be executed by, for example, the setting unit 110 of the debugging device 100 in FIG. 1. When the page is paged in, the debug area is loaded into a certain page of the memory of each system (here, the page with page number 0) (FIG. 4). That is, in FIG. 4, the page containing the target instruction is loaded as the debug area into page number 0 of the memory of each of systems 0 and 1. Next, in step S12, the page containing the target instruction is locked in memory. For example, the target memory area is locked using a general OS function (API) such as mlock of the Linux (registered trademark) OS (FIG. 5). That is, page number 0 of the memory of each of systems 0 and 1 is locked.
Then, in step S13, both systems are stopped and the duplex state is released; that is, the FT server releases the synchronization of the memories of the two systems (FIG. 6). While synchronization is released, in step S14, the instruction at the address where the breakpoint is to be set is rewritten in system 1 only; that is, the instruction is rewritten only in the memory of system 1, which is used for debugging (FIG. 7). After the rewrite, in step S15, duplex operation is restored and the systems resume operation; that is, the synchronized state is restored (FIG. 8). In this state the memory contents of the two systems are inconsistent, but because the page is locked and page-out is suppressed, the OS does not detect the inconsistency.
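Steps S11 to S15 can be traced with the following Python sketch of the setting unit. It is a simplified model under stated assumptions: each system's memory is a dict of pages, the page lock is a set standing in for `mlock`, synchronization is a boolean flag, and 0xCC is an assumed breakpoint opcode. `FTServer` and `set_breakpoint` are hypothetical names.

```python
# Simplified model of steps S11-S15 (breakpoint setting on one system only).
from dataclasses import dataclass, field
from typing import Dict, Set

TRAP = 0xCC  # assumed breakpoint opcode (x86 INT3), for illustration only


@dataclass
class FTServer:
    # system number -> page number -> page contents
    mem: Dict[int, Dict[int, bytearray]] = field(
        default_factory=lambda: {0: {}, 1: {}})
    locked: Set[int] = field(default_factory=set)  # pages exempt from page-out
    synchronized: bool = True


def set_breakpoint(ft: FTServer, disk: Dict[int, bytes], page: int,
                   offset: int, debug_system: int = 1) -> None:
    # S11: page in the page holding the target instruction on both systems
    for system in (0, 1):
        ft.mem[system][page] = bytearray(disk[page])
    # S12: lock the page in memory (stand-in for mlock) to suppress page-out
    ft.locked.add(page)
    # S13: stop both systems and release the duplex (synchronized) state
    ft.synchronized = False
    # S14: rewrite the instruction on the debug system only
    ft.mem[debug_system][page][offset] = TRAP
    # S15: restore duplex operation and resume
    ft.synchronized = True
```

After `set_breakpoint(ft, {0: b"\x90" * 4}, page=0, offset=2)`, the two systems' copies of page 0 differ only at offset 2 while the flag reports synchronized operation, corresponding to the state after step S15.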

Next, the process from performing debugging to recovery in the FT server 10 will be described. After the procedure shown in FIG. 3 has been performed, the business AP to be debugged, for example, is run on the FT server 10. In step S21 of FIG. 9, a breakpoint exception occurs in system 1 only, and the exception handler (the exception processing unit 120 in FIG. 1) starts executing. After the breakpoint exception occurs, in step S22, the synchronization between system 0 and system 1 is released; that is, the exception handler detaches system 1, and only system 0 continues to operate. In step S23, the switch connecting the debug ports to the debug terminal 12 is switched to system 1, and control is transferred to the debug terminal 12. In step S24, debug commands are accepted from the debug terminal 12, and debugging information such as register, stack, and memory information is acquired. In this way, the user obtains the necessary debugging information, such as register information, stack information, and memory data, through debug commands, while system 0 continues to run the business AP.
After debugging is completed, in step S25, only system 1, which was used for debugging, is reset or powered off, which clears the memory information and other state of system 1. Then, in step S26, system 0, which has continued business operations, becomes the master, and system 1, which was used for debugging, is incorporated. Steps S25 and S26 may be performed by the user entering individual commands, or the debugging device 100 in FIG. 1 may further include a recovery processing unit (not shown) that the user causes, by a command, to execute steps S25 and S26. In step S27, system 0 and system 1 are synchronized and continue business operations. The FT server 10 thus returns to duplex operation.
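The debugging and recovery flow of steps S21 to S27 can be sketched in the same spirit. This toy model assumes that each system is a dict holding a memory image, a single accumulator register, and a halted flag, and that both systems step through the same instruction stream; `run_until_breakpoint` and `reintegrate` are hypothetical names, and the register model is invented purely for illustration.

```python
# Toy model of steps S21-S27: a breakpoint exception on system 1 only,
# desynchronization, debug capture, then reset and reintegration.
import copy
from typing import Dict, Optional

TRAP = 0xCC  # assumed breakpoint opcode, for illustration only


def run_until_breakpoint(systems: Dict[int, dict]) -> Optional[dict]:
    """Step both systems; return a debug snapshot if a breakpoint is hit."""
    snapshot = None
    for pc in range(len(systems[0]["mem"])):
        for system in systems.values():
            if system["halted"]:
                continue  # detached system stays stopped for debugging
            if system["mem"][pc] == TRAP:
                # S21/S22: exception handler detaches this system (desync)
                system["halted"] = True
                # S23/S24: debug terminal reads registers and memory
                snapshot = {"pc": pc, "acc": system["acc"],
                            "mem": bytes(system["mem"])}
                continue
            system["acc"] += system["mem"][pc]  # toy instruction: accumulate
    return snapshot


def reintegrate(systems: Dict[int, dict], master: int = 0,
                debugged: int = 1) -> None:
    systems[debugged] = None  # S25: reset / power off, clearing its state
    # S26: re-incorporate it from the master's state; S27: duplex again
    systems[debugged] = copy.deepcopy(systems[master])
```

In this model system 0 runs to the end of its instruction stream while system 1 stops at the trap, mirroring business operations that continue on one system while the other is inspected.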

As described above, when a user debugs on the FT server, the FT server, following commands from the debug terminal, rewrites the instruction in memory into an instruction that raises a breakpoint exception on the system being debugged only, and suppresses page-out so that the inconsistency between the two systems' memories is not detected at page-out. When the breakpoint exception for debugging occurs and the exception handler starts executing, the FT server releases the synchronization between the two systems, automatically cancelling the lockstep function. As a result, the user can use only the system on which the breakpoint was set, independently, to collect fault information, while the other system continues business operations unaffected by debugging.

As described above, the user can keep the business AP running on one system of the FT server while obtaining the necessary debugging information, such as register information, stack information, and memory data, from the other system stopped at a breakpoint. Because this relies on the FT server's ability to split the processors and memory into two systems that are desynchronized and run independently, neither the OS nor the running processes can tell that only one system is in operation. In other words, data for analysis and investigation can be extracted out of view of the OS and the running processes.

Next, another embodiment of the present invention will be described with reference to FIG. 10. The synchronized state shown on the left side of FIG. 10 is the same as that shown on the left side of FIG. 2. In the desynchronized state shown on the right side of FIG. 2, the OS runs separately as OS#1 and OS#2 while the duplication of IO module 0 (NIC 0, disk 0) and IO module 1 (NIC 1, disk 1) is maintained; in the desynchronized state shown on the right side of FIG. 10, by contrast, everything, including the IO modules, is separated system by system.
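The difference between the two desynchronized states can be expressed as a small configuration sketch. It is illustrative only: the component names follow FIG. 2 and FIG. 10, while the `split` function and its `keep_io_duplicated` flag are hypothetical. With the flag set, each system can still reach both IO modules (FIG. 2); with it cleared, each system keeps only its own IO module (FIG. 10).

```python
# Illustrative partition of FT server components after desynchronization.
from typing import Dict, List


def split(keep_io_duplicated: bool) -> Dict[str, Dict[str, List[str]]]:
    systems: Dict[str, Dict[str, List[str]]] = {}
    for n in (0, 1):
        # FIG. 2: both IO modules stay duplicated and shared by each system;
        # FIG. 10: each system is cut off together with its own IO module.
        io = (["IO module 0", "IO module 1"] if keep_io_duplicated
              else [f"IO module {n}"])
        systems[f"system {n}"] = {
            "cpu": [f"CPU module {n}"],
            "ft_controller": [f"FT controller {n}"],
            "io": io,
        }
    return systems
```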

By intentionally releasing the synchronization of the FT server, the user can split it into two independently operating systems (system 0 and system 1), and while synchronization is released, different tasks can be carried out on system 0 and system 1. In addition, by separating everything, including the IO modules, when one system is isolated (desynchronized), the user can obtain the complete data of that system.

After synchronization is released, the user connects the debug terminal to debug port 1', the debug port of the system used for debugging, and obtains the necessary debugging information, such as register information, stack information, and memory data, with debug commands.

The OS has two paths to the IO modules; if one of them fails, the OS disconnects that path and continues operations on the other. Therefore, the degraded state of the IO modules is visible to the operating system. However, because both IO modules were duplicated and performed exactly the same operations before the separation, it is possible, after one system is isolated, to continue operations on the other system while debugging the entire IO module of the isolated system.
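The OS-side path failover mentioned above can be sketched as follows. This is a generic multipath model written for illustration, not the FT server's actual driver: `MultipathDevice` and its methods are hypothetical, and each IO module's disk is modeled as a list of written blocks.

```python
# Generic multipath sketch: writes go to every active path; when a path fails
# (e.g. its IO module is detached for debugging), the OS drops it and keeps
# running on the remaining path in a degraded state it can observe.
from typing import Dict, List


class MultipathDevice:
    def __init__(self, paths: List[str]) -> None:
        self.paths = list(paths)

    def fail(self, path: str) -> None:
        # Never drop the last remaining path
        if path in self.paths and len(self.paths) > 1:
            self.paths.remove(path)

    @property
    def degraded(self) -> bool:
        return len(self.paths) < 2

    def write(self, block: str, disks: Dict[str, List[str]]) -> None:
        for path in self.paths:
            disks[path].append(block)
```

After `fail("io1")`, later writes reach only `io0`, while `io1` retains everything written before the detach, which is the state that can then be debugged as a whole.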

After debugging is completed, the FT server makes system 0, which continued operations, the master, incorporates system 1, which was used for debugging, and returns to the synchronized state.

As described above, by separating everything, including the input/output (IO) modules, when one system is isolated (desynchronized), the user can obtain the complete data of that system. The OS has paths to the two IO modules; if one of them fails, the OS disconnects that path and continues operations on the other. Therefore, unlike in the first embodiment, the degraded state of the IO modules is recognized by the operating system. Note that because both IO modules were duplicated and performed exactly the same operations before the separation, it is possible, after one system is isolated, to continue operations on the other system while debugging the entire IO module of the isolated system.

The processing steps performed by the computer system described above are stored in the form of a program on a computer-readable recording medium, and the above processing is performed by a computer reading and executing this program. Here, a computer-readable recording medium means a magnetic disk, magneto-optical disk, CD-ROM, DVD-ROM, semiconductor memory, or the like. The computer program may also be distributed to a computer over a communication line, and the computer receiving it may execute the program.

The above program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program means a medium that can transmit information, such as a network (communication network) like the Internet or a communication line like a telephone line. The above program may also implement only part of the functions described above. Furthermore, it may be a so-called difference file (difference program) that implements the functions described above in combination with a program already recorded in the computer system.

The present invention can be used to debug systems that operate synchronously.

10, 20 FT server
11 Switch
12 Debug terminal
100 Debugging device
110 Setting unit
120 Exception processing unit

Claims

1. A debugging device comprising:
a setting unit configured to read, into a debug area of the memory of each of systems operating in synchronization with each other, a portion of a program to be debugged that contains an instruction at which a breakpoint is set, to rewrite the instruction with a breakpoint instruction in the system used for debugging among the systems operating in synchronization with each other, and to suppress page-out of the debug area; and
an exception processing unit configured to release the synchronization of the systems operating in synchronization with each other when a breakpoint exception occurs.

2. The debugging device according to claim 1, wherein the exception processing unit maintains the duplication of the input/output modules of the synchronously operating systems when the breakpoint exception occurs.

3. The debugging device according to claim 1, wherein the exception processing unit separates the input/output modules of the synchronously operating systems system by system when the breakpoint exception occurs.

4. The debugging device according to claim 1, further comprising a recovery processing unit configured to, after the synchronization is released, restore synchronization by incorporating the system used for debugging, with the system other than the system used for debugging, which has continued operating, serving as the master.

5. The debugging device according to claim 1, wherein the setting unit is controlled from a debug terminal.

6. A debugging system comprising:
a fault-tolerant server; and
the debugging device according to any one of claims 1 to 5.

7. A debugging method comprising:
reading, into a debug area of the memory of each of systems operating in synchronization with each other, a portion of a program to be debugged that contains an instruction at which a breakpoint is set, rewriting the instruction with a breakpoint instruction in the system used for debugging among the systems operating in synchronization with each other, and suppressing page-out of the debug area; and
releasing the synchronization of the systems operating in synchronization with each other when a breakpoint exception occurs.

8. A debugging program that causes a computer to function as:
a setting unit configured to read, into a debug area of the memory of each of systems operating in synchronization with each other, a portion of a program to be debugged that contains an instruction at which a breakpoint is set, to rewrite the instruction with a breakpoint instruction in the system used for debugging among the systems operating in synchronization with each other, and to suppress page-out of the debug area; and
an exception processing unit configured to release the synchronization of the systems operating in synchronization with each other when a breakpoint exception occurs.