JPH1091478A

JPH1091478A - Agent managing method

Info

Publication number: JPH1091478A
Application number: JP8243356A
Authority: JP
Inventors: Katsuya Okamoto (岡本 克也)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-09-13
Filing date: 1996-09-13
Publication date: 1998-04-10

Abstract

PROBLEM TO BE SOLVED: To provide an agent management method in which failure-adaptive restart processing can be applied without affecting online performance. SOLUTION: A management processing content description section is programmed into an agent transmitted from a terminal 1. According to the content of that section, the agent autonomously sets the trace level of each place, autonomously returns to an arbitrary place, and re-executes the processing for analysis. In this way, how the programs in the center process the agent transmitted from the terminal 1 can be grasped in real time. Even when a retrieval agent whose processing takes a long time is used, the system operator can locate and analyze the cause of a failure in a short time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an agent management method in an agent-type communication system in which an agent that embeds the sender's purpose is transmitted from a terminal to a center and performs autonomous intelligent information processing in cooperation with places in the center's translation engine.

[0002]

[Prior Art] Conventionally, log information for program analysis has been acquired by embedding a log acquisition macro in every target processing program in the center. The macro calls a common statistical information collection program, which writes the log information to a DB, and the log information is then edited and managed with an editing tool. Whether logs are acquired for a managed agent has been set by using a command to configure a table for each managed agent individually, and whether each processing program calls the statistical information collection program has been handled by modifying each processing program individually (for example, by removing the macro).

[0003] In an agent-type communication system, log acquisition has also been performed by exploiting the characteristics of agents: a log acquisition function is programmed into the agent, and the managed agent either carries its log with it as it moves or passes the log to a management agent, which then reports it.

[0004]

[Problems to Be Solved by the Invention] However, search-type processing takes a long time, so if a problem occurs partway through, re-execution for analysis is difficult. In addition, if the trace level is raised in advance, the amount of trace information to be held increases, which degrades the throughput of the entire system and causes response delays. Furthermore, because agents move between places autonomously, it is difficult to identify the place where an abnormality occurred, and the investigation requires piecing together the logs of the whole system, so failure analysis in practice takes a long time.

[0005] An object of the present invention is to address these problems of agent behavior management and failure analysis by providing an agent management method in which failure-adaptive restart processing, which can autonomously acquire detailed trace information at an arbitrary level from an arbitrary place according to the failure, can be applied without affecting online performance.

[0006]

[Means for Solving the Problems] To solve the above problems, the invention is characterized in that an agent transmitted from a terminal holds a management processing content description section and trace information, and a management agent resides in each place and holds the processing log of that place. An effective trace level is thereby set at the places that need it according to the monitoring result, and the agent and the place are synchronized, so that restart processing for analysis, localized to the failed place, is performed without affecting online performance.

[0007]

[Embodiments of the Invention] The present invention will be described below in detail with reference to the drawings.

[0008] FIG. 1 shows the configuration of an agent-type communication system as an example of an embodiment of the present invention. A user transmits an agent 3, in which a request is programmed, from a terminal 1 to a center 2. The agent 3 enters a translation engine 4 in the center 2 and is processed while exchanging information at places 5-1, 5-2, and so on. The management agent in each place notifies a management node 6 of the agent's trace information as needed. The management agent is started by the translation engine 4 only when necessary and is put into a stopped state after processing ends, so that CPU and memory are used efficiently.

[0009] FIG. 2 shows the program configuration held by an agent, which consists of a request content description section 3a, a management processing content description section 3b, and a management trace information storage section 3c. An agent that does not require monitoring holds no program other than the request content description section 3a.
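As an illustration of the three-part agent layout in FIG. 2, the following is a minimal Python sketch. The class and field names are hypothetical and not taken from the patent. The trace record fields follow the items listed in claim 1 (place ID, counter, time stamp, interaction information), while the contents of section 3b (`trace_level`, `return_flag`) are an assumption based on how that section is used later in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    """One entry of management trace information (section 3c)."""
    place_id: str
    counter: int        # order in which places processed the agent
    timestamp: float
    interaction: str    # summary of the interaction with the place


@dataclass
class ManagementDescription:
    """Section 3b; the two fields are assumed for illustration."""
    trace_level: int = 0
    return_flag: bool = False


@dataclass
class Agent:
    agent_id: str
    request: str                                        # section 3a
    management: Optional[ManagementDescription] = None  # section 3b
    trace: Optional[list] = None                        # section 3c

    @property
    def monitored(self) -> bool:
        # An unmonitored agent carries only the request section 3a.
        return self.management is not None and self.trace is not None


def make_agent(agent_id: str, request: str, monitored: bool = False) -> Agent:
    if monitored:
        return Agent(agent_id, request, ManagementDescription(), [])
    return Agent(agent_id, request)
```

Keeping sections 3b and 3c optional mirrors the statement that an agent without monitoring carries nothing beyond its request.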

[0010] FIG. 3 shows the flow of agent processing in the center. An agent transmitted from the terminal 1 enters place 5-1 in the translation engine 4. If monitoring becomes necessary for analysis partway through processing, the agent receives a management processing content description section and a management trace information storage section from the management agent present in each place and starts monitoring.

[0011] When an agent that requires monitoring enters place 5-1 in the translation engine 4, it passes the management trace information storage section 3c that it currently holds to the management agent of place 5-1. The management agent updates the counter value and notifies the management node 6 of the received management trace information. The agent and the management agents perform the same processing at each of places 5-1, 5-2, 5-3, 5-4, 5-5, and 5-6, and the agent itself also returns to the terminal 1 carrying the trace results.
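The per-place monitoring step can be sketched roughly as follows. This is an illustrative Python model, not the patent's implementation: `ManagementNode`, `visit`, and `run_route` are hypothetical names, and the counter is assumed to be the visit order stored in the trace itself.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementNode:
    """Stands in for management node 6; it only collects notifications."""
    received: list = field(default_factory=list)

    def notify(self, place_id, trace):
        # Copy the trace so each notification is a snapshot.
        self.received.append((place_id, list(trace)))


def visit(place_id, trace, node, now):
    """What happens when a monitored agent enters one place."""
    # The agent hands over the trace it currently holds; the place's
    # management agent forwards it to the management node.
    node.notify(place_id, trace)
    # The counter records the order in which places processed the agent.
    trace.append({"place": place_id, "counter": len(trace) + 1, "time": now})
    return trace


def run_route(route, node):
    trace = []
    for now, place_id in enumerate(route):
        visit(place_id, trace, node, now)
    return trace  # carried back to the terminal by the agent
```

For example, `run_route(["5-1", "5-2", "5-3"], ManagementNode())` leaves three notifications at the node and returns a trace whose counters run from 1 to 3.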

[0012] In normal processing, monitoring can be realized by the flow shown in FIG. 3. When a failure occurs, however, processing must first be terminated, the trace information checked, and the failure location estimated before a reproduction test is run again, which prolongs the analysis. Moreover, as shown in FIG. 4, an agent-type communication system has agent-specific functions: an agent may request an existing agent 21 in a place 11 to perform the next processing; the translation engine may generate a new agent 22 in a place 12 in response to an event; and, when the same agent is to be sent to multiple locations, as in a search, duplicate agents 23 and 24 may be created in a place 13. The features of the present invention are described below with these in mind.

[0013] FIG. 5 shows an example of the items by which a management agent recognizes a failure. Each time the agent 3 moves to a place, it stores error information and the like corresponding to these items in its management trace information storage section 3c. The management agent of each place recognizes from this error information that a failure has occurred in the agent or in the place.

[0014] FIG. 6 shows the restart processing flow for analysis when the agent itself has failed.

[0015] In FIG. 6, when an agent 41 stops in a place 31 because of a failure or the like, a management agent 42 resident in the place 31 detects the abnormality from the fact that no trace information has been handed over by the agent 41, and reports the failure to the management node 6. By an agent function, the agent 41, which can no longer move on to the next processing because of the failure, is temporarily stored in a stopped-agent storage place 32.

[0016] A management agent 43 of the stopped-agent storage place 32 receives the trace information 3b and the processing content information 3c relating to the stopped agent 41 from a management agent 44 of a designated place 33 where processing is to be restarted. It returns the state of the stopped agent 41 to its state at the designated place 33, resets the trace level, turns on a return flag indicating restart processing for analysis, and transmits the stopped agent 41' to the designated place 33. The stopped agent 41' thus sent back moves on from the designated place 33, undergoing normal processing while raising the trace level at each place.
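The rewind performed for a stopped agent might look like the sketch below. It is a simplified model under stated assumptions: the agent's state is represented only by its trace, "the state at the designated place" is taken to mean the trace as it stood on entering the most recent visit to that place, and the function name and dictionary keys are hypothetical.

```python
def rewind_stopped_agent(trace, designated_place, analysis_trace_level):
    """Prepare a stopped agent for re-sending to the designated place.

    trace: list of dicts with at least a "place" key, oldest first.
    Returns a new dict describing the agent to send back; the input
    trace is not modified.
    """
    # Search from the newest record backwards for the designated place.
    for i in range(len(trace) - 1, -1, -1):
        if trace[i]["place"] == designated_place:
            return {
                "trace": list(trace[:i]),        # state on entering that place
                "trace_level": analysis_trace_level,
                "return_flag": True,             # marks restart for analysis
                "next_place": designated_place,
            }
    raise ValueError(f"agent never visited place {designated_place!r}")
```

Raising an error when the place is absent from the trace is a design choice of this sketch; the patent does not say what happens in that case.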

[0017] FIG. 7 shows the restart processing flow for analysis when a place has failed.

[0018] In FIG. 7, when a place 51 has stopped because of a failure or the like, an agent 61 that enters the failed place 51 receives notification of the abnormality of place 51 from a management agent 62 of the failed place 51. At this point, monitoring of the agent 61 is turned on if it was off, the trace level is set, a return flag indicating restart processing for analysis is turned on, and the agent is transmitted to a designated place 52.

[0019] At this time, the agent 61 returns its own state to its state at the designated place 52 using the management trace information that it carries. When the designated place 52 receives an agent whose return flag is on, it matches its own state to the state of the agent 61 on the basis of the interaction information of the processing previously performed with the agent 61, which makes restart processing for analysis possible.

[0020] The stopped agent 61' thus sent back moves on from the designated place 52, undergoing normal processing while raising the trace level at each place. When the agent 61' that was sent back to the designated place 52 enters the failed place 51 again, the return flag for analysis in the management processing content description section 3b held by the agent is set to off, and the agent is processed as usual.
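The life cycle of the return flag in the place-failure case can be summarized in a small state sketch. The names `on_failure_notice` and `on_enter` are hypothetical, and the model tracks only the three fields mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    monitoring: bool = False
    trace_level: int = 0
    return_flag: bool = False


def on_failure_notice(agent: Agent, analysis_level: int) -> None:
    """Agent is told by the failed place's management agent that the place is abnormal."""
    agent.monitoring = True             # turned on if it was off
    agent.trace_level = analysis_level  # level used during the re-run
    agent.return_flag = True            # marks restart processing for analysis


def on_enter(agent: Agent, place_id: str, failed_place_id: str) -> None:
    """Called each time the agent enters a place during the re-run."""
    if agent.return_flag and place_id == failed_place_id:
        # Back at the failed place: the analysis re-run ends here.
        agent.return_flag = False
```

The flag stays on while the agent passes through intermediate places and is cleared only on re-entering the failed place.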

[0021] Next, restart processing is described for the case where a place stops because of a failure or the like not during the normal mode, in which a single agent moves and is processed, but during agent-specific processing such as that shown in FIG. 4.

[0022] FIG. 8 shows the restart processing flow for analysis when new-agent generation processing was in progress.

[0023] In FIG. 8, an agent 81 enters a place 71, creates a new agent 82 for special processing, and requests the new agent 82 to do the work. If no failure occurs in the place 71, the management processing content description section and the management trace information storage section held by the requesting agent 81 are handed over to the new agent 82, and acquisition of trace information for the subsequent processing is delegated to it.

[0024] However, once an abnormality due to a failure or the like is confirmed in the place 71 where the new agent 82 is being created, restart processing for analysis is performed not by the created agent 82 but by the originating agent 81, which returns to the designated place and executes the restart processing for analysis.

[0025] At this time, the created new agent 82 is stopped and performs no further processing. When the agent performing restart processing for analysis enters the place 71 again, a new agent is created again, the return flag for analysis is set to off, processing such as the subsequent acquisition of trace information is handed over to the new agent, and the new agent executes the processing that follows.

[0026] FIG. 9 shows the restart processing flow for analysis when branch processing by duplication was in progress.

[0027] In FIG. 9, an agent 84 enters a place 72 and, for processing such as information retrieval, creates multiple duplicate agents 85 and 86 that branch off to do the processing. If no failure occurs in the place 72, the management processing content description section and the management trace information storage section held by the agent 84 are handed over to each of the duplicate agents 85 and 86, and acquisition of trace information for the subsequent processing is delegated to each of them.

[0028] However, once an abnormality due to a failure or the like is confirmed in the place 72 where the agent is being duplicated, restart processing for analysis is performed not by the duplicate agents 85 and 86 but by the source agent 84, which returns to the designated place and executes the restart processing for analysis.

[0029] At this time, the duplicate agents 85 and 86 are stopped at the place 72 and perform no further processing. When the source agent 84 performing restart processing for analysis enters the place 72 again, multiple agents are duplicated again, the return flag for analysis is set to off, processing such as the subsequent acquisition of trace information is handed over to each duplicate agent, and the duplicate agents execute the processing that follows.

[0030] FIG. 10 shows the restart processing flow for analysis when meet processing with an existing agent was in progress.

[0031] In FIG. 10, an agent 88 performs a meet, that is, an information exchange, with an agent 89 that already exists in a place 73, and requests the existing agent 89 to carry out the subsequent processing. When the agent 88 enters the place 73 and exchanges information with the existing agent 89, if no failure occurs in the place 73, the management processing content description section and the management trace information storage section held by the agent 88 are handed over to the existing agent 89, and acquisition of trace information for the subsequent processing is delegated to the existing agent 89.

[0032] However, once an abnormality due to a failure or the like is confirmed in the place 73 where the agent is exchanging information for the processing request, restart processing for analysis is performed not by the existing agent 89 that took part in the exchange but by the agent 88 that entered the place 73, which returns to the designated place and executes the restart processing for analysis.

[0033] At this time, the existing agent 89, which received the work request through the information exchange with the agent 88, remains in the place 73 and performs no further processing. A management agent 90 uses the trace information to return the state of the existing agent 89 to the point at which the agent 88 started processing at the place 73, and the translation engine uses the variation journal information of the place 73 to return the place 73 to its state at the point at which the agent 88 started processing there.

[0034] When the restart processing for analysis is finished and the agent 88 enters the place 73 again, it exchanges information with the existing agent 89 again, the return flag for analysis is set to off, processing such as the subsequent acquisition of trace information is handed over to the existing agent 89, and the existing agent 89 executes the processing that follows.
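The three agent-specific cases (new agent, duplication, meet) share one pattern: in the normal case the management sections are handed to the delegate, while on a place failure the delegates go idle and the originating agent performs the analysis re-run. The Python sketch below models that pattern under simplifying assumptions. All names are hypothetical, an agent's management sections are reduced to a trace list and a flag, and the meet case's existing agent is treated as "stopped" even though the text says it merely stays in the place without processing.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    trace: list = field(default_factory=list)
    holds_management: bool = False
    return_flag: bool = False
    stopped: bool = False


def delegate(origin, delegates):
    """Normal case: hand the management sections to each delegate."""
    for d in delegates:
        d.trace = list(origin.trace)  # each delegate continues its own copy
        d.holds_management = True
    origin.holds_management = False


def on_place_failure(origin, delegates):
    """Failure case: delegates go idle; the originating agent re-runs."""
    for d in delegates:
        d.stopped = True
    origin.return_flag = True
    return origin  # the agent that returns to the designated place


def on_reentry(origin, make_delegates):
    """Originating agent is back at the failed place after the re-run."""
    delegates = make_delegates()  # create, duplicate, or meet again
    origin.return_flag = False
    delegate(origin, delegates)
    return delegates
```

Passing `make_delegates` as a callable keeps the sketch neutral about which of the three mechanisms produces the delegates.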

[0035] Next, the method of returning the states of agents and places is described in detail.

[0036] FIG. 11 shows the restart processing flow when the states of all agents and all places in the translation engine are returned.

[0037] In FIG. 11, the translation engine 4 stores variation journal information 91 on its state in a database within the translation engine at regular intervals. After a management agent 111 of a failure discovery place 101 detects a failure, it receives trace information and a time stamp 112a from an agent 112 being monitored at the place 101 and notifies the management node 6.

[0038] To temporarily halt agents entering the translation engine 4, the management node 6 transmits a suspension agent 113 to the translation engine 4 and to each place. Following the instructions of the suspension agent 113, which contains a time stamp 113a indicating the restart processing start time, all places, together with the translation engine as a whole, return their state to that of the designated time using the variation log of the translation engine 4.

[0039] Every agent in the translation engine 4 uses the trace information that it holds to return to its state at the designated time and to the place 102 or 103 where it was being processed at that time, and restart processing for analysis is performed from that point with the whole system synchronized.
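A whole-engine rollback of this kind can be sketched with a time-ordered journal of state differences. The following Python code is an illustrative model only. It assumes that the variation journal is a list of key/value deltas recorded in time order, that state can be rebuilt by replaying deltas up to the designated time, and that deletions are not journaled. `Journal` and `rewind_agent_to` are hypothetical names.

```python
import bisect


class Journal:
    """Time-ordered differential state records for the whole engine."""

    def __init__(self):
        self.times = []
        self.deltas = []

    def record(self, t, delta):
        # Records are assumed to arrive in non-decreasing time order.
        self.times.append(t)
        self.deltas.append(dict(delta))

    def state_at(self, t):
        """Rebuild the state as of time t by replaying deltas up to t."""
        state = {}
        n = bisect.bisect_right(self.times, t)
        for delta in self.deltas[:n]:
            state.update(delta)
        return state


def rewind_agent_to(trace, t):
    """Trim an agent's trace to time t and report where it was then."""
    kept = [r for r in trace if r["time"] <= t]
    place = kept[-1]["place"] if kept else None
    return kept, place
```

Applying `state_at` to the engine and `rewind_agent_to` to every agent with the same time stamp gives a mutually consistent starting point for the re-run, which is the synchronization the paragraph above describes.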

[0040] FIG. 12 shows the restart processing flow when only the states of the agents and places involved in the failure within the translation engine are returned.

[0041] In FIG. 12, after a management agent 132 detects a failure from the trace information of an agent 131, the agent returns to a designated place 123 to acquire more detailed trace information. In doing so, it uses the trace information and time stamps that it holds to move through places 122, 124, and 123 in reverse order until it reaches the target place. At each place along the way, the state, such as the content of processing with the agent concerned, is partially returned to the designated time, using the processing time at that place and the agent ID to look up the variation journal information 141, 142 held by the place.

[0042] When the agent 131 has reached the target designated place 123 and each place at which it is to be processed has been put into the state it was in before the agent 131 entered, restart processing for analysis is started partially.

[0043] Because this is partial restart processing, the translation engine does not need to be suspended, other agents unrelated to the failure can continue processing as usual, and new agents can still be transmitted from the terminal 1.
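The partial rollback can be illustrated with per-place journals filtered by agent ID. In this hypothetical Python sketch each place keeps a journal of `(time, agent_id, key, old_value)` entries, where `old_value` is `None` if the key did not exist before the change (so stored values are assumed never to be `None`). The agent walks its own trace backwards, and each place undoes only that agent's changes made at or after the agent's arrival. The data layout and function name are assumptions for illustration.

```python
def partial_rollback(trace, places, agent_id, target_place):
    """Walk the agent's trace backwards to target_place, undoing its effects.

    trace:  list of {"place": id, "time": t}, oldest first; it is trimmed
            in place so that it ends just before the target visit.
    places: place id -> {"state": dict, "journal": list of
            (time, agent_id, key, old_value)} in time order.
    Returns the place ids in the order they were rolled back.
    """
    visited = []
    while trace:
        rec = trace.pop()
        place = places[rec["place"]]

        def mine(entry):
            return entry[1] == agent_id and entry[0] >= rec["time"]

        # Undo this agent's changes, newest first.
        for entry in reversed(place["journal"]):
            if mine(entry):
                _, _, key, old = entry
                if old is None:
                    place["state"].pop(key, None)
                else:
                    place["state"][key] = old
        # Drop the undone entries; other agents' entries are untouched.
        place["journal"] = [e for e in place["journal"] if not mine(e)]

        visited.append(rec["place"])
        if rec["place"] == target_place:
            break
    return visited
```

Because only entries carrying the given agent ID are undone, state written by unrelated agents at the same places survives, which is what allows the rest of the engine to keep running.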

[0044]

[Effects of the Invention] As described above, the present invention applies to an autonomous agent-type system in which agents embedding each individual's purpose are processed across various places in the center and produce results. For rapid identification of failure locations and for failure analysis in such a system, the processing that acquires trace information for analysis is adapted to the failure: through interaction between the agent and the management agent, the agent itself identifies the place from which to reproduce the failure and performs restart processing targeted at it. This localizes the degradation in throughput and response performance that would otherwise result from constantly collecting large volumes of logs while online. Moreover, even in cases such as search-type agents, where processing takes a long time to produce a result and repeated reproduction tests after a failure are difficult, performing restart processing for trace information acquisition in the course of normal processing enables efficient failure analysis.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an agent-type communication system showing an example of an embodiment of the present invention.

FIG. 2 is a configuration diagram of the program held by an agent.

FIG. 3 is a diagram showing the flow of agent processing in the center.

FIG. 4 is a diagram showing agent-specific functions.

FIG. 5 is a diagram showing an example of the items by which a failure is recognized.

FIG. 6 is a diagram showing the restart processing flow for analysis when the agent itself has failed.

FIG. 7 is a diagram showing the restart processing flow for analysis when a place has failed.

FIG. 8 is a diagram showing the restart processing flow for analysis on a place failure when new-agent generation processing was in progress.

FIG. 9 is a diagram showing the restart processing flow for analysis on a place failure when branch processing by duplication was in progress.

FIG. 10 is a diagram showing the restart processing flow for analysis on a place failure when meet processing with an existing agent was in progress.

FIG. 11 is a diagram showing the restart processing flow when the states of all agents and all places in the translation engine are returned.

FIG. 12 is a diagram showing the restart processing flow when only the states of the agents and places involved in the failure within the translation engine are returned.

[Explanation of symbols]

1: terminal, 2: center, 3: agent, 4: translation engine, 5-1 to 5-6: places, 6: management node.

Claims

[Claims]

1. An agent management method in which a terminal transfers an agent comprising a program and data to a center having a translation engine that interprets and processes the agent, the agent's program is executed autonomously in cooperation with places in the translation engine, each of which comprises a program and data, and the agent is managed by having an agent into which a management processing content description section can be programmed hold trace information including the ID of each processing place, a counter value indicating the order in which places processed it, a time stamp, and interaction information with the place, wherein: a management agent is resident in each place and receives the agent's trace information, which is either reported to the management node or reported upon return to the terminal; when the management agent determines from the result of monitoring the agent that an abnormality has occurred, such as stoppage of the agent being processed, stoppage of a place, or execution of illegal processing, it sends an abnormality notification to the agent; and the agent returns to an arbitrary place specified from the trace information that it holds, resets the trace level of the places designated in the management processing content description section so as to perform more detailed monitoring, and restarts processing from that place.

2. The agent management method according to claim 1, wherein, when the agent creates a new agent for specific processing, duplicates itself, or exchanges information with an existing agent in order to have processing performed, the management processing content description section and the trace information held by the agent are handed over to the agent to which the processing is requested, and acquisition of the trace information is continued.

3. The agent management method according to claim 1, wherein, when the agent returns to an arbitrary place for analysis, a return flag is set in the management processing content description section of the agent so that the processing can be distinguished from normal processing.

4. The agent management method according to claim 1, wherein, when the agent returns to an arbitrary place for analysis, the management node suspends the translation engine; using the trace information held by the agent, the states of all places in the translation engine are returned to the point indicated by the time stamp on the basis of the state-change difference information periodically output by the translation engine; the states of all agents in the translation engine are likewise returned to that point from the trace information by the management agents; and, after the states of the agents and places have been matched, restart processing for failure analysis is performed.

5. The agent management method according to claim 1, wherein, when the agent returns to an arbitrary place for analysis, the agent's state is returned to that of its processing time at the designated place using the trace information and time stamp held by the agent; the place, using the time stamp of when it processed the agent, extracts from the variation journal information that the place itself outputs, on the basis of the interaction time with the agent and the agent's ID, the state information of the agent and the place up to the interaction time, and synchronizes with the state of the agent; and restart processing for autonomous failure analysis is thereby performed from an arbitrary place of the agent.